Python

Install
Connect
Read from Shooju
Getting a single series from one or multiple queries
Reading multiple series
Getting all the reported dates of a particular series
Working with Expression Globals
Write to Shooju
Creating series, points or fields
Removing series, points, or fields
Upload and download files
Error Handling
Direct connection with the API
Tips on Improving Performance
Python Cookbook / Advanced Examples
Getting facets
Additional Resources

Install

$ pip install shooju

Optional. To speed up server communication and serialization using the SJTS (Shooju Time Series) protocol:

$ pip install shooju-ts

The shooju-ts package works in the background. There is nothing to do other than install it.

Connect

Open the Shooju website at https://my.shooju.com/#myprofile.
Create an API key on the My Profile page (under your name at the top right of the Shooju website) using Create New API Key.
Make a note of the server (the URL of the Shooju instance you log in to) and your username (shown at the top of the My Profile page).
Instantiate a Connection in Python:

from shooju import Connection
sj = Connection(server='https://trial.shooju.com', user='my username', api_key='xxx')
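
To keep the API key out of source code, the connection arguments can be read from the environment. This is a minimal sketch, not part of the shooju package: the helper and the variable names SHOOJU_SERVER, SHOOJU_USER and SHOOJU_API_KEY are hypothetical.

```python
import os


def connection_kwargs_from_env(env=None):
    """Build the keyword arguments for shooju.Connection from environment variables.

    The variable names below are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    names = {
        'server': 'SHOOJU_SERVER',
        'user': 'SHOOJU_USER',
        'api_key': 'SHOOJU_API_KEY',
    }
    # Fail early with a readable message instead of sending empty credentials.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError('missing environment variables: ' + ', '.join(missing))
    return {arg: env[var] for arg, var in names.items()}


# Usage (requires the shooju package and the variables above to be set):
# from shooju import Connection
# sj = Connection(**connection_kwargs_from_env())
```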

Read from Shooju

Getting a single series from one or multiple queries

Get a single series from a single query (the query should return only one series, which is represented as a dict with fields, points and series_id):

>>> s = sj.get_series(r'sid=my\series\id', fields=['description'], df='MAX', dt='-4w', max_points=-1)

>>> s
{'fields': {'description': 'My series - An example'},
 'points': [Point(2020-02-03 00:00:00, 23.6), Point(2020-02-04 00:00:00, -7.0)],
 'series_id': 'sid=my\\series\\id'}

If you pass an asterisk in the list of fields to be returned (fields=['*']), every non-internal field in the series will be retrieved. This is also valid for the other methods that read series from Shooju.

The points are usually returned as a list of shooju.Point objects. You can change the format to a Pandas Series by changing the serializer:

from shooju import points_serializers as ps
sj.get_series(r'sid=my\series\id', max_points=-1, serializer=ps.pd_series) # will return pandas series object
sj.get_series(r'sid=my\series\id', max_points=-1, serializer=ps.pd_series_localized) # same as above, but datetimes will be localized to the timezone specified by the field 'timezone'

Check other serialization formats in the shooju.points_serializers module.

If you want to make multiple queries and you expect to receive a single series from each one of them, you can do that more efficiently with mget:

m = sj.mget()
m.get_series(r'sid=my\series\id', fields=['description']) # queues up the first series to retrieve
m.get_series(r'sid=other\series\id', fields=['description']) # queues up the second
first_series_response, second_series_response = m.fetch() # only one request to the server is sent
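
The same pattern works for any number of queries. The sketch below wraps it in a hypothetical helper, fetch_descriptions, which takes an existing Connection and assumes fetch() yields the responses in the order the queries were queued, as the unpacking above suggests.

```python
def fetch_descriptions(sj, queries):
    """Return {query: description} using a single mget() round trip.

    `sj` is a shooju.Connection; each query is expected to match one series.
    """
    queries = list(queries)
    m = sj.mget()
    for query in queries:
        m.get_series(query, fields=['description'])  # only queued, nothing is sent yet
    responses = m.fetch()  # one request to the server
    result = {}
    for query, series in zip(queries, responses):
        # Be defensive about a missing series or a series without fields.
        fields = series.get('fields', {}) if series else {}
        result[query] = fields.get('description')
    return result
```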

Reading multiple series

The fastest way to read many series is to scroll using a single structured query:

for series in sj.scroll(r'sid:my\series\prefix', fields=['sid', 'description']):
    print('series id: {}, description: {}'.format(series['fields']['sid'], series['fields']['description']))

A few notes on the scroll() function parameters:

fields are optional
points are not retrieved by default because the max_points parameter defaults to 0
set max_points to -1 to retrieve all points
use df / dt (in Date Input Notation) to control the date range of points to retrieve
use sort parameter to control sorting
max_series is handy to limit to the top N series
scroll_batch_size controls how many series are retrieved in one batch

the default of 2500 may be too high when retrieving many points per series
if getting a too_many_points error, try a lower scroll batch size of 500 or even less
if scrolling over expression series, an even lower batch size of 50 might be useful to avoid timeouts

the extra_params parameter is useful for sending other parameters to the raw API
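
The parameters above can be combined as in the following sketch, which counts the points of every matching series while keeping the batch size low. count_points is a hypothetical helper, and it assumes that a series without points may come back without a 'points' key.

```python
def count_points(sj, query, df=None, dt=None, scroll_batch_size=500):
    """Return {sid: number of points} for every series matching `query`.

    `sj` is a shooju.Connection. All points are requested (max_points=-1),
    so the batch size is kept well below the default of 2500.
    """
    params = {'fields': ['sid'], 'max_points': -1, 'scroll_batch_size': scroll_batch_size}
    if df is not None:
        params['df'] = df  # start of the date range
    if dt is not None:
        params['dt'] = dt  # end of the date range
    counts = {}
    for series in sj.scroll(query, **params):
        counts[series['fields']['sid']] = len(series.get('points') or [])
    return counts
```

If the server still answers with a too_many_points error, call the helper again with a smaller scroll_batch_size.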

To generate a Pandas DataFrame from a single query, use get_df(). Note that this is a wrapper around scroll() and is less configurable. This function has one extra parameter, series_axis, which sets the position of the series in the DataFrame: rows (the default) or columns. Besides that, get_df() accepts the same points- and fields-related parameters as get_series() and scroll().

If series_axis=rows, series will be the rows and the different fields will be the columns. If points are requested, a series may span multiple rows.

>>> df = sj.get_df('sid:users\\me', fields=['*'], max_points=2)
>>> print(df)
    series_id          unit      description    date                         points
0   users\me\unit-a    unit A    Unit A         2002-10-01 00:00:00+00:00    0.0
1   users\me\unit-a    unit A    Unit A         2002-10-01 01:00:00+00:00    23.1
2   users\me\unit-b    unit B    Unit B         2002-10-01 00:00:00+00:00    NaN
3   users\me\unit-b    unit B    Unit B         2002-10-01 01:00:00+00:00    6.4
...

If series_axis=columns, the resulting DataFrame will have Shooju series values as columns and points as rows. If specific fields are passed, the field values will name the DataFrame columns joined by the character '/'.

>>> df = sj.get_df('sid:users\\me', fields=['unit', 'description'], series_axis='columns', max_points=-1)
>>> print(df)
             unit A/Unit A    unit B/Unit B   ...    unit Z/Unit Z
2000-04-03   20.50            31.50           ...    34.20
2000-04-04   32.25            20.50           ...    36.00
2000-04-05   31.25            40.50           ...    46.50
...

Getting all the reported dates of a particular series

The get_reported_dates() method returns a List[datetime] with all the dates for which a particular Series has a different collection of points:

get_reported_dates(<series_query>, <job_id>, <processor>, <df>, <dt>, <mode>)
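
As an illustration, the sketch below picks the most recent reported date of a series. It assumes get_reported_dates() is called on the Connection object with the series query as its first argument; latest_reported_date is a hypothetical helper.

```python
def latest_reported_date(sj, series_query):
    """Return the most recent reported date of the series, or None if there is none.

    `sj` is a shooju.Connection.
    """
    reported_dates = sj.get_reported_dates(series_query)
    return max(reported_dates) if reported_dates else None
```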

Working with Expression Globals

You can use Expression Globals from Python through the extra_params parameter of get_series():

expression_globals = """
query1 = r'sid=my\series\id1'
query2 = r'sid=my\series\id2'

# any logic can be placed here; F functions can also be called, exactly as in the Series Editor

G.r = sjs(query1) + sjs(query2) # assign the result of your expression here
""".strip()

sj.get_series('=G.r', df='-3M', dt='MAX', max_points=-1, extra_params={'g_expression': expression_globals})

You can also view the expression above in the Series Editor.

More information on Expression Globals is available on the Expressions page.
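
When the list of queries is only known at run time, the globals string can be generated. The sketch below builds the same kind of expression as the example above for any number of queries. build_sum_globals is a hypothetical helper, not part of the shooju package, and it rejects queries that cannot be embedded in an r'...' literal.

```python
def build_sum_globals(queries):
    """Return Expression Globals code that sums sjs(query) over all queries into G.r."""
    queries = list(queries)
    if not queries:
        raise ValueError('at least one query is required')
    lines = []
    for number, query in enumerate(queries, start=1):
        if "'" in query or query.endswith('\\'):
            # Such a query would break the r'...' literal generated below.
            raise ValueError('unsupported query: {!r}'.format(query))
        lines.append("query{} = r'{}'".format(number, query))
    terms = ['sjs(query{})'.format(number) for number in range(1, len(queries) + 1)]
    lines.append('G.r = ' + ' + '.join(terms))
    return '\n'.join(lines)


# Usage with a shooju.Connection named sj and a list my_queries:
# sj.get_series('=G.r', df='-3M', dt='MAX', max_points=-1,
#               extra_params={'g_expression': build_sum_globals(my_queries)})
```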

Write to Shooju

Writing to Shooju is done through the shooju.RemoteJob object returned by shooju.Connection.register_job(). You can use it as a context manager, as in the examples below. When not using it as a context manager, make sure to call job.submit() or job.finish() to submit any queued changes.

If you need to make sure that a certain procedure is done every time an API call is sent to Shooju while modifying series, you can use shooju.RemoteJob.add_pre_submit_hook and shooju.RemoteJob.add_post_submit_hook.

Creating series, points or fields

Instantiate a job and write fields and points in a series returned by a query:

from shooju import Point
from datetime import datetime

with sj.register_job('my job description', batch_size=2000) as job:  # changes are submitted every 2000 series
    # Each query should return either 0 series (a new series is created) or 1 series (the existing series is updated)
    for query in [r'sid="users\myusername\test\a"', r'sid="users\myusername\test\b"']:
        job.write(
            query, fields={'description': 'my favorite series'}, points=[Point(datetime(2017, 1, 1), 12.4)]
        )

Note that it is also possible to pass a pandas.Series to the points argument. In that case, the index of the series must be a DatetimeIndex (the default when the series is created with datetime keys).

from datetime import datetime
import pandas as pd

query = r'sid="users\myusername\test\a"'  # example query, same as in the previous block

with sj.register_job('my job description') as job:
    job.write(
        query, points=pd.Series({datetime(2017, 1, 1): 11.1, datetime(2017, 1, 2): 12.2})
    )

To write reported points, use the similar write_reported() function:

write_reported(<series_query>, <reported_date>, <fields>, <points>)

Removing series, points, or fields

To remove one or multiple series, use the delete_series method:

with sj.register_job('Deleting series', batch_size=1000) as job:
    # Deleting one series
    job.delete_series(r'sid="users\myusername\test\a"')

    # Deleting multiple series
    job.delete_series(r'sid:"users\myusername"', one=False) # one=True is the default

To remove only the points and fields from an existing series, you can use the remove_others parameter of the write() function mentioned above.

A few notes on writing data to Shooju:

When writing more than one series, set the batch_size parameter to something higher than 1 to speed up writes. A good rule of thumb is 2000.
Registering jobs should be done as infrequently as possible, usually only once per script. Creating a job carries a performance penalty.
Existing jobs can also be reused, to avoid creating more jobs, by creating a RemoteJob(conn, job_id) object with an existing job_id (you can get it from the job object using job.job_id).
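
The notes above can be sketched as a single helper that registers one job for the whole script and writes every series through it. write_all and the (query, fields, points) row layout are hypothetical; only register_job() and write() come from the client.

```python
def write_all(sj, rows, description, batch_size=2000):
    """Write every (query, fields, points) tuple in `rows` with a single job.

    `sj` is a shooju.Connection. Returns the number of write() calls made.
    """
    written = 0
    # One job for the whole script; queued changes are sent every `batch_size`
    # series, and the rest is submitted when the with-block exits.
    with sj.register_job(description, batch_size=batch_size) as job:
        for query, fields, points in rows:
            job.write(query, fields=fields, points=points)
            written += 1
    return written
```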

Upload and download files

It is possible to upload files to Shooju as a whole or in parts. It is also possible to download any file uploaded to Shooju if you have the file ID.

uploader = sj.create_uploader_session()

# Upload at once
file_id = uploader.upload_file(file_object, filename)

# Upload in multiple parts
file_id = uploader.init_multipart(filename)
uploader.upload_part(file_id, part_num, file_object)
uploader.complete_multipart(file_id)

# Download an uploaded file
file = sj.download_file(file_id)
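
For large files, the multipart calls can be driven by a loop over fixed-size chunks. This is a sketch under assumptions that should be checked against the docstrings: parts are numbered from 1 and each part is passed as a file-like object. upload_in_parts and the 5 MiB default part size are hypothetical.

```python
import io


def upload_in_parts(uploader, file_object, filename, part_size=5 * 1024 * 1024, first_part_num=1):
    """Upload a binary file object chunk by chunk and return the file ID.

    `uploader` is the object returned by sj.create_uploader_session().
    Assumed, not confirmed: part numbers start at `first_part_num`.
    """
    file_id = uploader.init_multipart(filename)
    part_num = first_part_num
    while True:
        chunk = file_object.read(part_size)
        if not chunk:  # end of file
            break
        uploader.upload_part(file_id, part_num, io.BytesIO(chunk))
        part_num += 1
    uploader.complete_multipart(file_id)
    return file_id
```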

Error Handling

Getting one series from one or multiple queries:

get_series() will return None if no series is found. If there is an issue connecting to Shooju or if the query is invalid (including execution errors if the query is an expression or an XPR series), shooju.ConnectionError or shooju.ShoojuApiError is raised.
The fetch() method of the object returned by mget() returns an iterator of dict series. If no series is found from any of the queued queries, if there is any connection issue or if any query is invalid (also including expression errors), shooju.ConnectionError or shooju.ShoojuApiError is raised.

Getting multiple series from one query:

The scroll method returns an iterator of dict series. If no series is found, the iterator is empty. If there is a connection issue when scroll sends the API request to Shooju, or if the query is invalid (including expression errors), shooju.ConnectionError or shooju.ShoojuApiError is raised.
The get_df method will return an empty DataFrame if no series is found, or raise an error similarly to scroll.

Deleting series:

If you call delete_series() with a query that returns multiple series and one=True (the default), shooju.ShoojuApiError is raised.
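
The three outcomes of get_series() (a series, None, or an exception) can be told apart explicitly. The sketch below uses a hypothetical helper, try_get_series, which receives the exception classes to catch as an argument so that the caller decides which errors to handle.

```python
def try_get_series(sj, query, errors, **kwargs):
    """Return a (status, payload) pair for one get_series() call.

    status is 'ok' (payload is the series dict), 'missing' (payload is None)
    or 'error' (payload is the exception that was raised).
    """
    try:
        series = sj.get_series(query, **kwargs)
    except errors as exc:
        return 'error', exc
    if series is None:
        return 'missing', None
    return 'ok', series


# Usage with a shooju.Connection named sj:
# import shooju
# status, payload = try_get_series(
#     sj, r'sid=my\series\id', errors=(shooju.ShoojuApiError, shooju.ConnectionError))
```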

Direct connection with the API

It is possible to make direct requests to the endpoints described in the API Reference. However, we suggest using the previously mentioned functions whenever possible.

sj.raw.get(
    '/series', # Relevant Shooju API endpoint (after /api/1)
    params={...} # GET query parameters
)

sj.raw.post(
    '/processors/...', # Ditto
    data_json={...} # POST JSON body
)

sj.raw.delete(
    '/series/delete', # Ditto
    data_json={...} # DELETE JSON body
)

Tips on Improving Performance

install shooju-ts

this helps when reading/writing high-volume point series

use a smaller (or larger) scroll_batch_size in sj.scroll()

a smaller batch size (<100) helps when the series have more points or are heavier expressions
a larger batch size (>3000) helps when reading many series that are light (few points and no expressions)

serialize points to what you actually need; if you want minimal serialization try:

def xx(pts, *args, **kwargs):
    return pts
for s in sj.scroll(query, fields=['sid'], df=df, dt=dt, max_points=-1, serializer=xx):
    # processing assumes points is a list of (milliseconds from epoch, float) tuples
    pass
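
If datetimes are needed later, the raw tuples can be converted on demand. This is a small sketch assuming the timestamps are milliseconds since the Unix epoch in UTC; to_datetime_points is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetime_points(pts):
    """Convert [(milliseconds from epoch, value), ...] to [(aware datetime, value), ...]."""
    # Adding a timedelta also works for dates before 1970 on every platform.
    return [(EPOCH + timedelta(milliseconds=ms), value) for ms, value in pts]
```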

Python Cookbook / Advanced Examples

Getting facets

The only correct way to get unique field values is through a raw method. Scroll should not be used.

>>> query = r'sid:my\series\test'
>>> facets = 'field_a,field_b'
>>> sj.raw.get('/series', params={'query': query, 'per_page': 0, 'facets': facets})
{'series': [],
 'success': True,
 'request_id': 'xxx',
 'facets': {'field_a': {'terms': [{'term': 'term_a1', 'count': 12},
    {'term': 'term_a2', 'count': 5}],
   'other': 0,
   'total': 17,
   'missing': 0},
  'field_b': {'terms': [{'term': 'term_b1', 'count': 3},
    {'term': 'term_b2', 'count': 2}],
   'other': 0,
   'total': 5,
   'missing': 0}},
 'validate_query': {'type': 'structured'},
 'total': 10}
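
The unique values can then be pulled out of the response. This sketch assumes the response shape shown above; facet_terms is a hypothetical helper.

```python
def facet_terms(response, field):
    """Return {term: count} for one field of a raw '/series' facets response."""
    facet = response.get('facets', {}).get(field, {})
    return {entry['term']: entry['count'] for entry in facet.get('terms', [])}


# Usage with a shooju.Connection named sj:
# response = sj.raw.get('/series', params={'query': query, 'per_page': 0, 'facets': 'field_a'})
# unique_values = sorted(facet_terms(response, 'field_a'))
```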

Additional Resources

Refer to the docstring of each function for detailed parameter explanations.

For more thorough examples look at the tests in the source.