- Install
- Connect
- Read from Shooju
- Getting a single series from one or multiple queries
- Reading multiple series
- Getting all the reported dates of a particular series
- Working with Expression Globals
- Write to Shooju
- Creating series, points or fields
- Removing series, points, or fields
- Upload and download files
- Error Handling
- Direct connection with the API
- Tips on Improving Performance
- Python Cookbook / Advanced Examples
- Getting facets
- Additional Resources
Install
$ pip install shooju
Optional. To speed up server communication and serialization using the SJTS (Shooju Time Series) protocol:
$ pip install shooju-ts
The shooju-ts package works in the background. There is nothing to do other than install it.
Connect
- Log in to the Shooju website at https://my.shooju.com/#myprofile.
- Create an API Key on the My Profile page (under your name on the top-right of the Shooju website) by clicking Create New API Key.
- Make note of the server (the URL of the Shooju instance you log in to) and your username (shown at the top of the My Profile page).
- Instantiate a Connection in Python:
from shooju import Connection
sj = Connection(server='https://trial.shooju.com', user='my username', api_key='xxx')
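To keep the API key out of source code, the connection arguments can be read from the environment instead. The following is only a sketch: the variable names SHOOJU_SERVER, SHOOJU_USER and SHOOJU_API_KEY and the helper connection_kwargs_from_env are hypothetical and are not defined by the shooju package.

```python
import os


def connection_kwargs_from_env(environ=None):
    """Build keyword arguments for shooju.Connection from environment variables.

    The variable names below are hypothetical; use whatever fits your deployment.
    """
    environ = os.environ if environ is None else environ
    names = {
        'server': 'SHOOJU_SERVER',
        'user': 'SHOOJU_USER',
        'api_key': 'SHOOJU_API_KEY',
    }
    # Fail early with a clear message instead of sending empty credentials.
    missing = [var for var in names.values() if not environ.get(var)]
    if missing:
        raise RuntimeError('missing environment variables: ' + ', '.join(missing))
    return {arg: environ[var] for arg, var in names.items()}


# Usage (requires the shooju package and the variables above to be set):
# from shooju import Connection
# sj = Connection(**connection_kwargs_from_env())
```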
Read from Shooju
Getting a single series from one or multiple queries
Get a single series from a single query. The query should return only one series, which is usually represented as a dict with fields, points and series_id:
>>> s = sj.get_series(r'sid=my\series\id', fields=['description'], df='MAX', dt='-4w', max_points=-1)
>>> s
{'fields': {'description': 'My series - An example'},
'points': [Point(2020-02-03 00:00:00, 23.6), Point(2020-02-04 00:00:00, -7.0)],
'series_id': 'sid=my\\series\\id'}
If you pass an asterisk in the list of fields to be returned (fields=['*']), every non-internal field in the series will be retrieved. This also applies to the other methods that read series from Shooju.
The points are usually returned as a list of shooju.Point objects. You can change the format to a Pandas Series by changing the serializer:
from shooju import points_serializers as ps
sj.get_series(r'sid=my\series\id', max_points=-1, serializer=ps.pd_series) # will return pandas series object
sj.get_series(r'sid=my\series\id', max_points=-1, serializer=ps.pd_series_localized) # same as above, but datetimes will be localized to the timezone specified by the field 'timezone'
Check other serialization formats in the shooju.points_serializers module.
If you want to make multiple queries and you expect to receive a single series from each one of them, you can do that more efficiently with mget:
m = sj.mget()
m.get_series(r'sid=my\series\id', fields=['description']) # queues up the first series to retrieve
m.get_series(r'sid=other\series\id', fields=['description']) # queues up the second
first_series_response, second_series_response = m.fetch() # only one request to the server is sent
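The same pattern extends to any number of series. The sketch below wraps it in a hypothetical helper, fetch_descriptions, that takes a connection and a list of series ids. It assumes that fetch() yields responses in the order the queries were queued, as the unpacking above suggests, and it treats a missing series or field as None.

```python
def fetch_descriptions(sj, series_ids):
    """Return {series_id: description or None} using a single mget round trip.

    `sj` is a shooju.Connection; `series_ids` is a list of series ids.
    """
    m = sj.mget()
    for sid in series_ids:
        # Each call only queues the query; nothing is sent yet.
        m.get_series('sid=' + sid, fields=['description'])
    responses = m.fetch()  # one request for all queued queries
    result = {}
    for sid, series in zip(series_ids, responses):
        fields = (series or {}).get('fields') or {}
        result[sid] = fields.get('description')
    return result
```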
Reading multiple series
The fastest way to read many series is to scroll using a single structured query:
for series in sj.scroll(r'sid:my\series\prefix', fields=['sid', 'description']):
    print('series id: {}, description: {}'.format(series['fields']['sid'], series['fields']['description']))
A few notes on the scroll() function parameters:
- fields are optional
- points are not retrieved by default because the max_points parameter defaults to 0
- set max_points to -1 to retrieve all points
- use df/dt to control the date range of points to retrieve (see Date Input Notation)
- use the sort parameter to control sorting
- max_series is handy to limit to the top N series
- scroll_batch_size controls how many series are retrieved in one batch
  - the default of 2500 may be too high when retrieving many points per series
  - if getting a too_many_points error, try a lower scroll batch size of 500 or even less
  - if scrolling over expression series, an even lower batch size of 50 might be useful to avoid timeouts
- the extra_params parameter is useful for sending other parameters to the raw API
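As an illustration of how these parameters combine, the sketch below counts the points of every series matching a query. The helper name count_points and its defaults are hypothetical; only the scroll() parameters described above are used, and a series without a points entry is counted as having zero points.

```python
def count_points(sj, query, df=None, dt=None, batch_size=500, max_series=None):
    """Return {series id: number of points} for every series matching `query`."""
    kwargs = {
        'fields': ['sid'],
        'max_points': -1,  # -1 retrieves all points (the default of 0 retrieves none)
        'scroll_batch_size': batch_size,
    }
    # Only forward the optional parameters that were actually provided.
    if df is not None:
        kwargs['df'] = df
    if dt is not None:
        kwargs['dt'] = dt
    if max_series is not None:
        kwargs['max_series'] = max_series
    counts = {}
    for series in sj.scroll(query, **kwargs):
        sid = series['fields']['sid']
        counts[sid] = len(series.get('points') or [])
    return counts
```

A batch size of 500 is used here rather than the default because all points are requested for each series.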
To generate a Pandas DataFrame from a single series query, use get_df(). Note that this is a wrapper around scroll() and is less configurable. This function has its own parameter, series_axis, which sets the position of the series in the DataFrame: rows (the default) or columns. Besides that, get_df() accepts the same points- and fields-related parameters as get_series() and scroll().
If series_axis=rows, series will be the rows and the different fields will be the columns. If points are requested, a series may span multiple rows.
>>> df = sj.get_df('sid:users\\me', fields=['*'], max_points=2)
>>> print(df)
         series_id    unit description                       date  points
0  users\me\unit-a  unit A      Unit A  2002-10-01 00:00:00+00:00     0.0
1  users\me\unit-a  unit A      Unit A  2002-10-01 01:00:00+00:00    23.1
2  users\me\unit-b  unit B      Unit B  2002-10-01 00:00:00+00:00     NaN
3  users\me\unit-b  unit B      Unit B  2002-10-01 01:00:00+00:00     6.4
...
If series_axis=columns, the resulting DataFrame will have Shooju series values as columns and points as rows. If specific fields are passed, the field values will name the DataFrame columns, joined by the character '/'.
>>> df = sj.get_df('sid:users\\me', fields=['unit', 'description'], series_axis='columns', max_points=-1)
>>> print(df)
            unit A/Unit A  unit B/Unit B  ...  unit Z/Unit Z
2000-04-03          20.50          31.50  ...          34.20
2000-04-04          32.25          20.50  ...          36.00
2000-04-05          31.25          40.50  ...          46.50
...
Getting all the reported dates of a particular series
The get_reported_dates() method returns a List[datetime] with all the dates for which a particular series has a different collection of points:
get_reported_dates(<series_query>, <job_id>, <processor>, <df>, <dt>, <mode>)
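For example, the most recent reported date can be picked from the returned list. This is a sketch under two assumptions: that get_reported_dates() is called on the connection object, and that the arguments after the series query can be omitted. The helper name latest_reported_date is hypothetical.

```python
def latest_reported_date(sj, series_query):
    """Return the most recent reported date of a series, or None if it has none."""
    # Assumption: only the series query is required; the other arguments are optional.
    dates = sj.get_reported_dates(series_query)
    return max(dates) if dates else None
```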
Working with Expression Globals
You can use Expression Globals from Python through the extra_params parameter of get_series():
expression_globals = r"""
query1 = r'sid=my\series\id1'
query2 = r'sid=my\series\id2'
# any logic can be placed here, calling F functions is also possible and should be exactly the same used in Series Editor
G.r = sjs(query1) + sjs(query2) # assign here result of your expression
""".strip()
sj.get_series('=G.r', df='-3M', dt='MAX', max_points=-1, extra_params={'g_expression': expression_globals})
You can also view the expression above in the Series Editor.
More information on Expression Globals is available on the Expressions page.
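When the list of queries is only known at run time, the globals string can be generated. The sketch below builds the same kind of expression as the example above for any number of queries. The helper name sum_expression_globals is hypothetical, and it assumes the server evaluates the globals as Python source, so that repr() produces valid string literals for the queries.

```python
def sum_expression_globals(queries):
    """Build a g_expression string that assigns the sum of several series to G.r."""
    if not queries:
        raise ValueError('at least one query is required')
    lines = []
    terms = []
    for i, query in enumerate(queries, start=1):
        # repr() yields an escaped Python string literal, backslashes included.
        lines.append('query{} = {!r}'.format(i, query))
        terms.append('sjs(query{})'.format(i))
    lines.append('G.r = ' + ' + '.join(terms))
    return '\n'.join(lines)


# Usage with an existing connection `sj`:
# g = sum_expression_globals([r'sid=my\series\id1', r'sid=my\series\id2'])
# sj.get_series('=G.r', df='-3M', dt='MAX', max_points=-1, extra_params={'g_expression': g})
```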
Write to Shooju
Writing to Shooju is done through the shooju.RemoteJob object returned by shooju.Connection.register_job(). You can use it as a context manager, as in the examples below. When not using it as a context manager, make sure to call job.submit() or job.finish() to submit any queued changes.
If you need to make sure that a certain procedure runs every time an API call is sent to Shooju while modifying series, you can use shooju.RemoteJob.add_pre_submit_hook and shooju.RemoteJob.add_post_submit_hook.
Creating series, points or fields
Instantiate a job and write fields and points in a series returned by a query:
from shooju import Point
from datetime import datetime
with sj.register_job('my job description', batch_size=2000) as job:  # Changes will be applied every 2000 series
    # Queries should return either 0 series (to create a new series) or 1 series (to update existing series)
    for query in [r'sid="users\myusername\test\a"', r'sid="users\myusername\test\b"']:
        job.write(
            query, fields={'description': 'my favorite series'}, points=[Point(datetime(2017,1,1), 12.4)]
        )
Note that it is also possible to pass a pandas.Series to the points argument. In that case, the Index of the series must be of type DatetimeIndex (the default when creating it with datetime keys):
from datetime import datetime
import pandas as pd

query = r'sid="users\myusername\test\a"'  # the series to write to
with sj.register_job('my job description') as job:
    job.write(
        query, points=pd.Series({datetime(2017,1,1): 11.1, datetime(2017,1,2): 12.2})
    )
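The two write examples above can be generalized into a small helper that writes many series through one job. This is a sketch only: write_many and the shape of its series_by_id argument (a dict mapping series ids to a dict with 'fields' and 'points' entries) are hypothetical conventions, not part of the shooju package.

```python
def write_many(sj, series_by_id, job_description='bulk write', batch_size=2000):
    """Write fields and points for many series using a single job.

    `series_by_id` maps a series id to {'fields': {...}, 'points': [...]}.
    Returns the number of series queued for writing.
    """
    with sj.register_job(job_description, batch_size=batch_size) as job:
        for sid, data in series_by_id.items():
            # Quote the id as in the queries above; each query must match 0 or 1 series.
            query = 'sid="{}"'.format(sid)
            job.write(query, fields=data['fields'], points=data['points'])
    # Leaving the with block submits any changes that are still queued.
    return len(series_by_id)
```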
To write reported points, use the similar write_reported() function:
write_reported(<series_query>, <reported_date>, <fields>, <points>)
Removing series, points, or fields
To remove one or multiple series, use the delete_series method:
with sj.register_job('Deleting series', batch_size=1000) as job:
    # Deleting one series
    job.delete_series(r'sid="users\myusername\test\a"')
    # Deleting multiple series
    job.delete_series(r'sid:"users\myusername"', one=False)  # one=True is the default
To remove only the points and fields from an existing series, you can use the remove_others parameter of the write() function mentioned above.
A few notes on writing data to Shooju:
- When writing more than one series, set the batch_size parameter to something higher than 1 to speed up writes. A good rule of thumb is 2000.
- Registering jobs should be done as infrequently as possible, usually only once per script. Creating a job carries a performance penalty.
- Existing jobs can also be reused to avoid creating more jobs by creating a RemoteJob(conn, job_id) object with an existing job_id (you can get it from the job object using job.job_id).
Upload and download files
It is possible to upload files to Shooju as a whole or in parts. It is also possible to download any file uploaded to Shooju (for example, by Processors) if you have the file ID.
uploader = sj.create_uploader_session()
# Upload at once
file_id = uploader.upload_file(file_object, filename)
# Upload in multiple parts
file_id = uploader.init_multipart(filename)
uploader.upload_part(file_id, part_num, file_object)
uploader.complete_multipart(file_id)
# Download an uploaded file
file = sj.download_file(file_id)
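The multipart calls above can be combined into a loop that splits a file into chunks. The sketch below makes several assumptions that the text above does not confirm: that parts are numbered from 1, that upload_part() accepts any binary file-like object, and that a part size of 5 MiB is acceptable. The helper name upload_in_parts is hypothetical.

```python
import io


def upload_in_parts(uploader, file_object, filename, part_size=5 * 1024 * 1024):
    """Upload `file_object` (opened in binary mode) in parts and return the file ID."""
    file_id = uploader.init_multipart(filename)
    part_num = 0
    while True:
        chunk = file_object.read(part_size)
        if not chunk:
            break
        part_num += 1  # assumption: parts are numbered starting at 1
        # Wrap the chunk so each part is passed as its own file-like object.
        uploader.upload_part(file_id, part_num, io.BytesIO(chunk))
    uploader.complete_multipart(file_id)
    return file_id


# Usage with an existing connection `sj`:
# with open('data.csv', 'rb') as f:
#     file_id = upload_in_parts(sj.create_uploader_session(), f, 'data.csv')
```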
Error Handling
Getting one series from one or multiple queries:
- get_series() will return None if no series is found. If there is an issue connecting to Shooju or if the query is invalid (including execution errors if the query is an expression or an XPR series), shooju.ConnectionError or shooju.ShoojuApiError is raised.
- The fetch() method of the object returned by mget() returns an iterator of dict series. If no series is found for any of the queued queries, if there is a connection issue, or if any query is invalid (also including expression errors), shooju.ConnectionError or shooju.ShoojuApiError is raised.
Getting multiple series from one query:
- The scroll method returns an iterator of dict series. If no series is found, the iterator is returned empty. If scroll detects a connection issue when it sends the API request to Shooju, or if the query is invalid (also including expression errors), shooju.ConnectionError or shooju.ShoojuApiError is raised.
- The get_df method will return an empty DataFrame if no series is found, or raise an error similarly to scroll.
Deleting series:
- If you call delete_series() with a query that returns multiple series and one=True (the default), shooju.ShoojuApiError is raised.
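Because scroll() returns an iterator, an error may only surface while iterating. The sketch below shows one way to retry a scroll with progressively smaller batch sizes, in line with the scroll_batch_size advice given earlier. The helper scroll_with_fallback and its parameters are hypothetical. The default retry_on=(Exception,) is deliberately broad so that the example has no dependency on the package; in real use, passing retry_on=(shooju.ShoojuApiError,) is probably more appropriate.

```python
def scroll_with_fallback(sj, query, batch_sizes=(2500, 500, 50),
                         retry_on=(Exception,), **scroll_kwargs):
    """Return all series for `query` as a list, retrying with smaller batches on failure."""
    if not batch_sizes:
        raise ValueError('batch_sizes must not be empty')
    last_error = None
    for size in batch_sizes:
        try:
            # Materialize inside the try block: scroll() is lazy, so errors
            # may only be raised while the iterator is being consumed.
            return list(sj.scroll(query, scroll_batch_size=size, **scroll_kwargs))
        except retry_on as exc:
            last_error = exc
    # Every batch size failed; re-raise the most recent error.
    raise last_error
```

Collecting the results into a list gives up the streaming behavior of scroll(), so this trade-off mainly suits result sets that fit in memory.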
Direct connection with the API
It is possible to make direct requests to the endpoints described in the API Reference. However, we suggest always using the previously mentioned functions whenever possible.
sj.raw.get(
    '/series',  # Relevant Shooju API endpoint (after /api/1)
    params={...}  # GET query parameters
)
sj.raw.post(
    '/processors/...',  # Ditto
    data_json={...}  # POST JSON body
)
sj.raw.delete(
    '/series/delete',  # Ditto
    data_json={...}  # DELETE JSON body
)
Tips on Improving Performance
- install shooju-ts
  - this helps when reading/writing high-volume point series
- use a smaller (or larger) scroll_batch_size in sj.scroll()
  - a smaller batch size (<100) helps when the series have more points or are heavier expressions
  - a larger batch size (>3000) helps when reading many series that are light (few points and no expressions)
- serialize points to what you actually need; if you want minimal serialization try:
def xx(pts, *args, **kwargs):
    return pts

for s in sj.scroll(query, fields=['sid'], df=df, dt=dt, max_points=-1, serializer=xx):
    # processing assumes points is a [(milliseconds from epoch, float), ...]
    pass
Python Cookbook / Advanced Examples
Getting facets
- The only correct way to get unique field values is through a raw method. Scroll should not be used.
>>> query = r'sid:my\series\test'
>>> facets = 'field_a,field_b'
>>> sj.raw.get('/series', params={'query': query, 'per_page': 0, 'facets': facets})
{'series': [],
'success': True,
'request_id': 'xxx',
'facets': {'field_a': {'terms': [{'term': 'term_a1', 'count': 12},
{'term': 'term_a2', 'count': 5}],
'other': 0,
'total': 17,
'missing': 0},
'field_b': {'terms': [{'term': 'term_b1', 'count': 3},
{'term': 'term_b2', 'count': 2}],
'other': 0,
'total': 5,
'missing': 0}},
'validate_query': {'type': 'structured'},
'total': 10}
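The facet counts can be pulled out of a response shaped like the one above with a few lines of plain Python. The helper name facet_terms is hypothetical; it relies only on the 'facets', 'terms', 'term' and 'count' keys visible in the sample response.

```python
def facet_terms(response, field):
    """Return {term: count} for one faceted field of a raw /series response."""
    facet = response.get('facets', {}).get(field, {})
    return {entry['term']: entry['count'] for entry in facet.get('terms', [])}
```

For the sample response above, facet_terms(response, 'field_a') gives {'term_a1': 12, 'term_a2': 5}, and a field that was not requested as a facet gives an empty dict.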
Additional Resources
Refer to the docstring of each function for detailed parameter explanations.
For more thorough examples look at the tests in the source.