- Quick Guide
- Background
- Using Expression Globals (G)
- Using Processor Functions (Expressionables)
- NaN / None / inf values
- Expressions returning a single value
- Operators order
- Using @asof Operator with XPR Series
- Expressions Cookbook / Advanced Examples
- Finding the last value in a period or other custom groupings
- Applying per-point custom logic on a Series
- Filling in hourly/daily values
- Operating on the date index in an expression
- Filter for specific dates
- Namespace Reference
Expressions are a type of query that executes additional custom logic on Shooju points before returning the data.
Instead of the data flow being Shooju Query Result→ You, the flow becomes Shooju Query Result(s) → Expression Processing → You. Just like a normal query, an Expression-based query may return one or many series.
You can save an expression as an XPR Series and always retrieve updated results without repeating the expression itself. Note that it is not possible to write points to the XPR Series, only fields. XPR Series have it's own set of fields, separate from the underlying series.
Quick Guide
- Begin a query with = to use an expression.
- Wrap queries in {{query@optional-operators}} within an expression.
The simplest expression:
={{sid=my\weather\series}}
...is equivalent to the query:
sid=my\weather\series
Now try making some changes to manipulate the points in the series returned by the query:
={{sid=my\weather\series@A:q}}
={{sid=my\weather\series}} * 2
={{sid=my\weather\series}} / {{sid=my\weather\series@shift:-1y}} - 1
={{sid=my\weather\series}} * {{sid=some\other\series}}
Background
Expressions manipulate points as they come out of Shooju. A few examples of what you can do with expressions:
- multiply points by 2
- multiply two series by each other
- period over period calculations
- any custom algorithm that operates on Shooju data
The benefits of using expressions over an analytical tool like Excel / MATLAB / etc:
- the same expression works in any analytical tool
- the calculations is performed by Shooju, lightening the load and decreasing the complexity of Excel sheets, scripts, etc
- analytical tools are often not available in charting applications like Shooju Views, Tableau, Power BI, etc
Shooju expressions are built on Python for scripting and Pandas for mathematical operations:
- refer to the Pandas Primer for general background
- everything after = is executed as a python script
- {{query@optional-operators}} returns a datetime-indexed Pandas Series or DataFrame. For finer control, use one of these two functions:
=sjs(r'query') #returns Pandas Series or raises an error if more than one series matches the query
=sjdf(r'query') #returns Pandas DataFrame indexed on the series id
Using Expression Globals (G)
More complex expressions run into a few issues:
- duplicate read calls to Shooju via {{query}} or sjs() or sjdf()
- hard-to-read because of lack of intermediate variables
- impossible to define functions
Expression Globals are blocks of Python code that are run to initialize variables and functions:
- Expression Globals are sent to Shooju together with the expressions
- the expressions and the Expression Globals use
G
to communicate - we have not made support for Expression Globals obvious in all tools yet - ask us if you have a question about using Expression Globals in your tool
An Expression Global block:
G.a = sjs(r'query') # note that {{query}} notation cannot be used in expressions
G.b = G.a * 4
def something_complex(series):
return series * 4
G.sc = something_complex
And a few associated expressions:
=G.a
=G.b * 2
=G.sc(G.a)
Using Processor Functions (Expressionables)
Shooju Processors can house functions that can be invoked in an expression:
- This is commonly used to implement complex or frequently-used algorithms.
- The Shooju team or your team can do this via processors, and the resulting functions are specific to your company.
- Get in touch with us or your team for guidance on which functions are available to you.
- To refer to a processor function in an expression, use upper-case letters for the processor, followed by dot-function as below:
=PROCESSORNAME.mult_by_two({{example\query}})
Inside the processor processorname
, the mult_by_two
function could look like this:
@sjutils.expressionable()
def mult_by_two(series_points): # The brackets return the series' points as a Pandas Series
return series * 2 # The function should always return a Pandas Series or DataFrame
# or a scalar (see "Expressions returning a single value")
As the mult_by_two
function receives only series points, it won't return fields or facets. In order to do that, you need helper functions (see "Expressions specific functions" in SJUtils) and full Shooju series:
@sjutils.expressionable()
def mult_by_two(series_points):
query = 'sid="{}"'.format(series_points.name) # Series returned by brackets have their SID in the name attribute
series = sjclient.get_series(query, fields=['*']) # sjclient is a shooju.Connection object available in every processor
assert not series_points.empty # Complex functions usually fail when input series are empty; better to check first
# Helper functions for the expression engine
sjutils.xpr_fields(series_points.name, series['fields']) # The expression will be able to return these fields
sjutils.xpr_add_query_for_facets(query) # The expression will be able to generate facets
return series_points * 2
This way, querying for an expression that uses mult_by_two
will return fields and facets. Example to retrieve fields using the Python client:
>>> s = sj.get_series(r'=PROCESSORNAME.mult_by_two("sid=example\query")', fields=['*'])
>>> s
{'series_id': "=PROCESSORNAME.mult_by_two("sid=example\\query")",
'points': [Point(2002-09-01 00:00:00, 44.35303571428572),
Point(2002-10-01 00:00:00, 38.45739583333333)],
'fields': {'city': 'Arlington'},
'xpr': {'conflicting_fields': [],
'series_ids': ['example\\query'],
'series_queries': ['sid=example\\query'],
'xpr_query': '=PROCESSORNAME.mult_by_two("sid=example\\query")'}}
>>> s['fields']
{'city': 'Arlington'}
NaN / None / inf values
- If pandas operations result in
NaN
orNone
values, these will not be returned by the API. The pandas function fillna() can be used to fill values with the specified value. - inf and inf values will be the result of division by zero and need to be explicitly handled. The below example fills these values with 0
=( {{ series1 }} / {{ series2 }} ).replace(pd.np.inf, 0).replace( [pd.np.inf, -pd.np.inf], 0 )
Expressions returning a single value
Some Pandas methods reduce a series to a single value:
={{ sid=my\weather\series }}.mean()
In this case, it is necessary to provide a df
(date from) value other than "MIN" to Shooju. Shooju will return a series with two points, both with the same value that was returned by the provided function:
- One at the given
df
; - One at the current hour in UTC, if no
dt
(date to) other than "MAX" was provided, or exactly at the provideddt
.
Notice that the given df
and dt
will still be used to filter the points in the underlying series.
Example using the Python client:
>>> sj.get_series(r'={{ sid=my\weather\series }}.mean()', df='1970-01-01', max_points=-1)['points']
[Point(1970-01-01 00:00:00, 0.5012029458447389),
Point(2021-02-05 16:48:09.244000, 0.5012029458447389)]
>>> sj.get_series(r'={{ sid=my\weather\series }}.mean()', df='1970-01-01', dt='2100-01-01', max_points=-1)['points']
[Point(1970-01-01 00:00:00, 0.5012029458447389),
Point(2100-01-01 00:00:00, 0.5012029458447389)]
Operators order
Operators inside the brackets affect the series that are retrieved:
={{ query@A:he }} # The query result is averaged at hour ends.
={{ query@A:he }} * 10 # The query result is averaged at hour ends, and that is multiplied by 10.
You can also write operators at the end of the expression. If you want the operator to be applied to the queries inside brackets, you have to use a @pre
prefix:
={{ query@A:he }}@pre@lag:1h # Same thing as if all the operators were inside the brackets.
Currently, an operator at the end of the expression with no @pre or @post prefix works as if a @pre prefix were used. This will be deprecated in the future and it is the opposite when using XPR Series, so we recommend that the prefixes are always explicitly used.
# These two are interpreted as the same expression:
={{ query@A:he }}@pre@lag:1h
={{ query@A:he }}@lag:1h
If you want an operator to be applied to the result of the expression, write it at the end and with a @post
prefix:
={{ query@A:he }}@post@lag:1h # Whatever is returned by the query is averaged at hour ends,
# and that is lagged by 1 hour.
={{ query }}@pre@A:he@post@lag:1h # Same as above.
This is relevant when building complex expressions using queries wrapped by brackets:
=PROCESSOR.complex_function( {{ query@S:M }} )@post@lag:1h # @S will be applied to the query result;
# @lag will be applied to the function output.
When brackets are not used, you can still use @post
operators. However, applying the operators to the query passed to the function depends on the implementation; contact Shooju support if you are unsure if that is supported by a particular function:
# In the expression below, @lag will be applied to the result;
# @S:M will be applied to the query only if "another_function" supports it.
=PROCESSOR.another_function( 'query_without_brackets' )@pre@S:M@post@lag:1h
@post
operators should be last thing on an expression. When you need to apply @post
operators to a part of the expression within the expression and wish to continue with the rest of the expression, the best practice ise to use sjs
for the part before post operators:
#below will not work as @post:S:he does not come last:
(({{tsdbid=PI_WH2_NET_MW_1M@localize@WA:5me}} * {{tsdbid=PI_5M_ERCOT_WOLF_HOLLOW_II_BLOCK_1_LMP@localize@WA:5me}})@post:S:he) / 2
#below will work:
=sjs(r'=({{tsdbid=PI_WH2_NET_MW_1M@localize@WA:5me}} * {{tsdbid=PI_5M_ERCOT_WOLF_HOLLOW_II_BLOCK_1_LMP@localize@WA:5me}})@post@S:he') / 2
Using @asof Operator with XPR Series
- To retrieve historical values of the fields of an XPR Series, set
max_points
to 0 to avoid retrieving points, and use@post@asof:-1d
- To retrieve historical values of the points of the underlying series, use
@pre@asof:-1d
- note that in this case if fields are requested, they will be of the current XPR Series
Expressions Cookbook / Advanced Examples
Finding the last value in a period or other custom groupings
- Shooju lacks an operator to find the last period in a given aggregation level
- Pandas has a very powerful built in .groupby() function that can be used to do ad-hoc time-period based groupings
- To find the last value in a period, try
={{series_query}}.groupby(pd.Grouper(freq="M")).last()
- this groups the shooju result by the Monthly pandas grouper, and applies the
last()
function to get the last value
Applying per-point custom logic on a Series
- G functions can integrate any custom python code
- To apply per-point custom logic, use the
for index, value in series.items()
pattern like so: - The above code replaces non-integers with null values. It the Series Editor this would look like this:
n = sjs(r'sid=xxx')
new_values = []
for index, value in n.items():
if not value.is_integer():
value = None
new_values.append(value)
G.r = pd.Series(new_values, n.index)
Filling in hourly/daily values
- To fill date gaps with a
scalar
of choice using Pandas' resample() in conjunction with asfreq(): - It is also possible to fill values with different rules. See all supported rules here.
- If you want to fill gaps with a field value, value should be retrieved using sjclient as described in Reference section below:
={{ sid=xxx }}.resample('H').asfreq(fill_value=<scalar>)
See example above in the Series Editor.
sid = r'sid=xxx'
s = sjs(sid)
field_value = sjclient.get_series(sid, fields=['<field_name>'], max_points=0).get('fields', {}).get('<field_name>')
G.r = s.resample('H').asfreq(fill_value=field_value) if field_value else s # return original series if field_value does not exist
Or try it out in the Series Editor.
Operating on the date index in an expression
- When pandas operations inside of expressions change the index (date axis), changing shooju df/dt can affect the output.
- The reason is that changing df/dt changes the input into your expression.
- When the expression has no effect on the index (most cases), this isn’t an issue.
- For a solution, try using
sjs(r'sid=xxx@df:MIN@dt:MAX')
as an input into your expression to make sure all points are used, and are not affected by df/dt.
Filter for specific dates
- Another alternative to filter operator, pandas offers a more flexible function to do so:
- For more detailed examples and parameters please refer to pandas documentation here.
={{ sid=xxx }}.between_time(start_time='hh:mm', end_time='hh:mm')
See example above in the Series Editor.
Namespace Reference
Variable | Notes |
sjs('series-query@optional-operators',
df,
dt,
max_points
) | Returns points only, for a single Shooju series, as a Pandas Series with datetime as index. The first parameter, query , is required. The rest are optional and override the expression-level parameters.
@pre operators are automatically appended to the query. |
sjdf('query@optional-operators',
max_series,
key_field,
df,
dt,
max_points,
sort
err_if_total_higher,
) | Returns points only, for 0+ (best used when 2+) Shooju series, as a Pandas DataFrame with datetime as index. The first parameter, query , is required; the rest are optional. max_series defaults to 100 (overwritten by per_page ). key_field defaults to sid , err_if_total_higher defaults to True and if set to False it does not raise an exception if there are more results than max_series (this is dangerous and should be used with care). The rest of the parameters override the expression-level parameters.
@pre operators are automatically appended to the query. |
sjclient | Reference to a Shooju Connection object. See Python. Useful to get fields, scroll through series, etc. Use with caution as this is slower and less integrated than sjs() or sjdf() . |
dt(year, month, day, hour, ...) | Returns Python datetime. Same as datetime.datetime() method. |
sjop(Series/Dataframe, '<@operator>') | Applies Shooju-style operators to Pandas Series/DataFrame. Use this only when applying operators directly to queries is not possible. |
pd | Pandas module. |
G | Variable mostly used for communication between the expression and Expression Globals. Not used in SJPlot. |
PROCESSORNAME | Shooju Processor reference. Must be permissioned to use it. |