Processors

A processor is an ETL-like concept highly optimized for getting data into Shooju. Processors extract data from external sources or from existing Shooju series, transform the data into the Shooju Series concept, and load it into Shooju. Each processor has documentation, settings, and code. The documentation explains how the processor works and lays out key assumptions. The settings are JSON-formatted objects that store values such as URLs, passwords, and tokens. The code is written in Python and can be edited and run via Shooju Web. Most processors use Launchers to run at the appropriate time, while others are run manually or through Uploaders.

Keep in mind that processors write series through Jobs.

Documentation

The documentation is often written by the business user or analyst to describe the business requirements of the processor (e.g. where to pull data from, how often it is updated at the source, what timezone it is in). The processor developer often edits the documentation to add key implementation notes (e.g. parsing issues, potential dangers). The documentation is most often viewed and edited online using Shooju Web.

Settings

Settings keep the processor parameters that may change apart from the core code, which shouldn’t change often. The settings are written by the business analyst or the processor developer. Examples include:

URLs
FTP credentials
Lookups (e.g. country code to country name)
Additional field data to apply to all series (e.g. unit)
RegEx for parsing data
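To illustrate how settings keep changeable parameters out of the code, here is a minimal sketch. It assumes a hypothetical settings object: the key names (source_url, country_lookup, common_fields, row_pattern), the URL, and the parse_row helper are all invented for this example and are not part of Shooju.

```python
import json
import re

# Hypothetical settings object; every key and value here is illustrative.
SETTINGS_JSON = """
{
  "source_url": "https://example.com/data.csv",
  "country_lookup": {"US": "United States", "DE": "Germany"},
  "common_fields": {"unit": "USD"},
  "row_pattern": "^(?P<country>[A-Z]{2}),(?P<value>-?[0-9.]+)$"
}
"""


def parse_row(line, settings):
    """Turn one raw line into (fields, value) using only values from settings."""
    match = re.match(settings["row_pattern"], line)
    if match is None:
        return None  # the line does not look like a data row; skip it
    fields = dict(settings["common_fields"])  # fields applied to all series
    fields["country"] = settings["country_lookup"][match.group("country")]
    return fields, float(match.group("value"))


settings = json.loads(SETTINGS_JSON)
result = parse_row("DE,12.5", settings)
```

With this layout, a change in the source format or a new country code only requires editing the settings, not the code.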

Roles

Roles are permissions set at the processor level. Users who are members of a team added to one of the categories below have the corresponding permissions:

Launchers: Able to start/kill/revoke jobs.
Callers: Able to call /processors/<processor_id>/call/<func_name> via the API.
Expression Executors: Able to use processor functions in an expression context.
Admins: Able to make changes to processor settings, create or edit launchers, documentation, etc.
Code Editors: Able to inspect and edit processor code.
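As a small illustration of the endpoint available to Callers, the sketch below only builds the request path from a processor ID and a function name. The example values my_processor and get_rate are hypothetical, and the host, HTTP method, and authentication are not covered here, so they are left out rather than guessed.

```python
from urllib.parse import quote


def call_path(processor_id, func_name):
    """Build the path of the call endpoint for one processor function."""
    # Percent-encode both parts so that characters such as "/" cannot
    # change the structure of the path.
    return "/processors/{}/call/{}".format(
        quote(processor_id, safe=""), quote(func_name, safe="")
    )


# Hypothetical processor ID and function name.
path = call_path("my_processor", "get_rate")
```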

Code

The code is the core part of the processor that actually does the work. Processors are written in Python and are generally 10 to 500 lines long, with the average around 150. The processor developer writes the code.

Launchers

Processors can be run manually, through Uploaders, or, most frequently, through launchers. Each processor can have multiple launchers. Each launcher specifies how often it should run, using CRON syntax, and how it should run: as a job or as a trigger. As the name suggests, a job launcher immediately starts a job and tries to import data. A trigger launcher checks whether a job should be started and, based on the logic in the code, either starts the job or waits until the next time the trigger runs.
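The difference between the two launcher modes can be sketched as follows. This is a toy model, not Shooju's scheduler: the launcher dictionaries, their key names, and the functions should_start_job and on_launch are assumptions made for illustration. The trigger logic shown (start only when the source changed since the last import) is one plausible rule; real trigger logic lives in the processor code.

```python
from datetime import datetime, timezone

# Hypothetical launcher definitions; the key names are illustrative only.
launchers = [
    {"cron": "0 18 * * 1-5", "mode": "job"},      # weekdays at 18:00
    {"cron": "*/15 * * * *", "mode": "trigger"},  # every 15 minutes
]


def should_start_job(source_updated_at, last_import_at):
    """Example trigger rule: start only if the source changed since the last import."""
    return last_import_at is None or source_updated_at > last_import_at


def on_launch(launcher, source_updated_at, last_import_at):
    """Return the action taken when a launcher fires."""
    if launcher["mode"] == "job":
        return "start job"  # job launchers always start a job
    if should_start_job(source_updated_at, last_import_at):
        return "start job"
    return "wait for next trigger run"


monday = datetime(2024, 1, 1, tzinfo=timezone.utc)
tuesday = datetime(2024, 1, 2, tzinfo=timezone.utc)
# Source last changed Monday, last import ran Tuesday: the trigger waits.
action = on_launch(launchers[1], monday, tuesday)
```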

Jobs

All writes to Series in Shooju must be done through a job. A job is identified by a sequential numerical identifier referred to as the job ID. The higher the job ID, the more recent the job. Jobs serve several purposes in Shooju:

Named Separation of Writes

Most data in Shooju comes in batches (an update from the IMF, NYSE closing prices, etc). Jobs help logically separate writes into Shooju among these batches, identify the batch with the job ID, and name it something like “NYSE Close Prices”.

Storing Job Metadata

Jobs contain metadata that helps explain what the job wrote and how: who started the job, when the job started and ended, how many series were written by the job, how many series/points/fields were changed, added, or removed as part of the job, etc.

Preventing Write Conflicts

A series’ points or fields can only be changed by a job whose job ID is the same as or higher than that of the job that last changed them. This ensures that newer jobs always get preferential treatment in a conflict. Consider a case where two jobs that started seconds apart try to import the same 10 series, but the second one has a slightly more up-to-date version of the data. Shooju guarantees that by the time both jobs are done, all 10 series will have the values set by the second job (the one with the higher job ID), even if the first job wrote the series last (because of slower processing, for example).
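The rule can be illustrated with a toy in-memory model. This is a sketch of the behavior described above, not Shooju's implementation: the SeriesStore class, its methods, and the job IDs 101 and 102 are invented for the example.

```python
class SeriesStore:
    """Toy in-memory model of the job ID rule; not Shooju's implementation."""

    def __init__(self):
        self._values = {}   # series ID -> current value
        self._job_ids = {}  # series ID -> ID of the job that last changed it

    def write(self, job_id, series_id, value):
        """Apply the write unless a job with a higher ID already wrote the series."""
        if job_id < self._job_ids.get(series_id, 0):
            return False  # stale job: the newer data stays in place
        self._values[series_id] = value
        self._job_ids[series_id] = job_id
        return True

    def get(self, series_id):
        return self._values[series_id]


store = SeriesStore()
series_ids = ["series_{}".format(i) for i in range(10)]

for sid in series_ids[:5]:
    store.write(101, sid, "old")  # the first job writes half of the series
for sid in series_ids:
    store.write(102, sid, "new")  # the second job writes all ten
# The first job is slower and writes every series last.
late_results = [store.write(101, sid, "old") for sid in series_ids]
final_values = [store.get(sid) for sid in series_ids]
```

In this model every late write from job 101 is rejected, so all ten series end up with the values written by job 102.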

Version Control

Shooju stores snapshots at the level of the job instead of the level of the write to facilitate comparisons across jobs. All changes are stored unless this feature is expressly turned off when registering a new job.

Uploaders

An uploader is a way for a user to launch processors that operate on one or more files. An administrator sets up an Uploader Preset by:

associating it with a processor,
giving permissions to users or teams to use it,
overriding any processor settings for runs launched through this uploader,
adding descriptive language for users of the uploader.
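The settings step of a preset can be pictured as a merge of the preset's values over the processor's own settings. The sketch below assumes a shallow, key-by-key merge; how Shooju actually combines the two (for example, for nested objects) is not stated here, and the function name effective_settings and the sample keys are hypothetical.

```python
def effective_settings(processor_settings, preset_overrides):
    """Return the settings a run would see, assuming a shallow key-by-key merge."""
    merged = dict(processor_settings)  # copy so the processor defaults stay untouched
    merged.update(preset_overrides)    # preset values win where keys collide
    return merged


# Hypothetical processor settings and preset overrides.
processor_settings = {"source_url": "https://example.com/default.csv", "unit": "USD"}
preset_overrides = {"unit": "EUR"}

run_settings = effective_settings(processor_settings, preset_overrides)
```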

A user then runs the uploader through Shooju Web by:

opening the uploader tool,
choosing the Uploader Preset,
uploading the file(s) they want to use,
optionally doing a test run,
giving the upload job a descriptive name.

Keep in mind that uploaders launch processors that write series through Jobs. All data must be written through a job, and uploaders are not an exception.

Summary of configurable fields

field	description
description	Processor description
async_mode	If true, write requests that occur during the HTTP API call are executed asynchronously
batch_size_num	Maximum number of updates/inserts to run at a time
code_saved_by	Last person who saved the code
code_saved_date	Date the code was last saved
disable_sara_auto_resolve	If true, SARA auto-resolve is disabled during the created IOPS
id	Processor reference key
last_job_date	Last run date
last_job_num	Last run number
last_trigger_date	Last trigger run
loads_proc_obj	Linked processors that are needed in execution
long_running_job_threshold_num	Maximum execution time before starting a long-running IOPS (LRT)
ltnj_threshold_num	Maximum non-execution time before starting an IOPS due to inactivity (LTNJ)
next_schedule_date	Next run date
next_schedule_description	Description of the next run
next_schedule_type	Type of the next run: job or trigger
no_history	If true, previous versions of a sid are not saved. If false, each sid version is stored
run_code_from_linked_account	If the code has its source in a linked account, the link to the source processor
save_description	Last comment saved in version control
schedules_num	Number of launchers associated with the processor (jobs or triggers)
series_prefix	Prefix of the sid space in which this processor can create, alter, and delete series
status	Processor status: Production | Build | Cancel | validation
tags	Tags (concepts or keywords) associated with the processor. May be useful for sorting and searching
updated_date	Last update date
urgency_score_num	Processor urgency when it presents IOPS, on a scale of 0 to 5