- Quick Overview
- Structured Queries
- Field-based structured queries
- Querying by sid (Series ID) and _tree fields
- Querying _obj fields
- Querying _geo fields
- Unstructured Queries
- Non-field-based queries
- Field-based unstructured queries
- SjQL Reference
Quick Overview
Queries are used to find series. Queries are written in Shooju Query Language, or SjQL for short. You always need a query to retrieve a series.
The Series Explorer accepts queries in two different ways:
- Query Builder, where you can create your query interactively with clickable elements;
- SjQL interpreter, where you can write queries directly.
There are two types of queries, with different use-cases:
Structured Queries
Structured queries take the form <fieldname><comparison><fieldvalue>. The <comparison> can be = for any type of field, or <, >, … if the field is numerical or a date.
Structured queries generally state that some field is related to some value (e.g. field country is equal to value Japan, or field completion_date is greater than 2020-01-01). For example, if you are trying to find series where the country field has exactly the value Japan, this is the type of query to use.
Field-based structured queries
Field-based queries are specific about where the value should be found. For example, the following query
country=Japan aspect=Weather
finds series that have Japan as the country and Weather as the aspect.
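The following is a minimal sketch of how exact-match terms could be assembled programmatically. It is not part of any Shooju client; `quote_value` and `build_structured_query` are hypothetical helper names. It assumes the quoting rule from the SjQL Reference (values containing spaces or any of `~><*(),:=@` must be quoted) and that adjacent terms are combined with the default AND. Values that themselves contain double quotes are not handled.

```python
# Characters that, per the SjQL Reference table, force a value to be quoted.
SPECIAL_CHARS = set('~><*(),:=@')


def quote_value(value: str) -> str:
    """Wrap the value in double quotes if it has whitespace or reserved characters."""
    if any(ch.isspace() or ch in SPECIAL_CHARS for ch in value):
        return f'"{value}"'
    return value


def build_structured_query(fields: dict) -> str:
    """Build <fieldname>=<fieldvalue> terms, separated by spaces (implicit AND)."""
    return " ".join(
        f"{name}={quote_value(str(value))}" for name, value in fields.items()
    )


query = build_structured_query({"country": "South Africa", "aspect": "Weather"})
print(query)  # country="South Africa" aspect=Weather
```

The resulting string can be pasted into the SjQL interpreter as-is.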
Querying by sid (Series ID) and _tree fields
The Series ID (sid) is a tree-like structure separated by \. Find the exact Series ID using = just like any other field. For example, the following query
sid=Weather\Japan\Tokyo
finds the series identified with the sid Weather\Japan\Tokyo (it will only find one series, because the sid is unique among series).
On the other hand, the following query
sid:Weather\Japan
finds all series that have a sid that starts with Weather\Japan, including Weather\Japan\Kyoto as well as all other cities in Japan.
Note that fields ending in _tree work the same way, so use category_tree:Power\Generation to find series that have category_tree values of Power\Generation, Power\Generation\USA, etc.
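To make the difference between `sid=` and `sid:` concrete, here is a small local emulation over a hypothetical list of series IDs. It only illustrates the semantics described above (exact match versus "starts with"); it does not call Shooju, and it assumes plain, case-sensitive string comparison, which may differ from the real engine.

```python
# Hypothetical series IDs, written as raw strings so the backslashes are kept.
sids = [
    r"Weather\Japan\Tokyo",
    r"Weather\Japan\Kyoto",
    r"Weather\Korea\Seoul",
]


def match_exact(sid_list, value):
    """Emulates sid=<value>: at most one series, since sids are unique."""
    return [s for s in sid_list if s == value]


def match_prefix(sid_list, prefix):
    """Emulates sid:<prefix> (and *_tree:<prefix>): every sid starting with prefix."""
    return [s for s in sid_list if s.startswith(prefix)]


exact = match_exact(sids, r"Weather\Japan\Tokyo")
under_japan = match_prefix(sids, r"Weather\Japan")
print(exact)
print(under_japan)
```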
Querying _obj fields
Fields ending in _obj cannot be queried directly because they are a collection of other fields, but their sub-fields can be queried directly using dot notation. For example, the following query
source_obj.unit=MW
finds series that have MW in the source_obj.unit field (unit being a field under the source_obj field).
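One way to picture dot notation is to flatten a nested field collection into dotted names. The sketch below does this for a hypothetical series whose `source_obj` holds a `unit` and an invented `provider` sub-field; `flatten_fields` is an illustrative helper, not a Shooju function, and the field layout is an assumption.

```python
def flatten_fields(fields: dict, prefix: str = "") -> dict:
    """Turn nested dicts into {"parent.child": value} pairs."""
    flat = {}
    for name, value in fields.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Recurse into the collection and keep extending the dotted path.
            flat.update(flatten_fields(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical series fields; "provider" is made up for illustration.
series_fields = {
    "country": "Japan",
    "source_obj": {"unit": "MW", "provider": "example"},
}

flat = flatten_fields(series_fields)
term = f"source_obj.unit={flat['source_obj.unit']}"
print(term)  # source_obj.unit=MW
```

Each key of `flat` is a name that can appear on the left-hand side of a query term, while `source_obj` on its own cannot.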
Querying _geo fields
Fields ending in _geo have GeoJSON values. This enables finding series by geographical location. This is a bit technical and used more by developers. Use the <geo field>:near:<latitude>,<longitude>,<distance> pattern to find series that have GeoJSON coordinates within <distance> of <latitude>,<longitude>, as you see in the following query:
location_geo:near:42,6.7,30km
Use the <geo field>:in:<latitude>,<longitude>,<latitude>,<longitude>,... pattern to find series that have GeoJSON coordinates within the defined bounded area. If 3 or more <latitude>,<longitude> coordinates are used, the area will be bounded by these coordinates; if only 2 coordinates are passed, the area is defined as a box with the first coordinate as its north-west corner and the second as its south-east corner. For example, see the following query with 2 points:
location_geo:in:42,6.7,31,7.6
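The two geo patterns above can be sketched as small string builders, together with a local check of what the two-point box means. All three function names are hypothetical, the box test treats the boundary as inclusive (an assumption), and it ignores boxes that cross the antimeridian.

```python
def near_query(field, lat, lon, distance):
    """Build <geo field>:near:<latitude>,<longitude>,<distance>."""
    return f"{field}:near:{lat},{lon},{distance}"


def in_query(field, points):
    """Build <geo field>:in:<lat>,<lon>,<lat>,<lon>,... from (lat, lon) pairs."""
    if len(points) < 2:
        raise ValueError("at least 2 coordinates are required")
    coords = ",".join(f"{lat},{lon}" for lat, lon in points)
    return f"{field}:in:{coords}"


def in_box(point, north_west, south_east):
    """Local illustration of the 2-point case: is point inside the NW/SE box?"""
    lat, lon = point
    return (south_east[0] <= lat <= north_west[0]
            and north_west[1] <= lon <= south_east[1])


print(near_query("location_geo", 42, 6.7, "30km"))           # location_geo:near:42,6.7,30km
print(in_query("location_geo", [(42, 6.7), (31, 7.6)]))      # location_geo:in:42,6.7,31,7.6
print(in_box((35, 7.0), (42, 6.7), (31, 7.6)))               # True
```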
Unstructured Queries
If any part of the query is unstructured, the entire query will be unstructured.
Unstructured queries will search default fields only, not ALL the fields.
Non-field-based queries
Non-field-based queries don't specify the field at all. Therefore, they are the most inexact queries. For example, the following query
United
finds series that have United in any field, in the sid (Series ID), or in any field name. So series that have to do with the country United States or the company United Airlines, or that just happen to have a sid (Series ID) such as my\united\data\base\01, would be found.
Note that using quotes or \ in the query does not make the query structured. If there's no field, it's unstructured. For example, the following query
"my\united\data"
is unstructured because there's no field specified. To make it structured, use a field. The following examples would be structured:
my_tree:"my\united\data"
sid:my\united\data
my_field=my\united\data
Field-based unstructured queries
Field-based unstructured queries are specific about which field the value should be found in, but not exact. For example, the following query
country:United aspect:Weather
finds series that have United in the country field (both United Kingdom and United States are found) and Weather anywhere in the aspect field. Wherever exactness is possible, use structured field-based queries.
In the Query Builder in Series Explorer, you can use an unstructured non-field-based query (just like Google) in the top right. For example, the following query
Japan OR Weather
finds series that contain the words Japan or Weather in any part of the series id, in any field name, or in any field value. This is useful only for general Google-like discovery and should be avoided in favor of structured queries where speed, exactness, and consistency of the result matter.
SjQL Reference
Syntax | Example | Notes |
set:<field_name> | set:country | Finds series where the field is set (i.e. has a value). Useful in finding problematic series (e.g. NOT set:country would find series where the country is missing, which might be indicative of an issue). |
AND | Japan AND Weather | Requires both terms. This is the default operation when two terms are next to each other, so it is always optional. |
OR | Japan OR Weather | Checks for the presence of either term. |
NOT | Japan AND NOT Weather | Negates the term immediately following. |
( ) | ( Japan OR Weather ) AND NOT Celsius | Enforces a specific order of evaluation in expressions (standard logical use of parentheses). |
" " | "Japan Weather" OR "My Weather" | Finds text phrases. Note that this is required if the value to find includes spaces or any of the following characters: ~><*(),:=@ |
* | (Ja* OR "South A"*) Weather | Finds values that start with Ja or "South A", so it would find weather in Japan, Jamaica, South America, and South Africa. Note that for quoted terms, the * immediately follows the quotes. |
= | country=Japan OR country="South Africa" | Finds series that have exactly Japan or "South Africa" as the country. Note that we had to use the "" around South Africa because it has a space. This does not work on _obj and _geo suffixed fields. For _obj fields, use the sub-field (see above). |
=( ) | country=(Japan,"South Africa") | Finds exactly the same as above, more conveniently. Values within the parentheses are separated by commas and cannot have spaces between them. |
: | country:South | Finds South anywhere in the country field. "South Africa" and "South Korea" would be found. Using this makes the query unstructured. |
sid: or *_tree: | sid:jodi\primary or my_tree:my\category | Finds series where the series id (sid) begins with jodi\primary or where values of the my_tree field begin with my\category |
< > or <= >= | forecast_date > 2015-01-01 OR some_num <= 10 | Finds based on a value range. Most frequently used with _num and _date fields. |
( )^ | set:country OR ( set:city )^ 10 | Finds series where either the country or city are set, but boosts series where city is set by 10. The series where city is set will show up first. |
<optional_prefix>*:(<field value>) | *:(gas) or p*:(gas) | Wildcard field searching. Searches ALL fields (with <optional_prefix> ) for a particular value. Returns any series where value is present in any field, not just default. Note: Must be the last part of a query. |
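As an illustration of the `=` and `=( )` rows above, the sketch below builds both spellings of the same multi-value match so they can be compared side by side. `quote_value`, `any_of_query`, and `or_query` are hypothetical helpers; the quoting rule is taken from the `" "` row, and writing the list form with no spaces around the parentheses is an assumption.

```python
# Characters that require quoting, as listed in the " " row of the table.
SPECIAL_CHARS = set('~><*(),:=@')


def quote_value(value: str) -> str:
    """Quote values that contain whitespace or reserved characters."""
    if any(ch.isspace() or ch in SPECIAL_CHARS for ch in value):
        return f'"{value}"'
    return value


def any_of_query(field, values):
    """Compact form: field=(v1,v2,...), with no spaces between the values."""
    return f"{field}=(" + ",".join(quote_value(v) for v in values) + ")"


def or_query(field, values):
    """Expanded form: field=v1 OR field=v2 ..."""
    return " OR ".join(f"{field}={quote_value(v)}" for v in values)


countries = ["Japan", "South Africa"]
print(any_of_query("country", countries))  # country=(Japan,"South Africa")
print(or_query("country", countries))      # country=Japan OR country="South Africa"
```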