Query

Mosaico distinguishes itself from simple file stores with a powerful Query System capable of filtering data based on both high-level metadata and content values. The query engine operates through the query action, accepting structured JSON-based filter expressions that can span the entire data hierarchy.

Architecture

The query engine is designed around a three-tier filtering model that allows you to construct complex, multi-dimensional searches:

Sequence Filtering. Target recordings by structural attributes like sequence name, creation timestamp, or user-defined metadata tags. This level allows you to narrow down which recording sessions are relevant to your search.

Topic Filtering. Refine your search to specific data streams within sequences. You can filter by topic name, ontology tag (the data type), serialization format, or topic-level user metadata.

Ontology Filtering. Query the actual physical values recorded inside the sensor data without scanning terabytes of files. The engine uses statistical indices computed during ingestion (min/max bounds stored in the metadata cache for each chunk) to rapidly include or exclude entire segments of data.

Filter Domains

Sequence Filter

The sequence filter allows you to target specific recording sessions based on their metadata:

| Field | Description |
| --- | --- |
| sequence.name | The sequence identifier (supports text operations) |
| sequence.created_at | The creation timestamp in nanoseconds (supports timestamp operations) |
| sequence.user_metadata.<key> | Custom user-defined metadata attached to the sequence |
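
Because sequence.created_at is expressed in nanoseconds, the bounds of a time-window filter have to be converted before they are placed in the query. The following is a minimal sketch of that conversion. The helper name to_ns and the chosen dates are illustrative, and the sketch assumes timestamps count nanoseconds since the Unix epoch, which this page does not state explicitly.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ns(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer nanoseconds since the Unix epoch.

    Hypothetical helper: the epoch is an assumption, not confirmed by the docs.
    """
    delta = dt - EPOCH
    # Integer arithmetic avoids float rounding at nanosecond scale.
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


start = to_ns(datetime(2024, 1, 1, tzinfo=timezone.utc))
end = to_ns(datetime(2024, 2, 1, tzinfo=timezone.utc))

# Sequences created during January 2024 (bounds are inclusive for $between).
sequence_filter = {
    "sequence": {
        "created_at": {"$between": [start, end]},
    }
}
```

Keeping the values as integers matters here: a 64-bit float cannot represent every nanosecond timestamp exactly.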

Topic Filter

The topic filter narrows the search to specific data streams within matching sequences:

| Field | Description |
| --- | --- |
| topic.name | The topic path within the sequence (supports text operations) |
| topic.created_at | The topic creation timestamp in nanoseconds (supports timestamp operations) |
| topic.ontology_tag | The data type identifier (e.g., Lidar, Camera, IMU) |
| topic.serialization_format | The binary layout format (Default, Ragged, or Image) |
| topic.user_metadata.<key> | Custom user-defined metadata attached to the topic |

Ontology Filter

The ontology filter queries the actual sensor data values. Fields are specified using dot notation: <ontology_tag>.<field_path>.

For example, to query IMU acceleration data: imu.acceleration.x, where imu is the ontology tag and acceleration.x is the field path within that data model.
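
To make the dot notation concrete, the sketch below splits an ontology field into its tag and field path. The function name split_ontology_field is hypothetical, and the sketch assumes the ontology tag itself never contains a dot, so the first dot is the separator.

```python
from typing import Tuple


def split_ontology_field(field: str) -> Tuple[str, str]:
    """Split '<ontology_tag>.<field_path>' at the first dot.

    Assumption: the ontology tag contains no dots of its own.
    """
    tag, separator, path = field.partition(".")
    if not separator or not tag or not path:
        raise ValueError(f"expected '<ontology_tag>.<field_path>', got {field!r}")
    return tag, path


tag, path = split_ontology_field("imu.acceleration.x")
# tag is "imu", path is "acceleration.x"
```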

Timestamp query support

If include_timestamp_range is set to true inside the ontology filter, the response also includes a timestamp range for each matching topic.

Supported Operators

The query engine supports a rich set of comparison operators. Each operator is prefixed with $ in the JSON syntax:

| Operator | Description |
| --- | --- |
| $eq | Equal to (supports all types) |
| $neq | Not equal to (supports all types) |
| $lt | Less than (numeric and timestamp only) |
| $gt | Greater than (numeric and timestamp only) |
| $leq | Less than or equal to (numeric and timestamp only) |
| $geq | Greater than or equal to (numeric and timestamp only) |
| $between | Within a range [min, max] inclusive (numeric and timestamp only) |
| $in | Value is in a set of options (supports integers and text) |
| $match | Matches a pattern (text only, supports SQL LIKE patterns with % wildcards) |
| $ex | Field exists |
| $nex | Field does not exist |
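
The operator semantics can be illustrated with a small client-side evaluator. This is a sketch for reasoning about filters and is not the engine's implementation. The names MISSING, like_to_regex, evaluate, and matches are hypothetical. It also assumes that the operand of $ex and $nex is ignored and that a comparison against a missing field never matches, neither of which this page specifies.

```python
import re
from typing import Any, Dict

MISSING = object()  # sentinel for "field not present" (an assumption of this sketch)


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern into a regex; only the % wildcard is handled."""
    literal_parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile(".*".join(literal_parts), re.DOTALL)


def evaluate(value: Any, condition: Dict[str, Any]) -> bool:
    """Evaluate a single {"$op": operand} condition against one value."""
    if len(condition) != 1:
        raise ValueError("expected exactly one operator per condition")
    op, operand = next(iter(condition.items()))
    exists = value is not MISSING
    if op == "$ex":
        return exists
    if op == "$nex":
        return not exists
    if not exists:
        return False  # assumption: comparisons on a missing field never match
    if op == "$eq":
        return value == operand
    if op == "$neq":
        return value != operand
    if op == "$lt":
        return value < operand
    if op == "$gt":
        return value > operand
    if op == "$leq":
        return value <= operand
    if op == "$geq":
        return value >= operand
    if op == "$between":
        low, high = operand
        return low <= value <= high  # inclusive on both ends
    if op == "$in":
        return value in operand
    if op == "$match":
        return like_to_regex(operand).fullmatch(value) is not None
    raise ValueError(f"unsupported operator: {op}")


def matches(record: Dict[str, Any], conditions: Dict[str, Dict[str, Any]]) -> bool:
    """Combine per-field conditions with implicit AND."""
    return all(
        evaluate(record.get(field, MISSING), condition)
        for field, condition in conditions.items()
    )
```

For example, matches({"name": "test_run_01"}, {"name": {"$match": "test_run_%"}}) is true, because % stands for any run of characters and the rest of the pattern is compared literally.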

Syntax

Queries are submitted as JSON objects. Each field is mapped to an operator and value. Multiple conditions are combined with implicit AND logic.

{
  "sequence": {
    "name": { "$match": "test_run_%" },
    "user_metadata": {
      "driver": { "$eq": "Alice" }
    }
  },
  "topic": {
    "ontology_tag": { "$eq": "imu" }
  },
  "ontology": {
    "imu.acceleration.x": { "$gt": 5.0 },
    "imu.acceleration.y": { "$between": [-2.0, 2.0] },
    "include_timestamp_range": true
  }
}

This query searches for:

  • Sequences whose names match the pattern test_run_%
  • Where the user metadata field driver equals "Alice"
  • Containing topics with the ontology tag imu
  • Where the IMU's x-axis acceleration exceeds 5.0
  • And the y-axis acceleration is between -2.0 and 2.0

tip

The include_timestamp_range field is optional. When it is set to true, the query also returns the timestamp range for each matching topic.
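
A client would typically assemble this filter as a native data structure and serialize it, which avoids hand-written JSON mistakes such as trailing commas. The sketch below does this in Python with the standard json module. How the payload is then submitted to the query action is not shown, since the transport is outside the scope of this page.

```python
import json

# Same filter as the JSON example above, expressed as a Python dict.
query = {
    "sequence": {
        "name": {"$match": "test_run_%"},
        "user_metadata": {"driver": {"$eq": "Alice"}},
    },
    "topic": {
        "ontology_tag": {"$eq": "imu"},
    },
    "ontology": {
        "imu.acceleration.x": {"$gt": 5.0},
        "imu.acceleration.y": {"$between": [-2.0, 2.0]},
        "include_timestamp_range": True,  # serialized as JSON true
    },
}

# json.dumps always emits strictly valid JSON.
payload = json.dumps(query, indent=2)
```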

Response Structure

The query response is hierarchically grouped by sequence. For each matching sequence, it provides the list of topics that satisfied the filter criteria, along with optional timestamp ranges indicating when the ontology conditions were met.

query_response_example
{
  "items": [
    {
      "sequence": "test_run_01",
      "topics": [
        {
          "locator": "test_run_01/sensors/imu",
          "timestamp_range": [1000000000, 2000000000]
        },
        {
          "locator": "test_run_01/sensors/gps",
          "timestamp_range": [1000000000, 2000000000]
        }
      ]
    },
    {
      "sequence": "test_run_02",
      "topics": [
        {
          "locator": "test_run_02/camera/front",
          "timestamp_range": [1500000000, 2500000000]
        },
        {
          "locator": "test_run_02/lidar/point_cloud",
          "timestamp_range": [1500000000, 2500000000]
        }
      ]
    }
  ]
}
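
A response of this shape can be flattened into a lookup from topic locator to timestamp range. The following is a minimal sketch. The function name index_by_locator is hypothetical, and the sample response is a shortened copy of the example above with one topic deliberately left without a timestamp_range, since that field is optional.

```python
import json
from typing import Dict, Optional, Tuple

response_text = """
{
  "items": [
    {
      "sequence": "test_run_01",
      "topics": [
        {"locator": "test_run_01/sensors/imu",
         "timestamp_range": [1000000000, 2000000000]},
        {"locator": "test_run_01/sensors/gps"}
      ]
    }
  ]
}
"""


def index_by_locator(response: dict) -> Dict[str, Optional[Tuple[int, int]]]:
    """Map each topic locator to its (min, max) range, or None when absent."""
    ranges: Dict[str, Optional[Tuple[int, int]]] = {}
    for item in response.get("items", []):
        for topic in item.get("topics", []):
            window = topic.get("timestamp_range")
            ranges[topic["locator"]] = tuple(window) if window is not None else None
    return ranges


ranges = index_by_locator(json.loads(response_text))

# Duration of the matching window in seconds (timestamps are in nanoseconds).
first, last = ranges["test_run_01/sensors/imu"]
duration_s = (last - first) / 1_000_000_000
```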

Timestamps

The timestamp_range field holds the time window [min, max] in which the filter conditions were met for that topic: min is the timestamp of the first matching event and max is the timestamp of the last. This lets you retrieve only the relevant data slices through the retrieval protocol.

note

The timestamp_range field is included only when ontology filters are applied and include_timestamp_range is set to true inside the ontology filter.

Performance Characteristics

The query engine is optimized for high performance by minimizing unnecessary data retrieval and I/O operations. During execution, the engine uses index-based pruning to evaluate precomputed min/max statistics and skip indices, allowing it to bypass irrelevant data chunks without reading the underlying files.
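
The idea behind min/max pruning can be sketched in a few lines. This illustrates the general technique and is not Mosaico's internal code. The ChunkStats class, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkStats:
    """Per-chunk statistics for one field (hypothetical layout)."""
    chunk_id: int
    min_value: float
    max_value: float


def may_contain_gt(stats: ChunkStats, threshold: float) -> bool:
    """A chunk can hold a value > threshold only if its max exceeds it."""
    return stats.max_value > threshold


def may_contain_between(stats: ChunkStats, low: float, high: float) -> bool:
    """A chunk can hold a value in [low, high] only if the intervals overlap."""
    return stats.max_value >= low and stats.min_value <= high


chunks: List[ChunkStats] = [
    ChunkStats(chunk_id=0, min_value=-1.0, max_value=3.2),
    ChunkStats(chunk_id=1, min_value=0.5, max_value=7.9),
    ChunkStats(chunk_id=2, min_value=5.0, max_value=5.0),
]

# Only chunk 1 can contain a value greater than 5.0; the others are skipped
# without reading their data.
candidates = [c.chunk_id for c in chunks if may_contain_gt(c, 5.0)]
```

Pruning of this kind is conservative: a chunk that survives the check may still contain no matching value, so it only narrows what has to be read.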

Performance is further improved by executing metadata cache queries, such as sequence and topic filters, directly within the database, which ensures sub-second response times even across thousands of sequences.

The system employs lazy evaluation to keep network payloads lightweight; instead of returning raw data immediately, queries return locators and timestamp ranges. This architecture allows client applications to fetch only the required data slices via the retrieval protocol as needed.