Customizing the Data Ontology
This guide walks you through the process of extending the Mosaico Data Platform with custom data models. While Mosaico provides a rich default ontology for robotics (IMU, GPS, Images, etc.), specialized hardware often requires proprietary data structures.
By the end of this guide, you will be able to:
- Define strongly-typed data models using Python and Apache Arrow.
- Register these models so they are recognized by the Mosaico Ecosystem.
- Integrate them into the ingestion and retrieval pipelines.
Step 1: Define the Custom Data Model
In Mosaico, data models are defined by inheriting from the Serializable base class. This ensures that your model can be automatically translated into the platform's high-performance storage format.
For this example, we will create an EncoderTicks model for the encoder tick readings found in NVIDIA Isaac-related datasets.
import pyarrow as pa
from mosaicolabs import HeaderMixin, Serializable


class EncoderTicks(
    Serializable,  # Automatically registers the model via `Serializable.__init_subclass__`
    HeaderMixin,  # Injects standard metadata (timestamp, frame_id, seq)
):
    """
    Custom model for hardware-level encoder tick readings.
    """

    # --- Wire Schema Definition (Apache Arrow) ---
    # This defines the high-performance binary storage format on the server.
    __msco_pyarrow_struct__ = pa.struct([
        pa.field("left_ticks", pa.uint32(), nullable=False),
        pa.field("right_ticks", pa.uint32(), nullable=False),
        pa.field("encoder_timestamp", pa.uint64(), nullable=False),
    ])

    # --- Data Fields ---
    # Names and types must strictly match the Apache Arrow schema above.
    left_ticks: int
    right_ticks: int
    encoder_timestamp: int
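One practical consequence of pairing `int` annotations with `uint32` and `uint64` Arrow fields is that Python integers are unbounded while the Arrow types are fixed-width. The sketch below shows a range check you could run before ingestion. `check_unsigned` and `validate_ticks` are hypothetical helpers written for this guide, not part of the Mosaico SDK, and whether the SDK performs its own validation is not covered here.

```python
def check_unsigned(name: str, value: int, bits: int) -> int:
    """Return `value` unchanged if it fits an unsigned integer of `bits` width."""
    upper = (1 << bits) - 1
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    if not 0 <= value <= upper:
        raise ValueError(f"{name}={value} does not fit in uint{bits} (0..{upper})")
    return value


def validate_ticks(left_ticks: int, right_ticks: int, encoder_timestamp: int) -> dict:
    """Check the three EncoderTicks fields against the widths declared in the schema."""
    return {
        "left_ticks": check_unsigned("left_ticks", left_ticks, 32),
        "right_ticks": check_unsigned("right_ticks", right_ticks, 32),
        "encoder_timestamp": check_unsigned("encoder_timestamp", encoder_timestamp, 64),
    }
```

A negative tick count or a value of 2**32 or more in a `uint32` field raises `ValueError` here, before the record reaches serialization.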
Step 2: Ensure "Discovery" via Module Import
It is a common pitfall to define a class and expect the platform to "see" it immediately. Mosaico uses the Serializable.__init_subclass__ hook to register a class automatically the moment its definition is executed, which happens when the Python interpreter imports the module that contains it.
For your custom type to be available in your application (especially during ingestion or when using the ROSBridge), you must ensure the module containing the class is imported.
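To see why the import matters, it can help to look at how an `__init_subclass__` hook behaves in plain Python. The sketch below uses a stand-in base class, `ToySerializable`, with a hypothetical `_registry` dictionary. It only illustrates the language mechanism and is not Mosaico's actual implementation.

```python
class ToySerializable:
    """Stand-in base class; NOT the real mosaicolabs.Serializable."""

    _registry: dict = {}  # hypothetical registry: class name -> class object

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass, at the moment its `class` statement executes
        ToySerializable._registry[cls.__name__] = cls


# Nothing has been defined yet, so the registry is still empty
registry_before = dict(ToySerializable._registry)


class WheelOdometry(ToySerializable):  # executing this statement triggers the hook
    pass


registry_after = dict(ToySerializable._registry)
```

The registry changes only when the `class` statement runs. If the module that defines `WheelOdometry` is never imported, that statement never executes and the class is never registered.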
Best Practice: The Registry Pattern
Create a dedicated models.py or ontology/ package for your project and import it at your application's entry point.
# app/main.py
import my_project.ontology.encoders as encoders  # <-- This triggers the registration
from mosaicolabs import MosaicoClient


def run_ingestion():
    with MosaicoClient.connect(...) as client:
        # Now 'EncoderTicks' is a valid ontology_type for topic creation
        with client.sequence_create(name="test") as sw:
            tw = sw.topic_create("ticks", ontology_type=encoders.EncoderTicks)
            # ...
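If the ontology package grows to many modules, importing each one by hand at the entry point becomes easy to forget. One possible approach, sketched below with the standard library only, is a helper that imports every direct submodule of a package. `import_submodules` is a hypothetical name and not a Mosaico API. The stdlib `json` package stands in for `my_project.ontology` so the example runs anywhere.

```python
import importlib
import json
import pkgutil
from types import ModuleType


def import_submodules(package: ModuleType) -> list[str]:
    """Import every direct submodule of `package` and return their full names."""
    imported = []
    prefix = package.__name__ + "."
    for info in pkgutil.iter_modules(package.__path__, prefix):
        if info.name.endswith(".__main__"):
            continue  # skip a package's command-line entry point
        importlib.import_module(info.name)  # executes the module's class statements
        imported.append(info.name)
    return imported


# Demonstration on a stdlib package; in a real project you would pass your
# own ontology package instead (an assumption about your project layout).
loaded = import_submodules(json)
```

Calling such a helper once from the package's `__init__.py` or from the application entry point would ensure each model module is executed. Note that `pkgutil.iter_modules` lists direct children only, so nested sub-packages would need a recursive variant.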
Step 3: Verifying Registration
If you are unsure whether your model has been correctly "seen" by the ecosystem, you can check the internal registry of the Serializable class.
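The name of that internal registry is not shown here, so the sketch below takes a different route that relies only on Python's built-in `__subclasses__()` method. It tells you whether a subclass with a given name has been loaded, which is the precondition for the `__init_subclass__` hook having run, but it does not inspect Mosaico's registry itself. `find_loaded_subclass` is a hypothetical helper, and `Model`, `Encoder`, and `EncoderTicks` are stand-in classes so the example runs without Mosaico installed.

```python
def find_loaded_subclass(base: type, name: str):
    """Return the loaded direct or indirect subclass of `base` called `name`, or None."""
    stack = list(base.__subclasses__())
    while stack:
        cls = stack.pop()
        if cls.__name__ == name:
            return cls
        stack.extend(cls.__subclasses__())  # descend into indirect subclasses
    return None


# Stand-in hierarchy (assumption: in a real project, the base would be Serializable)
class Model:
    pass


class Encoder(Model):
    pass


class EncoderTicks(Encoder):
    pass


found = find_loaded_subclass(Model, "EncoderTicks")
missing = find_loaded_subclass(Model, "NeverImported")
```

With the real SDK, passing `Serializable` as the base and getting `None` back would suggest that the module defining your model was never imported.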