Manipulation
The Manipulation Pack is a curated collection of heterogeneous open-source robotic manipulation datasets, each individually studied, analyzed, and mapped into Mosaico's unified semantic ontology. Every dataset was manually inspected to identify its internal topics and source formats, from HDF5 to Parquet, from TensorFlow Records to ROS bags.
The result is a single, ready-to-use ingestion suite where a RobotJoint topic originating from a ROS bag looks and behaves exactly like a RobotJoint topic coming from a DeepMind TFRecord. This is the core value proposition: proving that Mosaico acts as the universal standard for semantic sensor data description across deeply fragmented ecosystems.
Installation
Clone the repository, then install and run the project with Poetry:
git clone git@github.com:mosaico-labs/mosaico-alchemy.git
cd mosaico-alchemy
# Install via Poetry
poetry install
eval $(poetry env activate)
Note: Requires Python 3.11 or higher.
To verify the installation, run the following command from your terminal:
mosaico-alchemy manipulation --help
To start ingesting data, download the datasets from their respective repositories (see Supported Datasets), then run the mosaico-alchemy manipulation command and pass your dataset directories via --datasets:
mosaico-alchemy manipulation --datasets /path/to/dataset
Configuration Options
The CLI supports the following flags to control the execution environment:
| Option | Default | Description |
|---|---|---|
| --datasets | Required | One or more space-separated dataset roots to ingest. |
| --host | localhost | The hostname of your Mosaico Server. |
| --port | 6726 | The Flight port of your Mosaico Server. |
| --log-level | INFO | Set verbosity (DEBUG, INFO, WARNING, ERROR). |
| --write-mode | sync | Topic execution mode for file-backed data (sync or async). |
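For scripted or batch runs, the command line can also be assembled from Python. The sketch below is only a convenience wrapper around the flags listed in the table above; the helper name build_ingest_command and the example paths are hypothetical, and nothing here is part of the mosaico-alchemy package itself.

```python
import shlex
from typing import Sequence


def build_ingest_command(
    datasets: Sequence[str],
    host: str = "localhost",
    port: int = 6726,
    log_level: str = "INFO",
    write_mode: str = "sync",
) -> list[str]:
    """Assemble the argv list for `mosaico-alchemy manipulation` (hypothetical helper)."""
    if not datasets:
        raise ValueError("--datasets is required: pass at least one dataset root")
    if write_mode not in ("sync", "async"):
        raise ValueError(f"unsupported write mode: {write_mode!r}")
    return [
        "mosaico-alchemy", "manipulation",
        "--datasets", *datasets,
        "--host", host,
        "--port", str(port),
        "--log-level", log_level,
        "--write-mode", write_mode,
    ]


cmd = build_ingest_command(
    ["/data/reassemble", "/data/droid"],  # example paths only
    log_level="DEBUG",
    write_mode="async",
)
print(shlex.join(cmd))  # shell-safe string, handy for logging
```

The resulting list can be handed to subprocess.run(cmd, check=True) to launch the ingestion from a larger pipeline.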
Supported Datasets
We provide built-in support for multiple open-source formats. We recommend exploring them in the following order to understand the offered capabilities:
Reassemble
4,551 contact-rich assembly and disassembly demonstrations across 17 objects, with multimodal sensing from event cameras, force-torque sensors, microphones and multi-view RGB cameras.
File (HDF5)
Topics Ingested into Mosaico
| Topic | Ontology Type | Description |
|---|---|---|
| /capture_node-camera-image | CompressedImage | RGB video from the DAVIS346 event camera's integrated frame sensor, facing the workspace. |
| /events | EventCamera* | Asynchronous pixel-level brightness change events [x, y, polarity] from the event camera. |
| /hama1 | CompressedImage | RGB video from the first external camera observing the robot. |
| /hama1_audio | AudioDataStamped* | Audio stream from the microphone co-located with the first external camera. |
| /hama2 | CompressedImage | RGB video from the second external camera observing the robot. |
| /hama2_audio | AudioDataStamped* | Audio stream from the microphone co-located with the second external camera. |
| /hand | CompressedImage | RGB video from the hand-mounted camera observing the worktable. |
| /hand_audio | AudioDataStamped* | Audio stream from the microphone co-located with the hand camera. |
| /grasp_failure_label | SegmentInfo* | Hierarchical temporal segmentation of the episode into labelled high-level and low-level action segments, each with start/end timestamps and a success flag. |
| /robot_state/joint_state | RobotJoint | Position, velocity, and effort for each of the 7 robot arm joints. |
| /robot_state/pose | Pose | End-effector Cartesian pose [x, y, z, qx, qy, qz, qw] in world frame. |
| /robot_state/velocity | Velocity | End-effector Cartesian velocity [vx, vy, vz, wx, wy, wz] in m/s and rad/s. |
| /robot_state/compensated_base_force_torque | ForceTorque | 3D force and torque at the robot base, gravity-compensated. |
| /robot_state/measured_force_torque | ForceTorque | Raw 3D force and torque measured at the end-effector F/T sensor. |
| /robot_state/end_effector | EndEffector* | Position, velocity, and effort for each of the 2 gripper fingers. |
*Custom ontology models defined within the pack module.
RT-1 (Fractal)
87,212 pick-and-place episodes across 17 objects in Google micro kitchen environments, with RGB observations, natural language instructions, 512D task embeddings and full end-effector action space.
File (TFDS)
Topics Ingested into Mosaico
| Topic | Ontology Type | Description |
|---|---|---|
| step/observation/image | CompressedImage | RGB camera frame (256×320×3) from the robot's onboard camera. |
| step/observation/base_pose_tool_reached | Pose | Current end-effector pose [x, y, z, qx, qy, qz, qw] in base-relative frame. |
| step/observation/orientation_start | Quaternion | End-effector orientation quaternion at the start of the episode (t=0), used to normalise subsequent rotations. |
| step/observation/src_rotation | Quaternion | Reference orientation quaternion for the current phase of the task. |
| step/observation/natural_language_instruction | String | Free-text task instruction (e.g. "pick rxbar chocolate from bottom drawer and place on counter"). |
| step/observation/natural_language_embedding | TextEmbedding* | 512-dimensional float embedding of the natural language instruction. |
| step/observation/gripper_closed | Floating32 | Binary-like indicator of whether the gripper is currently closed (1) or open (0). |
| step/observation/gripper_closedness_commanded | Floating32 | Previously commanded continuous gripper position [0, 1]; comparing with gripper_closed reveals execution error or physical resistance. |
| step/observation/height_to_bottom | Floating32 | Altitude of the end-effector above the ground plane, in metres. |
| step/observation/rotation_delta_to_go | Vector3d | Remaining rotational displacement [roll, pitch, yaw] from current orientation to target, in radians. |
| step/observation/vector_to_go | Vector3d | Displacement [Δx, Δy, Δz] from the current end-effector position to the target; used as a closed-loop control signal. |
| step/observation/orientation_box | Vector3dBounds* | Min/max rotational bounds (2×3) defining the allowed orientation range for an object of interest, in radians. |
| step/observation/robot_orientation_positions_box | Vector3dFrame* | 3×3 matrix describing the robot body's position and orientation in 3D space. |
| step/observation/workspace_bounds | WorkspaceBounds* | 3×3 matrix defining the per-axis spatial limits within which the robot is authorised to operate. |
| step/action/world_vector | Vector3d | Commanded Cartesian displacement [Δx, Δy, Δz] of the end-effector in base-relative frame. |
| step/action/rotation_delta | Vector3d | Commanded orientation change [roll, pitch, yaw] in base-relative frame, in radians. |
| step/action/base_displacement_vector | Vector3d | Commanded robot base translation [x, y] on the horizontal plane, in metres. |
| step/action/base_displacement_vertical_rotation | Floating32 | Commanded robot base yaw rotation (rotation about the vertical Z axis), in radians. |
| step/action/gripper_closedness_action | Floating32 | Target gripper closure [0, 1] commanding how tightly to grasp an object. |
| step/action/terminate_episode | TerminateEpisode* | 3-element signal indicating whether the model believes the task is complete, ongoing, or in error. |
| step/reward | Floating32 | Scalar reward for the action taken at this step. |
| step/is_first | Boolean | True for the initial step of an episode. |
| step/is_last | Boolean | True for the final step of an episode. |
| step/is_terminal | Boolean | True when the episode ends (by success or failure), triggering a reset of the expected value computation. |
*Custom ontology models defined within the pack module.
LeRobot DROID
95,658 manipulation episodes collected at 13 research institutions, with wrist and dual exterior stereo cameras, joint and Cartesian state, end-effector position and embedded camera extrinsics.
File (Parquet)
Topics Ingested into Mosaico
| Topic | Ontology Type | Description |
|---|---|---|
| /observation/images/wrist_left | CompressedImage | RGB video (180×320×3) from the wrist-mounted camera near the end-effector. |
| /observation/images/exterior_1_left | CompressedImage | RGB video (180×320×3) from the first external scene camera. |
| /observation/images/exterior_2_left | CompressedImage | RGB video (180×320×3) from the second external scene camera. |
| /observation/state/joint_position | RobotJoint | Current positions of the 7 Franka arm joints, in radians. |
| /observation/state/cartesian_position | Pose | Current end-effector Cartesian pose [x, y, z, roll, pitch, yaw] in world frame. |
| /observation/state/gripper_position | EndEffector* | Current gripper opening/closing state as a scalar. |
| /action/joint_position | RobotJoint | Target positions commanded to the 7 Franka arm joints. |
| /action/cartesian_position | Pose | Target end-effector Cartesian pose [x, y, z, roll, pitch, yaw] commanded at this step. |
| /action/cartesian_velocity | Velocity | Target end-effector Cartesian velocity [vx, vy, vz, vroll, vpitch, vyaw] commanded at this step. |
| /action/gripper_position | EndEffector* | Target gripper position commanded at this step. |
| /camera_extrinsics/wrist_left | Pose | Extrinsic pose [x, y, z, roll, pitch, yaw] of the wrist camera relative to the robot. |
| /camera_extrinsics/exterior_1_left | Pose | Extrinsic pose [x, y, z, roll, pitch, yaw] of the first external camera. |
| /camera_extrinsics/exterior_2_left | Pose | Extrinsic pose [x, y, z, roll, pitch, yaw] of the second external camera. |
| /step/reward | Floating64 | Scalar reward for the action taken at this step, ranging from 0 to 1. |
| /step/discount | Floating64 | Discount factor applied to future rewards during training. |
| /step/task_index | Integer64 | Numeric index identifying the task type for this episode. |
| /step/frame_index | Integer64 | Positional index of this step within its episode. |
| /step/is_first | Boolean | True for the initial step of an episode. |
| /step/is_last | Boolean | True for the final stored step of an episode. |
| /step/is_terminal | Boolean | True when the episode ends by reaching a terminal state. |
*Custom ontology models defined within the pack module.
Multimodal Manipulation Learning
300 ROS bag recordings of a KUKA iiwa robot with an Allegro hand, combining Tekscan tactile pressure, microphone audio, torque commands and joint states across 5 material classes.
ROS .bag
Topics Ingested into Mosaico
| Topic | Ontology Type | Description |
|---|---|---|
| /allegro_hand_right/joint_states | RobotJoint | Position, velocity, and effort for the 16 joints of the Allegro dexterous hand. |
| /iiwa/joint_states | RobotJoint | Position, velocity, and effort for the 7 joints of the KUKA iiwa arm. |
| /iiwa/eePose | Pose | End-effector Cartesian pose [x, y, z, qx, qy, qz, qw] in world frame. |
| /iiwa/TorqueController/command | JointTorqueCommand* | Torque-space control command sent to the 7 iiwa joints at each control cycle, in Nm. |
| /tekscan/frame | TekscanSensor* | Tactile pressure frame from the Tekscan sensor, represented as a 2944-element array interpretable as a 46×64 pressure matrix. |
| /audio/audio | AudioDataStamped* | Raw mono audio chunks captured during the manipulation trial. |
| /audio/audio_info | AudioInfo* | Audio stream metadata: sample rate (16 kHz), format (S16LE), and codec (MP3). |
| /trialInfo | String | Trial metadata and discrete event markers: the first message contains a YAML block with motion parameters and controller configuration; subsequent messages mark events such as shake start or stop. |
*Custom ontology models defined within the pack module.
A Note about the Custom Ontology Models
Some datasets include application-specific data types that do not map cleanly to Mosaico’s built-in ontology models; for example, the TerminateEpisode model is highly specific to the data format defined in the Fractal dataset.
Defining custom ontology models lets you represent these data structures precisely, with typed fields, validation rules, and consistent semantics for your own sensors or proprietary message formats.
Mosaico provides a fast path to define and automatically register custom data models. Once registered, they behave like native models and are accepted by the platform without extra integration work.
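To make "typed fields and validation rules" concrete, here is a minimal sketch of what a dataset-specific model such as TerminateEpisode could encode, written as a plain dataclass. It deliberately does not use the Mosaico base class or registration mechanism (the Ontology Customization Guide covers those); the class name TerminateEpisodeSketch and the field name flags are illustrative assumptions, not the pack's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminateEpisodeSketch:
    """Illustrative stand-in for a dataset-specific model (not the pack's class)."""

    flags: tuple[int, int, int]  # hypothetical field name for the 3-element signal

    def __post_init__(self) -> None:
        # Validation rule: the signal must carry exactly three integer elements.
        if len(self.flags) != 3:
            raise ValueError(f"expected 3 elements, got {len(self.flags)}")
        if not all(isinstance(value, int) for value in self.flags):
            raise TypeError("all elements must be integers")


signal = TerminateEpisodeSketch(flags=(1, 0, 0))
```

Rejecting malformed values at construction time means downstream consumers never see a partially valid record.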
Customizing the Pack
If your robotic data is saved in a proprietary format and isn't supported out-of-the-box, the module is fully extensible. This section explains how to extend the Manipulation Pack to support your own custom dataset formats alongside the built-in ones. By following this guide, you will:
- Select an Execution Backend: Decide whether to ingest native ROS bags through the ROS bridge or to build a file-backed pipeline described by a SequenceDescriptor.
- Define Ontologies and Adapters: Choose the Mosaico types that model your sensor streams and implement the adapters that translate raw dictionaries into them.
- Implement the Dataset Plugin: Assemble the ingestion plan that wires your iterators, ontologies, and adapters into a declarative sequence descriptor.
- Register Your Components: Expose your plugin and adapters to the CLI layer so they are discovered at runtime.
Choose the Execution Backend
Your first architectural decision is how the data should be accessed. If your dataset uses standard file or database formats such as HDF5, Parquet, JSON, or custom binary files, use the file-backed executor by generating a SequenceDescriptor.
Alternatively, if your dataset consists of native ROS bags and maps naturally to ROS topics, use a RosbagSequenceDescriptor. This lets the plugin validate the required topics and then delegate the actual ingestion to the ROS bridge.
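The choice between the two backends can often be made from the files on disk. The following is a minimal sketch of such a check, assuming ROS 1 recordings with a .bag extension; the function choose_backend and its return strings are hypothetical and not part of the pack.

```python
import tempfile
from pathlib import Path


def choose_backend(root: Path) -> str:
    """Return "rosbag" if any .bag file exists under root, else "file" (hypothetical heuristic)."""
    # next() stops at the first match, so large datasets are not fully scanned.
    has_bags = next(root.rglob("*.bag"), None) is not None
    return "rosbag" if has_bags else "file"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "episode_000.h5").touch()
    backend_without_bags = choose_backend(root)
    (root / "trial_01.bag").touch()
    backend_with_bags = choose_backend(root)
```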
Apply Ontologies and Implement Adapters
For each data stream, you define how it maps to a Mosaico ontology type. If you cannot reuse an existing SDK ontology, you can define your own custom class. We highly recommend reviewing the Ontology Customization Guide to see exactly how to write and register your own custom data models.
The Custom Adapter
With your ontologies defining the target structure, you implement a custom adapter that translates each raw payload dictionary into a Mosaico Message.
from mosaicolabs import Message
from mosaicolabs.packs.manipulation.adapters.base import BaseAdapter

# CustomSensorModel is the custom ontology model you defined for this stream
# (see the Ontology Customization Guide); import it from your own module.

# Adapters are generic on the ontology type they produce.
# Specialising BaseAdapter[CustomSensorModel] binds the translate() return type
# and restricts this adapter to messages carrying CustomSensorModel payloads only.
class MyDatasetCustomAdapter(BaseAdapter[CustomSensorModel]):
    # The global identifier that will be referenced by name in your `TopicDescriptor`.
    adapter_id = "mydataset.custom_sensor"

    # Define the fields required in the payload, to be validated
    # in the `translate` function.
    _REQUIRED_KEYS: tuple[str, ...] = ("timestamp", "sensor_readings")

    # The translator factory: an adapter receives the raw payload dictionary
    # and returns an instantiated Mosaico `Message`.
    @classmethod
    def translate(cls, payload: dict) -> Message:
        """
        Translates a raw custom dictionary into a Mosaico Message container.
        """
        # Guard against malformed payloads before any field access:
        # sensor_readings must be present and contain exactly 4 elements.
        cls._validate_payload(
            payload=payload,
            constraints={"sensor_readings": {"len": 4}},
        )
        return Message(
            # Conversion from arbitrary time structures into native
            # Mosaico nanosecond formats.
            timestamp_ns=int(payload["timestamp"] * 1e9),
            data=CustomSensorModel(values=payload["sensor_readings"]),
        )
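The adapter above relies on two small steps: validating the payload and converting float seconds to integer nanoseconds. The standalone sketch below shows one way these could work, using only the standard library. The functions validate_payload and seconds_to_ns are hypothetical stand-ins, not the pack's _validate_payload; note also that seconds_to_ns rounds to the nearest nanosecond, whereas the snippet above truncates with int().

```python
from typing import Any, Mapping


def validate_payload(
    payload: Mapping[str, Any],
    required_keys: tuple[str, ...],
    constraints: Mapping[str, Mapping[str, int]],
) -> None:
    """Raise early if a required key is missing or a length constraint fails."""
    missing = [key for key in required_keys if key not in payload]
    if missing:
        raise KeyError(f"payload is missing required keys: {missing}")
    for key, rules in constraints.items():
        if key not in payload:
            raise KeyError(f"constrained key {key!r} is not in the payload")
        expected_len = rules.get("len")
        if expected_len is not None and len(payload[key]) != expected_len:
            raise ValueError(
                f"{key!r} must have {expected_len} elements, got {len(payload[key])}"
            )


def seconds_to_ns(timestamp_s: float) -> int:
    """Convert float seconds to integer nanoseconds, rounding to the nearest ns."""
    return round(timestamp_s * 1e9)


payload = {"timestamp": 1234.56, "sensor_readings": [0.1, 0.2, 0.3, 0.4]}
validate_payload(
    payload,
    required_keys=("timestamp", "sensor_readings"),
    constraints={"sensor_readings": {"len": 4}},
)
timestamp_ns = seconds_to_ns(payload["timestamp"])
```

Rounding avoids an off-by-one nanosecond when the floating-point product lands just below an integer, which plain truncation would round down.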
Develop the Dataset Plugin
With adapters in place, you implement the dataset plugin class. This class satisfies a straightforward internal protocol to detect your dataset format, discover its logical sequences, and assemble the ingestion plan that wires everything together.
The Plugin Protocol
from pathlib import Path
from typing import Iterable, Protocol

# IngestionDescriptor is provided by the pack; import it from the pack module.
class DatasetPlugin(Protocol):
    # A unique string identifier used in downstream logging and operator prompts.
    dataset_id: str

    # A method to verify if a given folder matches the dataset signature
    # (e.g., checking for `*.h5` files) avoiding expensive full-dataset scans.
    def supports(self, root: Path) -> bool: ...

    # A method to discover the logical sequences contained in the root folder,
    # such as individual robot episodes.
    def discover_sequences(self, root: Path) -> Iterable[Path]: ...

    # The core logic to create an ingestion plan for each sequence,
    # returning an `IngestionDescriptor`.
    def create_ingestion_plan(self, sequence_path: Path) -> IngestionDescriptor: ...
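As an illustration of the first two protocol methods, the sketch below assumes a hypothetical dataset layout with one `*.h5` file per episode directly under the root. The class name H5EpisodePlugin and the layout are assumptions; create_ingestion_plan is left unimplemented here because it depends on the pack's descriptor classes.

```python
import tempfile
from pathlib import Path
from typing import Iterable


class H5EpisodePlugin:
    """Hypothetical plugin: one sequence per `*.h5` file directly under the root."""

    dataset_id = "mydataset"

    def supports(self, root: Path) -> bool:
        # Cheap signature check: stop at the first matching file instead of
        # scanning the whole dataset.
        return root.is_dir() and next(root.glob("*.h5"), None) is not None

    def discover_sequences(self, root: Path) -> Iterable[Path]:
        # Sorted so that the ingestion order is deterministic.
        return sorted(root.glob("*.h5"))

    def create_ingestion_plan(self, sequence_path: Path):
        # A real plugin returns the pack's descriptor here; omitted in this sketch.
        raise NotImplementedError


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("ep_002.h5", "ep_001.h5", "notes.txt"):
        (root / name).touch()
    plugin = H5EpisodePlugin()
    is_supported = plugin.supports(root)
    sequence_names = [path.name for path in plugin.discover_sequences(root)]
```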
Extracting Raw Data
To keep your plugin code clean and maintainable, raw file I/O operations must be completely separated from the orchestration step. You should create dedicated iterator functions following the factory pattern: each function accepts static configuration parameters (file paths, field names) and returns a Callable[[Path], Iterable[dict]]. The runner calls that callable later with the actual sequence path to stream the raw payloads. Each payload dict must include a "timestamp" key expressed in seconds as a float.
from pathlib import Path
from typing import Callable, Iterable


def iter_video_frames(video_path: str, timestamps_path: str) -> Callable[[Path], Iterable[dict]]:
    def _fn(sequence_path: Path) -> Iterable[dict]:
        # open sequence_path and read the data at video_path / timestamps_path
        yield {"timestamp": 1234.56, "image": b"..."}
    return _fn


def count_video_frames(timestamps_path: str) -> Callable[[Path], int]:
    def _fn(sequence_path: Path) -> int:
        # count the number of frames at timestamps_path
        total_frames = 0  # placeholder: replace with the real frame count
        return total_frames
    return _fn
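A concrete instance of the same factory pattern, using only the standard library, might look like the sketch below. It assumes a hypothetical layout in which each sequence is a directory containing a CSV file with a timestamp column in seconds; the function names, file name, and column names are all illustrative.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def iter_csv_rows(relative_path: str, value_field: str) -> Callable[[Path], Iterable[dict]]:
    """Factory: binds static configuration now, reads the file when called later."""
    def _fn(sequence_path: Path) -> Iterable[dict]:
        with open(sequence_path / relative_path, newline="") as handle:
            for row in csv.DictReader(handle):
                # Every payload carries "timestamp" as float seconds.
                yield {
                    "timestamp": float(row["timestamp"]),
                    value_field: float(row[value_field]),
                }
    return _fn


def count_csv_rows(relative_path: str) -> Callable[[Path], int]:
    """Factory for the matching message counter."""
    def _fn(sequence_path: Path) -> int:
        with open(sequence_path / relative_path, newline="") as handle:
            return sum(1 for _ in csv.DictReader(handle))
    return _fn


with tempfile.TemporaryDirectory() as tmp:
    sequence_dir = Path(tmp)
    (sequence_dir / "force.csv").write_text("timestamp,force\n0.0,1.5\n0.1,1.7\n")
    payloads = list(iter_csv_rows("force.csv", "force")(sequence_dir))
    total = count_csv_rows("force.csv")(sequence_dir)
```

Because both factories read the same file through the same reader, the count stays equal to the number of payloads the iterator yields.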
Structuring the Ingestion Plan
The plugin implements create_ingestion_plan to declare the sequence name, its metadata, and the full list of topics. Each topic references the iterators and adapters you built in the previous steps.
return SequenceDescriptor(
    # The unique name of the target sequence being constructed.
    sequence_name=f"{self.dataset_id}_{sequence_path.stem}",
    # The custom metadata of this sequence.
    sequence_metadata={
        "dataset_id": self.dataset_id,
        "ingestion_backend": "file",
    },
    topics=[
        TopicDescriptor(
            # The topic name that will handle this data stream.
            topic_name="/camera/front",
            # The Mosaico ontology type that models this data stream.
            ontology_type=CompressedImage,
            # The identifier of the adapter responsible for translating this specific topic.
            adapter_id=f"{self.dataset_id}.video_frame",
            # The runner calls this callable later with the sequence path
            # to obtain the raw payload stream.
            payload_iter=iter_video_frames("path/to/video", "path/to/timestamps"),
            # The runner uses this to count the total messages ahead of time for progress
            # reporting. It must match the number of items `payload_iter` will yield.
            message_count=count_video_frames("path/to/timestamps"),
        ),
    ],
)
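To see how the pieces of a topic fit together at run time, the sketch below mimics what a runner might do with one topic: resolve the adapter by its identifier, stream the payloads, translate each one, and check the declared count. This is not the pack's runner; TopicSketch, run_topic, and the dictionary-based adapter lookup are hypothetical stand-ins, and the "adapter" here simply returns a nanosecond timestamp.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Iterable, Mapping


@dataclass(frozen=True)
class TopicSketch:
    """Hypothetical stand-in mirroring a subset of the topic fields shown above."""

    topic_name: str
    adapter_id: str
    payload_iter: Callable[[Path], Iterable[dict]]
    message_count: Callable[[Path], int]


def run_topic(
    topic: TopicSketch,
    sequence_path: Path,
    adapters: Mapping[str, Callable[[dict], Any]],
) -> list[Any]:
    if topic.adapter_id not in adapters:
        raise LookupError(f"no adapter registered for {topic.adapter_id!r}")
    translate = adapters[topic.adapter_id]
    expected = topic.message_count(sequence_path)
    # The callables are only invoked now, with the actual sequence path.
    messages = [translate(payload) for payload in topic.payload_iter(sequence_path)]
    if len(messages) != expected:
        raise RuntimeError(
            f"{topic.topic_name}: expected {expected} messages, got {len(messages)}"
        )
    return messages


topic = TopicSketch(
    topic_name="/camera/front",
    adapter_id="mydataset.video_frame",
    payload_iter=lambda _path: ({"timestamp": 0.1 * i, "image": b""} for i in range(3)),
    message_count=lambda _path: 3,
)
adapters = {"mydataset.video_frame": lambda payload: round(payload["timestamp"] * 1e9)}
messages = run_topic(topic, Path("episode_000"), adapters)
```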
Register Your Components
Once your dataset plugin, functional iterators, and adapters are implemented, they must be made discoverable to the CLI layer. There are two separate registries to update: one for the dataset plugin and one for the adapter.
Registering the Dataset Plugin
from mosaico_alchemy.manipulation.datasets import DatasetRegistry
# DatasetRegistry is a singleton pre-populated with the built-in plugins.
# Registering here makes your plugin visible everywhere the registry is used.
registry = DatasetRegistry()
registry.register(MyDatasetPlugin())
Importing DatasetRegistry triggers the default registry setup automatically, so the instance returned by DatasetRegistry() already contains all built-in plugins.
Calling register on it adds your plugin to the same shared instance, making it available across both the CLI and the runner without any further wiring:
- CLI: your dataset appears alongside the built-in options in the interactive selection prompt.
- Runner: auto-detection via registry.resolve(root) will call your plugin's supports method against the dataset root.
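The shared-instance behaviour described above can be pictured with a small singleton. The sketch below is one possible way such a registry could work, not a description of how DatasetRegistry is actually implemented; RegistrySketch and BagOnlyPlugin are hypothetical names.

```python
from pathlib import Path


class RegistrySketch:
    """Hypothetical singleton: every call to RegistrySketch() returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._plugins = []
        return cls._instance

    def register(self, plugin) -> None:
        self._plugins.append(plugin)

    def resolve(self, root: Path):
        # Ask each plugin in registration order whether it supports the root.
        for plugin in self._plugins:
            if plugin.supports(root):
                return plugin
        raise LookupError(f"no registered plugin supports {root}")


class BagOnlyPlugin:
    dataset_id = "bags"

    def supports(self, root: Path) -> bool:
        return root.suffix == ".bag"


RegistrySketch().register(BagOnlyPlugin())
# A second RegistrySketch() call sees the plugin registered through the first.
resolved = RegistrySketch().resolve(Path("trial_01.bag"))
```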
Registering the Adapter
The adapter must also be registered in the AdapterRegistry. Without this step, the file executor will not be able to resolve the adapter_id declared in your TopicDescriptor and the ingestion will fail.
from mosaico_alchemy.manipulation.adapters import AdapterRegistry
# AdapterRegistry is a singleton pre-populated with the built-in adapters.
# Registering here makes your adapter visible everywhere the registry is used.
registry = AdapterRegistry()
registry.register(MyDatasetCustomAdapter)
After registering both components, your dataset plugin is ready to be executed.