Manipulation

The Manipulation Pack is a curated collection of heterogeneous open-source robotic manipulation datasets, each individually studied, analyzed, and mapped into Mosaico's unified semantic ontology. Every dataset was manually inspected to identify its internal topics and source formats, from HDF5 to Parquet, from TensorFlow Records to ROS bags.

The result is a single, ready-to-use ingestion suite in which a RobotJoint topic originating from a ROS bag looks and behaves exactly like a RobotJoint topic coming from a DeepMind TFRecord. This is the pack's core value proposition: it demonstrates that Mosaico can act as a common standard for semantic sensor data description across deeply fragmented ecosystems.
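
To make the idea concrete, here is a minimal sketch of two differently shaped source records being normalised into one structure. The RobotJointSketch dataclass, the two converter functions, and the input field names are all hypothetical stand-ins chosen for illustration; they are not Mosaico's actual RobotJoint model or the pack's adapters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotJointSketch:
    """Hypothetical stand-in for a unified joint-state record (not the Mosaico model)."""
    timestamp_ns: int
    names: tuple[str, ...]
    positions: tuple[float, ...]


def from_rosbag_style(msg: dict) -> RobotJointSketch:
    # Assumed ROS-like layout: stamp split into secs/nsecs, parallel name/position lists.
    stamp = msg["header"]["stamp"]
    return RobotJointSketch(
        timestamp_ns=stamp["secs"] * 1_000_000_000 + stamp["nsecs"],
        names=tuple(msg["name"]),
        positions=tuple(msg["position"]),
    )


def from_tfrecord_style(step: dict) -> RobotJointSketch:
    # Assumed TFRecord-like layout: float seconds and an unnamed position vector.
    positions = tuple(float(p) for p in step["joint_pos"])
    return RobotJointSketch(
        timestamp_ns=round(step["t"] * 1e9),
        names=tuple(f"joint_{i}" for i in range(len(positions))),
        positions=positions,
    )


ros_record = from_rosbag_style({
    "header": {"stamp": {"secs": 2, "nsecs": 500_000_000}},
    "name": ["joint_0", "joint_1"],
    "position": [0.1, -0.4],
})
tf_record = from_tfrecord_style({"t": 2.5, "joint_pos": [0.1, -0.4]})
```

Once both records share one shape, downstream code no longer needs to know which source format they came from.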

Installation

Clone the repository, then install and run the project with Poetry:

git clone git@github.com:mosaico-labs/mosaico-alchemy.git
cd mosaico-alchemy

# Install via Poetry
poetry install
eval $(poetry env activate)

Note: Requires Python 3.11 or higher.

To verify the installation, run the following command from your terminal:

mosaico-alchemy manipulation --help

To start ingesting data, download the datasets from their respective repositories (see Supported Datasets), then run the mosaico-alchemy manipulation command and pass your dataset directories via --datasets:

mosaico-alchemy manipulation --datasets /path/to/dataset

Configuration Options

The CLI supports the following flags to control the execution environment:

| Option | Default | Description |
| --- | --- | --- |
| --datasets | Required | One or more space-separated dataset roots to ingest. |
| --host | localhost | The hostname of your Mosaico Server. |
| --port | 6726 | The Flight port of your Mosaico Server. |
| --log-level | INFO | Set verbosity (DEBUG, INFO, WARNING, ERROR). |
| --write-mode | sync | Topic execution mode for file-backed data (sync or async). |
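
If you drive ingestion from a script, the flags above can be assembled programmatically. The following is a minimal sketch: build_manipulation_command is a hypothetical helper, not part of the pack, and it assumes the CLI accepts the documented flags in this order.

```python
import shlex


def build_manipulation_command(
    datasets: list[str],
    host: str = "localhost",
    port: int = 6726,
    log_level: str = "INFO",
    write_mode: str = "sync",
) -> list[str]:
    """Assemble the argv list for `mosaico-alchemy manipulation` (hypothetical helper)."""
    if not datasets:
        raise ValueError("--datasets is required: pass at least one dataset root")
    if log_level not in {"DEBUG", "INFO", "WARNING", "ERROR"}:
        raise ValueError(f"unsupported log level: {log_level}")
    if write_mode not in {"sync", "async"}:
        raise ValueError(f"unsupported write mode: {write_mode}")
    return [
        "mosaico-alchemy", "manipulation",
        "--datasets", *datasets,
        "--host", host,
        "--port", str(port),
        "--log-level", log_level,
        "--write-mode", write_mode,
    ]


command = build_manipulation_command(
    ["/path/to/reassemble", "/path/to/droid"],
    log_level="DEBUG",
    write_mode="async",
)
# Print the shell-ready command line.
print(shlex.join(command))
# To actually launch it: subprocess.run(command, check=True)
```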

Supported Datasets

We provide built-in support for multiple open-source formats. We recommend exploring them in the following order to understand the offered capabilities:

Reassemble

4,551 contact-rich assembly and disassembly demonstrations across 17 objects, with multimodal sensing from event cameras, force-torque sensors, microphones and multi-view RGB cameras.

Execution Backend

File (HDF5)

Topics Ingested into Mosaico
| Topic | Ontology Type | Description |
| --- | --- | --- |
| /capture_node-camera-image | CompressedImage | RGB video from the DAVIS346 event camera's integrated frame sensor, facing the workspace. |
| /events | EventCamera* | Asynchronous pixel-level brightness change events [x, y, polarity] from the event camera. |
| /hama1 | CompressedImage | RGB video from the first external camera observing the robot. |
| /hama1_audio | AudioDataStamped* | Audio stream from the microphone co-located with the first external camera. |
| /hama2 | CompressedImage | RGB video from the second external camera observing the robot. |
| /hama2_audio | AudioDataStamped* | Audio stream from the microphone co-located with the second external camera. |
| /hand | CompressedImage | RGB video from the hand-mounted camera observing the worktable. |
| /hand_audio | AudioDataStamped* | Audio stream from the microphone co-located with the hand camera. |
| /grasp_failure_label | SegmentInfo* | Hierarchical temporal segmentation of the episode into labelled high-level and low-level action segments, each with start/end timestamps and a success flag. |
| /robot_state/joint_state | RobotJoint | Position, velocity, and effort for each of the 7 robot arm joints. |
| /robot_state/pose | Pose | End-effector Cartesian pose [x, y, z, qx, qy, qz, qw] in world frame. |
| /robot_state/velocity | Velocity | End-effector Cartesian velocity [vx, vy, vz, wx, wy, wz] in m/s and rad/s. |
| /robot_state/compensated_base_force_torque | ForceTorque | 3D force and torque at the robot base, gravity-compensated. |
| /robot_state/measured_force_torque | ForceTorque | Raw 3D force and torque measured at the end-effector F/T sensor. |
| /robot_state/end_effector | EndEffector* | Position, velocity, and effort for each of the 2 gripper fingers. |

* Custom ontology model defined within the pack module.

RT-1 (Fractal)

87,212 pick-and-place episodes across 17 objects in Google micro kitchen environments, with RGB observations, natural language instructions, 512D task embeddings and full end-effector action space.

Execution Backend

File (TFDS)

Topics Ingested into Mosaico
| Topic | Ontology Type | Description |
| --- | --- | --- |
| step/observation/image | CompressedImage | RGB camera frame (256×320×3) from the robot's onboard camera. |
| step/observation/base_pose_tool_reached | Pose | Current end-effector pose [x, y, z, qx, qy, qz, qw] in base-relative frame. |
| step/observation/orientation_start | Quaternion | End-effector orientation quaternion at the start of the episode (t=0), used to normalise subsequent rotations. |
| step/observation/src_rotation | Quaternion | Reference orientation quaternion for the current phase of the task. |
| step/observation/natural_language_instruction | String | Free-text task instruction (e.g. "pick rxbar chocolate from bottom drawer and place on counter"). |
| step/observation/natural_language_embedding | TextEmbedding* | 512-dimensional float embedding of the natural language instruction. |
| step/observation/gripper_closed | Floating32 | Binary-like indicator of whether the gripper is currently closed (1) or open (0). |
| step/observation/gripper_closedness_commanded | Floating32 | Previously commanded continuous gripper position [0, 1]; comparing with gripper_closed reveals execution error or physical resistance. |
| step/observation/height_to_bottom | Floating32 | Altitude of the end-effector above the ground plane, in metres. |
| step/observation/rotation_delta_to_go | Vector3d | Remaining rotational displacement [roll, pitch, yaw] from current orientation to target, in radians. |
| step/observation/vector_to_go | Vector3d | Displacement [Δx, Δy, Δz] from the current end-effector position to the target; used as a closed-loop control signal. |
| step/observation/orientation_box | Vector3dBounds* | Min/max rotational bounds (2×3) defining the allowed orientation range for an object of interest, in radians. |
| step/observation/robot_orientation_positions_box | Vector3dFrame* | 3×3 matrix describing the robot body's position and orientation in 3D space. |
| step/observation/workspace_bounds | WorkspaceBounds* | 3×3 matrix defining the per-axis spatial limits within which the robot is authorised to operate. |
| step/action/world_vector | Vector3d | Commanded Cartesian displacement [Δx, Δy, Δz] of the end-effector in base-relative frame. |
| step/action/rotation_delta | Vector3d | Commanded orientation change [roll, pitch, yaw] in base-relative frame, in radians. |
| step/action/base_displacement_vector | Vector3d | Commanded robot base translation [x, y] on the horizontal plane, in metres. |
| step/action/base_displacement_vertical_rotation | Floating32 | Commanded robot base yaw rotation (rotation about the vertical Z axis), in radians. |
| step/action/gripper_closedness_action | Floating32 | Target gripper closure [0, 1] commanding how tightly to grasp an object. |
| step/action/terminate_episode | TerminateEpisode* | 3-element signal indicating whether the model believes the task is complete, ongoing, or in error. |
| step/reward | Floating32 | Scalar reward for the action taken at this step. |
| step/is_first | Boolean | True for the initial step of an episode. |
| step/is_last | Boolean | True for the final step of an episode. |
| step/is_terminal | Boolean | True when the episode ends (by success or failure), triggering a reset of the expected value computation. |

* Custom ontology model defined within the pack module.

LeRobot DROID

95,658 manipulation episodes collected at 13 research institutions, with wrist and dual exterior stereo cameras, joint and Cartesian state, end-effector position and embedded camera extrinsics.

Execution Backend

File (Parquet)

Topics Ingested into Mosaico
| Topic | Ontology Type | Description |
| --- | --- | --- |
| /observation/images/wrist_left | CompressedImage | RGB video (180×320×3) from the wrist-mounted camera near the end-effector. |
| /observation/images/exterior_1_left | CompressedImage | RGB video (180×320×3) from the first external scene camera. |
| /observation/images/exterior_2_left | CompressedImage | RGB video (180×320×3) from the second external scene camera. |
| /observation/state/joint_position | RobotJoint | Current positions of the 7 Franka arm joints, in radians. |
| /observation/state/cartesian_position | Pose | Current end-effector Cartesian pose [x, y, z, roll, pitch, yaw] in world frame. |
| /observation/state/gripper_position | EndEffector* | Current gripper opening/closing state as a scalar. |
| /action/joint_position | RobotJoint | Target positions commanded to the 7 Franka arm joints. |
| /action/cartesian_position | Pose | Target end-effector Cartesian pose [x, y, z, roll, pitch, yaw] commanded at this step. |
| /action/cartesian_velocity | Velocity | Target end-effector Cartesian velocity [vx, vy, vz, vroll, vpitch, vyaw] commanded at this step. |
| /action/gripper_position | EndEffector* | Target gripper position commanded at this step. |
| /camera_extrinsics/wrist_left | Pose | Extrinsic pose [x, y, z, roll, pitch, yaw] of the wrist camera relative to the robot. |
| /camera_extrinsics/exterior_1_left | Pose | Extrinsic pose [x, y, z, roll, pitch, yaw] of the first external camera. |
| /camera_extrinsics/exterior_2_left | Pose | Extrinsic pose [x, y, z, roll, pitch, yaw] of the second external camera. |
| /step/reward | Floating64 | Scalar reward for the action taken at this step, ranging from 0 to 1. |
| /step/discount | Floating64 | Discount factor applied to future rewards during training. |
| /step/task_index | Integer64 | Numeric index identifying the task type for this episode. |
| /step/frame_index | Integer64 | Positional index of this step within its episode. |
| /step/is_first | Boolean | True for the initial step of an episode. |
| /step/is_last | Boolean | True for the final stored step of an episode. |
| /step/is_terminal | Boolean | True when the episode ends by reaching a terminal state. |

* Custom ontology model defined within the pack module.

Multimodal Manipulation Learning

300 ROS bag recordings of a KUKA iiwa robot with an Allegro hand, combining Tekscan tactile pressure, microphone audio, torque commands and joint states across 5 material classes.

Execution Backend

ROS .bag

Topics Ingested into Mosaico
| Topic | Ontology Type | Description |
| --- | --- | --- |
| /allegro_hand_right/joint_states | RobotJoint | Position, velocity, and effort for the 16 joints of the Allegro dexterous hand. |
| /iiwa/joint_states | RobotJoint | Position, velocity, and effort for the 7 joints of the KUKA iiwa arm. |
| /iiwa/eePose | Pose | End-effector Cartesian pose [x, y, z, qx, qy, qz, qw] in world frame. |
| /iiwa/TorqueController/command | JointTorqueCommand* | Torque-space control command sent to the 7 iiwa joints at each control cycle, in Nm. |
| /tekscan/frame | TekscanSensor* | Tactile pressure frame from the Tekscan sensor, represented as a 2944-element array interpretable as a 46×64 pressure matrix. |
| /audio/audio | AudioDataStamped* | Raw mono audio chunks captured during the manipulation trial. |
| /audio/audio_info | AudioInfo* | Audio stream metadata: sample rate (16 kHz), format (S16LE), and codec (MP3). |
| /trialInfo | String | Trial metadata and discrete event markers: the first message contains a YAML block with motion parameters and controller configuration; subsequent messages mark events such as shake start or stop. |

* Custom ontology model defined within the pack module.
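
As an illustration of the /tekscan/frame layout, the sketch below reshapes a flat 2944-element frame into a 46×64 grid. It assumes row-major ordering with 46 rows and 64 columns, which the description above does not specify, and to_pressure_matrix is a hypothetical helper rather than part of the pack.

```python
def to_pressure_matrix(
    frame: list[float], rows: int = 46, cols: int = 64
) -> list[list[float]]:
    """Split a flat tactile frame into rows (assumes row-major ordering)."""
    if len(frame) != rows * cols:
        raise ValueError(f"expected {rows * cols} elements, got {len(frame)}")
    return [frame[r * cols:(r + 1) * cols] for r in range(rows)]


# Synthetic frame whose values equal their flat index, to make the layout visible.
flat_frame = [float(i) for i in range(2944)]
matrix = to_pressure_matrix(flat_frame)
```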

A Note about the Custom Ontology Models

Some datasets include application-specific data types that do not map cleanly to Mosaico’s built-in ontology models; for example, the TerminateEpisode model is highly specific to the data format defined in the Fractal dataset. Defining custom ontology models lets you represent these data structures precisely, with typed fields, validation rules, and consistent semantics for your own sensors or proprietary message formats.

Mosaico provides a fast path to define and automatically register custom data models. Once registered, they behave like native models and are accepted by the platform without extra integration work.
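
The idea of typed fields plus validation rules can be sketched with a plain dataclass. This is only an illustration of the concept: TerminateEpisodeSketch is a hypothetical stand-in, and it uses neither Mosaico's registration mechanism nor the pack's real TerminateEpisode model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminateEpisodeSketch:
    """Hypothetical stand-in: a typed 3-element termination signal with validation."""
    values: tuple[float, ...]

    def __post_init__(self) -> None:
        # Validation rule: the signal must carry exactly three numeric elements.
        if len(self.values) != 3:
            raise ValueError(f"expected 3 elements, got {len(self.values)}")
        if not all(isinstance(v, (int, float)) for v in self.values):
            raise TypeError("all elements must be numeric")


signal = TerminateEpisodeSketch(values=(1, 0, 0))
```

Rejecting malformed values at construction time keeps bad records from reaching the ingestion step.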

Customizing the Pack

If your robotic data is stored in a proprietary format that isn't supported out of the box, you can extend the module. This section explains how to add support for your own dataset formats alongside the built-in ones. By following this guide, you will:

  • Select an Execution Backend: Decide whether to ingest native ROS bags through the ROS bridge or to build a file-backed descriptor pipeline.
  • Define Ontologies and Adapters: Choose the Mosaico types that model your sensor streams and implement the adapters that translate raw dictionaries into them.
  • Implement the Dataset Plugin: Assemble the ingestion plan that wires your iterators, ontologies, and adapters into a declarative sequence descriptor.
  • Register Your Components: Expose your plugin and adapters to the CLI layer so they are discovered at runtime.

Choose the Execution Backend

Your first architectural decision is how the data should be accessed. If your dataset uses standard file or database formats such as HDF5, Parquet, JSON, or custom binary files, use the file-backed executor by generating a SequenceDescriptor.

Alternatively, if your dataset consists of native ROS bags and maps naturally to ROS topics, use a RosbagSequenceDescriptor. This lets the plugin validate the required topics and then delegate the actual ingestion to the ROS bridge.
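
The decision can be reduced to a small rule inside your plugin. The sketch below is an assumption-laden illustration: choose_backend is a hypothetical helper, and keying on the .bag extension is just one plausible way to tell the two cases apart.

```python
from pathlib import Path


def choose_backend(sequence_path: Path) -> str:
    """Return "rosbag" for ROS bag files and "file" for everything else (hypothetical rule)."""
    return "rosbag" if sequence_path.suffix == ".bag" else "file"


# "rosbag" would lead to a RosbagSequenceDescriptor, "file" to a SequenceDescriptor.
backends = {
    name: choose_backend(Path(name))
    for name in ("trial_01.bag", "episode_001.h5", "chunk-000.parquet")
}
```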

Apply Ontologies and Implement Adapters

For each data stream, you define how it maps to a Mosaico ontology type. If you cannot reuse an existing SDK ontology, you can define your own custom class. We highly recommend reviewing the Ontology Customization Guide to see exactly how to write and register your own custom data models.

The Custom Adapter

Once your ontologies define the target structure, implement a custom adapter that translates each raw payload dictionary into a Mosaico Message:

from mosaicolabs import Message
from mosaicolabs.packs.manipulation.adapters.base import BaseAdapter

# CustomSensorModel is your own ontology model; import it from wherever you defined it.

# Adapters are generic on the ontology type they produce.
# Specialising BaseAdapter[CustomSensorModel] binds the translate() return type
# and restricts this adapter to messages carrying CustomSensorModel payloads only.
class MyDatasetCustomAdapter(BaseAdapter[CustomSensorModel]):
    # The global identifier that will be referenced by name in your `TopicDescriptor`.
    adapter_id = "mydataset.custom_sensor"
    # Define the fields required in the payload, to be validated
    # in the `translate` function.
    _REQUIRED_KEYS: tuple[str, ...] = ("timestamp", "sensor_readings")

    # The translator factory: an adapter receives the raw payload dictionary
    # and returns an instantiated Mosaico `Message`.
    @classmethod
    def translate(cls, payload: dict) -> Message:
        """
        Translates a raw custom dictionary into a Mosaico Message container.
        """
        # Guard against malformed payloads before any field access:
        # sensor_readings must be present and contain exactly 4 elements.
        cls._validate_payload(
            payload=payload,
            constraints={"sensor_readings": {"len": 4}},
        )
        return Message(
            # Convert the float-seconds timestamp into Mosaico's native
            # nanosecond format.
            timestamp_ns=int(payload["timestamp"] * 1e9),
            data=CustomSensorModel(values=payload["sensor_readings"]),
        )
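
To show what the guard and the timestamp conversion amount to, here is a self-contained sketch that does not depend on the pack. Both validate_payload and seconds_to_ns are hypothetical helpers; in particular, this is a guess at the kind of check _validate_payload performs, not its actual implementation.

```python
def validate_payload(
    payload: dict, required_keys: tuple[str, ...], constraints: dict
) -> None:
    """Check required keys and simple length constraints (hypothetical helper)."""
    missing = [key for key in required_keys if key not in payload]
    if missing:
        raise KeyError(f"payload is missing required keys: {missing}")
    for key, rules in constraints.items():
        expected_len = rules.get("len")
        if expected_len is not None and len(payload[key]) != expected_len:
            raise ValueError(
                f"{key!r} must have {expected_len} elements, got {len(payload[key])}"
            )


def seconds_to_ns(timestamp_s: float) -> int:
    """Convert float seconds to integer nanoseconds, rounding instead of truncating."""
    return round(timestamp_s * 1e9)


payload = {"timestamp": 1.5, "sensor_readings": [0.1, 0.2, 0.3, 0.4]}
validate_payload(
    payload,
    required_keys=("timestamp", "sensor_readings"),
    constraints={"sensor_readings": {"len": 4}},
)
timestamp_ns = seconds_to_ns(payload["timestamp"])
```

Rounding rather than truncating with int() is a deliberate choice here, because a float product can land just below the intended integer.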

Develop the Dataset Plugin

With adapters in place, you implement the dataset plugin class. This class satisfies a straightforward internal protocol to detect your dataset format, discover its logical sequences, and assemble the ingestion plan that wires everything together.

The Plugin Protocol

from pathlib import Path
from typing import Iterable, Protocol

# IngestionDescriptor is provided by the pack (import omitted here).

class DatasetPlugin(Protocol):
    # A unique string identifier used in downstream logging and operator prompts.
    dataset_id: str

    # A method to verify if a given folder matches the dataset signature
    # (e.g., checking for `*.h5` files) avoiding expensive full-dataset scans.
    def supports(self, root: Path) -> bool: ...

    # A method to discover the logical sequences contained in the root folder,
    # such as individual robot episodes.
    def discover_sequences(self, root: Path) -> Iterable[Path]: ...

    # The core logic to create an ingestion plan for each sequence,
    # returning an `IngestionDescriptor`.
    def create_ingestion_plan(self, sequence_path: Path) -> IngestionDescriptor: ...
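
As a concrete illustration of the first two protocol methods, the sketch below treats every *.h5 file in the root folder as one sequence. The class name, the dataset_id value, and the one-file-per-episode layout are assumptions made for the example; create_ingestion_plan is left out because it depends on the pack's descriptor types.

```python
import tempfile
from pathlib import Path
from typing import Iterable


class H5FolderPluginSketch:
    """Hypothetical plugin: one HDF5 file per episode, all stored in the root folder."""
    dataset_id = "mydataset"

    def supports(self, root: Path) -> bool:
        # Stop at the first match instead of listing the whole directory.
        return root.is_dir() and next(root.glob("*.h5"), None) is not None

    def discover_sequences(self, root: Path) -> Iterable[Path]:
        # Sort so that episodes are ingested in a stable order.
        return sorted(root.glob("*.h5"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("episode_002.h5", "episode_001.h5", "notes.txt"):
        (root / name).touch()
    plugin = H5FolderPluginSketch()
    is_supported = plugin.supports(root)
    sequence_names = [path.name for path in plugin.discover_sequences(root)]
```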

Extracting Raw Data

To keep your plugin code clean and maintainable, raw file I/O operations must be completely separated from the orchestration step. You should create dedicated iterator functions following the factory pattern: each function accepts static configuration parameters (file paths, field names) and returns a Callable[[Path], Iterable[dict]]. The runner calls that callable later with the actual sequence path to stream the raw payloads. Each payload dict must include a "timestamp" key expressed in seconds as a float.

from pathlib import Path
from typing import Callable, Iterable

def iter_video_frames(video_path: str, timestamps_path: str) -> Callable[[Path], Iterable[dict]]:
    def _fn(sequence_path: Path) -> Iterable[dict]:
        # open sequence_path and read the data at video_path / timestamps_path
        yield {"timestamp": 1234.56, "image": b"..."}
    return _fn

def count_video_frames(timestamps_path: str) -> Callable[[Path], int]:
    def _fn(sequence_path: Path) -> int:
        # count the number of frames at timestamps_path
        total_frames = 0  # placeholder: replace with the real frame count
        return total_frames
    return _fn
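
A fully working instance of the same factory pattern, using a CSV file instead of video, might look like the sketch below. The file name, the column names, and the assumption that each sequence is a directory are all hypothetical; only the shape of the factories and the float-seconds "timestamp" key follow the description above.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def iter_csv_rows(csv_name: str, time_column: str) -> Callable[[Path], Iterable[dict]]:
    def _fn(sequence_path: Path) -> Iterable[dict]:
        # Stream one payload per CSV row; every non-time column becomes a reading.
        with open(sequence_path / csv_name, newline="") as handle:
            for row in csv.DictReader(handle):
                yield {
                    "timestamp": float(row[time_column]),
                    "sensor_readings": [
                        float(value) for key, value in row.items() if key != time_column
                    ],
                }
    return _fn


def count_csv_rows(csv_name: str) -> Callable[[Path], int]:
    def _fn(sequence_path: Path) -> int:
        # Count data rows only; DictReader consumes the header line itself.
        with open(sequence_path / csv_name, newline="") as handle:
            return sum(1 for _ in csv.DictReader(handle))
    return _fn


with tempfile.TemporaryDirectory() as tmp:
    sequence_dir = Path(tmp)
    (sequence_dir / "sensor.csv").write_text("t,a,b\n0.0,1.0,2.0\n0.5,3.0,4.0\n")
    payloads = list(iter_csv_rows("sensor.csv", "t")(sequence_dir))
    total = count_csv_rows("sensor.csv")(sequence_dir)
```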

Structuring the Ingestion Plan

The plugin implements create_ingestion_plan to declare the sequence name, its metadata, and the full list of topics. Each topic references the iterators and adapters you built in the previous steps.

# Inside your plugin class:
def create_ingestion_plan(self, sequence_path: Path) -> IngestionDescriptor:
    return SequenceDescriptor(
        # The unique name of the target sequence being constructed.
        sequence_name=f"{self.dataset_id}_{sequence_path.stem}",
        # The custom metadata of this sequence.
        sequence_metadata={
            "dataset_id": self.dataset_id,
            "ingestion_backend": "file",
        },
        topics=[
            TopicDescriptor(
                # The topic name that will handle this data stream.
                topic_name="/camera/front",
                # The Mosaico ontology type that models this data stream.
                ontology_type=CompressedImage,
                # The identifier of the adapter responsible for translating this specific topic.
                adapter_id=f"{self.dataset_id}.video_frame",
                # The runner calls this callable later with the sequence path
                # to obtain the raw payload stream.
                payload_iter=iter_video_frames("path/to/video", "path/to/timestamps"),
                # The runner uses this to count the total messages ahead of time for progress
                # reporting. It must match the number of items `payload_iter` will yield.
                message_count=count_video_frames("path/to/timestamps"),
            ),
        ],
    )
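
Because message_count must agree with what payload_iter actually yields, it can be worth checking the pair during development. The helper below is a hypothetical sketch, not part of the pack; note that it consumes the whole payload stream, so it suits tests rather than production runs.

```python
from pathlib import Path
from typing import Callable, Iterable


def check_message_count(
    payload_iter: Callable[[Path], Iterable[dict]],
    message_count: Callable[[Path], int],
    sequence_path: Path,
) -> int:
    """Raise if the declared count differs from the number of yielded payloads."""
    declared = message_count(sequence_path)
    actual = sum(1 for _ in payload_iter(sequence_path))
    if declared != actual:
        raise ValueError(
            f"message_count reports {declared} but payload_iter yielded {actual}"
        )
    return actual


def three_frames(_sequence_path: Path) -> Iterable[dict]:
    # Synthetic stream standing in for a real iterator.
    for index in range(3):
        yield {"timestamp": index * 0.1, "image": b""}


checked = check_message_count(three_frames, lambda _path: 3, Path("episode_001"))
```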

Register Your Components

Once your dataset plugin, iterator functions, and adapters are implemented, they must be made discoverable to the CLI layer. There are two separate registries to update: one for the dataset plugin and one for the adapter.

Registering the Dataset Plugin

from mosaico_alchemy.manipulation.datasets import DatasetRegistry

# DatasetRegistry is a singleton pre-populated with the built-in plugins.
# Registering here makes your plugin visible everywhere the registry is used.
registry = DatasetRegistry()
registry.register(MyDatasetPlugin())

Importing DatasetRegistry triggers the default registry setup automatically, so the instance returned by DatasetRegistry() already contains all built-in plugins. Calling register on it adds your plugin to the same shared instance, making it available across both the CLI and the runner without any further wiring:

  • CLI: your dataset appears alongside the built-in options in the interactive selection prompt.
  • Runner: auto-detection via registry.resolve(root) will call your plugin's supports method against the dataset root.
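
The shared-instance behaviour described above can be pictured with a small stand-in. RegistrySketch is hypothetical and is not the pack's DatasetRegistry; in particular, returning the first registered plugin whose supports method accepts the root is an assumption about how resolution works.

```python
from pathlib import Path


class RegistrySketch:
    """Hypothetical stand-in for a shared plugin registry (not the pack's class)."""
    _instance = None

    def __new__(cls):
        # Every call returns the same object, so registrations are shared.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._plugins = []
        return cls._instance

    def register(self, plugin) -> None:
        self._plugins.append(plugin)

    def resolve(self, root: Path):
        # Assumed policy: the first plugin that supports the root wins.
        for plugin in self._plugins:
            if plugin.supports(root):
                return plugin
        raise LookupError(f"no registered plugin supports {root}")


class BagOnlyPlugin:
    dataset_id = "bags"

    def supports(self, root: Path) -> bool:
        return root.suffix == ".bag"


RegistrySketch().register(BagOnlyPlugin())
resolved = RegistrySketch().resolve(Path("trial_01.bag"))
```

The plugin registered through one RegistrySketch() call is found through another, which is the property that lets the CLI and the runner see the same set of plugins.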

Registering the Adapter

The adapter must also be registered in the AdapterRegistry. Without this step, the file executor will not be able to resolve the adapter_id declared in your TopicDescriptor and the ingestion will fail.

from mosaico_alchemy.manipulation.adapters import AdapterRegistry

# AdapterRegistry is a singleton pre-populated with the built-in adapters.
# Registering here makes your adapter visible everywhere the registry is used.
registry = AdapterRegistry()
registry.register(MyDatasetCustomAdapter)

After registering both components, your dataset plugin is ready to be executed.