
Manipulation

The Manipulation Pack is a curated collection of heterogeneous open-source robotic manipulation datasets, each individually studied, analyzed, and mapped into Mosaico's unified semantic ontology. Every dataset was manually inspected to identify its internal topics and source formats, from HDF5 to Parquet, from TensorFlow Records to ROS bags.

The result is a single, ready-to-use ingestion suite where a RobotJoint topic originating from a ROS bag looks and behaves exactly like a RobotJoint topic coming from a DeepMind TFRecord. This is the core value proposition: proving that Mosaico acts as the universal standard for semantic sensor data description across deeply fragmented ecosystems.

Installation

Clone the repository, then install and run the project with Poetry:

git clone git@github.com:mosaico-labs/mosaico-alchemy.git
cd mosaico-alchemy

# Install via Poetry
poetry install
eval $(poetry env activate)

Note: Requires Python 3.11 or higher.

To verify the installation, run the following command from your terminal:

mosaico-alchemy manipulation --help

To start ingesting the data, download the datasets from the related repositories (see Supported Datasets) and then run the mosaico-alchemy manipulation command followed by your dataset directories:

mosaico-alchemy manipulation --datasets /path/to/dataset

Configuration Options

The CLI supports the following flags to control the execution environment:

| Option | Default | Description |
|---|---|---|
| --datasets | Required | One or more space-separated dataset roots to ingest. |
| --host | localhost | The hostname of your Mosaico Server. |
| --port | 6726 | The Flight port of your Mosaico Server. |
| --log-level | INFO | Set verbosity (DEBUG, INFO, WARNING, ERROR). |
| --write-mode | sync | Topic execution mode for file-backed data (sync or async). |

Supported Datasets

We provide built-in support for multiple open-source formats. We recommend exploring them in the following order to understand the offered capabilities:

Reassemble

4,551 contact-rich assembly and disassembly demonstrations across 17 objects, with multimodal sensing from event cameras, force-torque sensors, microphones and multi-view RGB cameras.

Execution Backend

File (HDF5)

Downloading the Dataset

Download and extract the archive into your dataset root directory:

cd <dataset_dir>
# From the dataset root directory
curl -L "https://researchdata.tuwien.ac.at/records/0ewrv-8cb44/files/data.zip?download=1" \
-o reassemble_data.zip && \
unzip reassemble_data.zip && \
rm reassemble_data.zip

The dataset directory structure will look like:

<dataset_dir>/
├── ...
└── reassemble_data/

Dataset Properties

| Property | Value |
|---|---|
| Episodes | 4,551 total (4,035 successful, 516 failed) |
| Robot | Franka Emika arm with parallel-jaw gripper |
| Task type | Contact-rich assembly and disassembly of connectors |
| Objects | 17 object types (e.g. Ethernet cable, USB-A, HDMI) |
| Annotations | Hierarchical: high-level task segments + low-level primitives (Grasp, Approach, Align, Lift, Insert), each with timestamps and a success flag |
| Source format | HDF5 (.h5), one file per episode |
| Modalities | Event camera (DAVIS346), 3× RGB cameras, 3× microphones, robot proprioception (7 joints), gripper state, force/torque sensor |
| Control frequency | ~1 kHz robot state; ~30 Hz RGB cameras; ~120 Hz event camera |

The episode structure is not step-based: each .h5 file stores independently timestamped streams that must be aligned during loading. The segments_info field provides the ground-truth temporal segmentation of the episode into labelled high-level and low-level action phases.
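Because the streams are timestamped independently, a loader typically picks one stream as the reference and looks up the nearest sample in every other stream. The following is a minimal sketch of that idea using only the standard library; the function names and the sample timestamps are illustrative assumptions, not the pack's actual alignment logic.

```python
from bisect import bisect_left


def nearest_index(sorted_ts: list[float], t: float) -> int:
    """Return the index of the timestamp in sorted_ts closest to t."""
    if not sorted_ts:
        raise ValueError("sorted_ts must not be empty")
    pos = bisect_left(sorted_ts, t)
    if pos == 0:
        return 0
    if pos == len(sorted_ts):
        return len(sorted_ts) - 1
    before, after = sorted_ts[pos - 1], sorted_ts[pos]
    # Ties resolve to the earlier sample.
    return pos - 1 if (t - before) <= (after - t) else pos


def align_to_reference(ref_ts: list[float], other_ts: list[float]) -> list[int]:
    """For each reference timestamp, pick the nearest sample of another stream."""
    return [nearest_index(other_ts, t) for t in ref_ts]


# Hypothetical timestamps in seconds: a ~30 Hz camera and a faster robot-state stream.
camera_ts = [0.000, 0.033, 0.066]
robot_ts = [0.000, 0.010, 0.020, 0.030, 0.040, 0.050, 0.060, 0.070]
matches = align_to_reference(camera_ts, robot_ts)  # → [0, 3, 7]
```

Nearest-neighbour lookup is the simplest choice; interpolation or windowed averaging may suit high-rate signals such as force/torque better.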

Topics Ingested into Mosaico

| Topic | Ontology Type | Description |
|---|---|---|
| /capture_node-camera-image | CompressedImage | RGB video from the DAVIS346 event camera's integrated frame sensor, facing the workspace. |
| /events | EventCamera* | Asynchronous pixel-level brightness change events [x, y, polarity] from the event camera. |
| /hama1 | CompressedImage | RGB video from the first external camera observing the robot. |
| /hama1_audio | AudioDataStamped* | Audio stream from the microphone co-located with the first external camera. |
| /hama2 | CompressedImage | RGB video from the second external camera observing the robot. |
| /hama2_audio | AudioDataStamped* | Audio stream from the microphone co-located with the second external camera. |
| /hand | CompressedImage | RGB video from the hand-mounted camera observing the worktable. |
| /hand_audio | AudioDataStamped* | Audio stream from the microphone co-located with the hand camera. |
| /grasp_failure_label | SegmentInfo* | Hierarchical temporal segmentation of the episode into labelled high-level and low-level action segments, each with start/end timestamps and a success flag. |
| /robot_state/joint_state | RobotJoint | Position, velocity, and effort for each of the 7 robot arm joints. |
| /robot_state/pose | Pose | End-effector Cartesian pose [x, y, z, qx, qy, qz, qw] in world frame. |
| /robot_state/velocity | Velocity | End-effector Cartesian velocity [vx, vy, vz, wx, wy, wz] in m/s and rad/s. |
| /robot_state/compensated_base_force_torque | ForceTorque | 3D force and torque at the robot base, gravity-compensated. |
| /robot_state/measured_force_torque | ForceTorque | Raw 3D force and torque measured at the end-effector F/T sensor. |
| /robot_state/end_effector | EndEffector* | Position, velocity, and effort for each of the 2 gripper fingers. |

*The custom ontology models defined within the pack module

RT-1 (Fractal)

87,212 pick-and-place episodes across 17 objects in Google micro kitchen environments, with RGB observations, natural language instructions, 512D task embeddings and full end-effector action space.

Execution Backend

File (TFDS)

Downloading the Dataset

Install the Google Cloud SDK (required for gcloud) and authenticate your account:

brew install --cask google-cloud-sdk
gcloud auth login

Then download the dataset into your dataset root directory:

cd <dataset_dir>
# From the dataset root directory
mkdir -p fractal_data

gcloud storage cp -r \
gs://gresearch/robotics/fractal20220817_data/0.1.0 \
./fractal_data/

The dataset directory structure will look like:

<dataset_dir>/
├── ...
└── fractal_data/
    └── 0.1.0/

Dataset Properties

| Property | Value |
|---|---|
| Episodes | 87,212 total; ~4.43% failure rate (episodes where is_terminal = true and no positive reward was observed) |
| Robot | Google RT-1 mobile manipulator with 7-DoF arm |
| Task type | Pick-and-place in kitchen micro-environments |
| Objects | 17 object families (snacks, containers, drawers, etc.) |
| Annotations | Per-step scalar reward; episode-level aspects and attributes metadata (unpopulated in the public release) |
| Source format | TensorFlow Dataset (TFDS / TFRecord), split into train shards |
| Modalities | Single RGB camera (256×320×3), natural language instruction, 512-dimensional task embedding |
| Control frequency | ~3 Hz |

The public release leaves the aspects and attributes fields unpopulated (all values are UNSPECIFIED). Episode success must therefore be inferred indirectly: an episode is considered failed when is_terminal = true on the last step and no step carries a positive reward. The action space covers end-effector displacement (world_vector), orientation change (rotation_delta), gripper closure (gripper_closedness_action), base navigation (base_displacement_vector, base_displacement_vertical_rotation), and the termination signal (terminate_episode).
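The failure rule described above can be sketched as a small predicate over the decoded steps of one episode. This is an illustration only: it assumes each step has already been decoded into a dict exposing the reward and is_terminal fields, and the helper name is hypothetical.

```python
def episode_failed(steps: list[dict]) -> bool:
    """Heuristic from the text: terminal last step and no positive reward anywhere."""
    if not steps:
        raise ValueError("episode has no steps")
    ends_terminal = bool(steps[-1]["is_terminal"])
    any_positive_reward = any(step["reward"] > 0 for step in steps)
    return ends_terminal and not any_positive_reward


# Hypothetical decoded steps; only the two fields used by the rule are shown.
success = [{"reward": 0.0, "is_terminal": False}, {"reward": 1.0, "is_terminal": True}]
failure = [{"reward": 0.0, "is_terminal": False}, {"reward": 0.0, "is_terminal": True}]

print(episode_failed(success), episode_failed(failure))
```

Note that an episode whose last step is not terminal is not classified as failed by this rule, even if it never receives a positive reward.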

Topics Ingested into Mosaico

| Topic | Ontology Type | Description |
|---|---|---|
| step/observation/image | CompressedImage | RGB camera frame (256×320×3) from the robot's onboard camera. |
| step/observation/base_pose_tool_reached | Pose | Current end-effector pose [x, y, z, qx, qy, qz, qw] in base-relative frame. |
| step/observation/orientation_start | Quaternion | End-effector orientation quaternion at the start of the episode (t=0), used to normalise subsequent rotations. |
| step/observation/src_rotation | Quaternion | Reference orientation quaternion for the current phase of the task. |
| step/observation/natural_language_instruction | String | Free-text task instruction (e.g. "pick rxbar chocolate from bottom drawer and place on counter"). |
| step/observation/natural_language_embedding | TextEmbedding* | 512-dimensional float embedding of the natural language instruction. |
| step/observation/gripper_closed | Floating32 | Binary-like indicator of whether the gripper is currently closed (1) or open (0). |
| step/observation/gripper_closedness_commanded | Floating32 | Previously commanded continuous gripper position [0, 1]; comparing with gripper_closed reveals execution error or physical resistance. |
| step/observation/height_to_bottom | Floating32 | Altitude of the end-effector above the ground plane, in metres. |
| step/observation/rotation_delta_to_go | Vector3d | Remaining rotational displacement [roll, pitch, yaw] from current orientation to target, in radians. |
| step/observation/vector_to_go | Vector3d | Displacement [Δx, Δy, Δz] from the current end-effector position to the target; used as a closed-loop control signal. |
| step/observation/orientation_box | Vector3dBounds* | Min/max rotational bounds (2×3) defining the allowed orientation range for an object of interest, in radians. |
| step/observation/robot_orientation_positions_box | Vector3dFrame* | 3×3 matrix describing the robot body's position and orientation in 3D space. |
| step/observation/workspace_bounds | WorkspaceBounds* | 3×3 matrix defining the per-axis spatial limits within which the robot is authorised to operate. |
| step/action/world_vector | Vector3d | Commanded Cartesian displacement [Δx, Δy, Δz] of the end-effector in base-relative frame. |
| step/action/rotation_delta | Vector3d | Commanded orientation change [roll, pitch, yaw] in base-relative frame, in radians. |
| step/action/base_displacement_vector | Vector3d | Commanded robot base translation [x, y] on the horizontal plane, in metres. |
| step/action/base_displacement_vertical_rotation | Floating32 | Commanded robot base yaw rotation (rotation about the vertical Z axis), in radians. |
| step/action/gripper_closedness_action | Floating32 | Target gripper closure [0, 1] commanding how tightly to grasp an object. |
| step/action/terminate_episode | TerminateEpisode* | 3-element signal indicating whether the model believes the task is complete, ongoing, or in error. |
| step/reward | Floating32 | Scalar reward for the action taken at this step. |
| step/is_first | Boolean | True for the initial step of an episode. |
| step/is_last | Boolean | True for the final step of an episode. |
| step/is_terminal | Boolean | True when the episode ends (by success or failure), triggering a reset of the expected value computation. |

*The custom ontology models defined within the pack module

LeRobot DROID

95,658 manipulation episodes collected at 13 research institutions, with wrist and dual exterior stereo cameras, joint and Cartesian state, end-effector position and embedded camera extrinsics.

Execution Backend

File (Parquet)

Downloading the Dataset

Install the Hugging Face CLI and authenticate your account:

brew install huggingface-cli
hf auth login

Then download the dataset into your dataset root directory:

cd <dataset_dir>
# From the dataset root directory
hf download lerobot/droid_1.0.1 \
--repo-type dataset \
--local-dir ./droid_1.0.1 \
--max-workers 8

The dataset directory structure will look like:

<dataset_dir>/
├── ...
└── droid_1.0.1/
    ├── data/
    ├── meta/
    └── videos/

Dataset Properties

| Property | Value |
|---|---|
| Episodes | 95,617 (LeRobot droid_1.0.1 release); 27,618,651 total frames |
| Robot | Franka Emika arm |
| Task type | Open-vocabulary manipulation collected "in-the-wild" across 564 scenes |
| Data collectors | 50 operators across 13 research institutions |
| Annotations | Up to 3 natural language reformulations per episode; boolean is_episode_successful flag; scalar reward [0, 1]; collection metadata (building, collector ID, date, task category) |
| Source format | Parquet shards (LeRobot format), one row per timestep |
| Modalities | 3× RGB video (180×320×3): wrist + 2 external cameras; joint positions (7,); Cartesian pose (6,); gripper position (1,); camera extrinsics (6,) per camera |
| Control frequency | 15 fps |

An episode is reconstructed from Parquet rows by grouping on episode_index, ordered by frame_index. The dataset exposes both joint-space and Cartesian-space representations of state and action simultaneously, making it suitable for policies trained in either control space. Note that some episodes carry empty language instruction strings; filtering by language_instruction != "" is recommended for language-conditioned training.
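The grouping and filtering described above can be sketched with plain dictionaries, one per Parquet row (for instance as produced by pyarrow's `Table.to_pylist()`). The column names episode_index, frame_index, and language_instruction come from the text; treating language_instruction as a per-row column and the helper name `group_episodes` are assumptions made for illustration.

```python
from collections import defaultdict


def group_episodes(rows: list[dict], require_language: bool = True) -> dict[int, list[dict]]:
    """Group flat per-timestep rows into episodes ordered by frame_index."""
    episodes: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        episodes[row["episode_index"]].append(row)

    result: dict[int, list[dict]] = {}
    for episode_index, frames in episodes.items():
        frames.sort(key=lambda r: r["frame_index"])
        # Skip episodes with an empty instruction when training is language-conditioned.
        if require_language and frames[0].get("language_instruction", "") == "":
            continue
        result[episode_index] = frames
    return result


# Hypothetical rows, deliberately out of order.
rows = [
    {"episode_index": 1, "frame_index": 1, "language_instruction": "open the drawer"},
    {"episode_index": 0, "frame_index": 0, "language_instruction": ""},
    {"episode_index": 1, "frame_index": 0, "language_instruction": "open the drawer"},
]
episodes = group_episodes(rows)
```

With the sample rows, only episode 1 survives the language filter, and its frames come back in frame_index order.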

Topics Ingested into Mosaico

| Topic | Ontology Type | Description |
|---|---|---|
| /observation/images/wrist_left | CompressedImage | RGB video (180×320×3) from the wrist-mounted camera near the end-effector. |
| /observation/images/exterior_1_left | CompressedImage | RGB video (180×320×3) from the first external scene camera. |
| /observation/images/exterior_2_left | CompressedImage | RGB video (180×320×3) from the second external scene camera. |
| /observation/state/joint_position | RobotJoint | Current positions of the 7 Franka arm joints, in radians. |
| /observation/state/cartesian_position | Pose | Current end-effector Cartesian pose [x, y, z, roll, pitch, yaw] in world frame. |
| /observation/state/gripper_position | EndEffector* | Current gripper opening/closing state as a scalar. |
| /action/joint_position | RobotJoint | Target positions commanded to the 7 Franka arm joints. |
| /action/cartesian_position | Pose | Target end-effector Cartesian pose [x, y, z, roll, pitch, yaw] commanded at this step. |
| /action/cartesian_velocity | Velocity | Target end-effector Cartesian velocity [vx, vy, vz, vroll, vpitch, vyaw] commanded at this step. |
| /action/gripper_position | EndEffector* | Target gripper position commanded at this step. |
| /camera_extrinsics/wrist_left | Pose | Extrinsic pose [x, y, z, roll, pitch, yaw] of the wrist camera relative to the robot. |
| /camera_extrinsics/exterior_1_left | Pose | Extrinsic pose [x, y, z, roll, pitch, yaw] of the first external camera. |
| /camera_extrinsics/exterior_2_left | Pose | Extrinsic pose [x, y, z, roll, pitch, yaw] of the second external camera. |
| /step/reward | Floating64 | Scalar reward for the action taken at this step, ranging from 0 to 1. |
| /step/discount | Floating64 | Discount factor applied to future rewards during training. |
| /step/task_index | Integer64 | Numeric index identifying the task type for this episode. |
| /step/frame_index | Integer64 | Positional index of this step within its episode. |
| /step/is_first | Boolean | True for the initial step of an episode. |
| /step/is_last | Boolean | True for the final stored step of an episode. |
| /step/is_terminal | Boolean | True when the episode ends by reaching a terminal state. |
*The custom ontology models defined within the pack module

Multimodal Manipulation Learning

300 ROS bag recordings of a KUKA iiwa robot with an Allegro hand, combining Tekscan tactile pressure, microphone audio, torque commands and joint states across 5 material classes.

Execution Backend

ROS .bag

Downloading the Dataset

Download and extract the archive into your dataset root directory:

cd <dataset_dir>
curl -L "https://zenodo.org/records/6372438/files/annotated_bags_mml.zip?download=1" \
-o mml_data.zip && \
unzip mml_data.zip && \
rm mml_data.zip

The dataset directory structure will look like:

<dataset_dir>/
├── ...
└── mml_data/

Dataset Properties

| Property | Value |
|---|---|
| Episodes | 300 (one .bag file per episode) |
| Robot | KUKA iiwa 7-DoF arm + Allegro 16-DoF dexterous hand |
| Task type | Object shaking to classify contents by sound and touch |
| Motion families | vertical (150 episodes), rotation (150 episodes) |
| Object classes | cornflakes, empty, gummies, rice, vitamins; 30 episodes per combination |
| Annotations | Trial metadata via /trialInfo YAML (motion parameters, controller gains); discrete event markers for shake start/stop |
| Source format | ROS1 .bag, naming convention YYYYMMDD_<motion>_<object>_<trial_idx>.bag |
| Modalities | KUKA iiwa joint states (7,), Allegro hand joint states (16,), end-effector pose, torque commands (7,), Tekscan tactile pressure (46×64), mono audio (16 kHz MP3) |
| Recording dates | 2021-08-25 (150 bags), 2021-08-26 (90 bags), 2021-09-13 (60 bags) |

There is no RGB camera stream in this dataset. The primary sensing modalities are tactile (Tekscan pressure matrix) and acoustic (microphone). The 46×64 grid shape for the tactile sensor is inferred from the 2,944-element flat array; the ROS layout field is empty and does not declare the geometry explicitly.
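Since the geometry is not declared in the message itself, a consumer has to apply the inferred shape explicitly. The sketch below reshapes a flat 2,944-element frame into 46 rows of 64 values; row-major ordering is an assumption (the source only states the inferred shape), and the helper name is hypothetical.

```python
ROWS, COLS = 46, 64  # grid shape inferred from the 2,944-element flat array


def to_pressure_matrix(flat: list[int], rows: int = ROWS, cols: int = COLS) -> list[list[int]]:
    """Reshape a flat tactile frame into rows x cols, assuming row-major order."""
    if len(flat) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(flat)}")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Synthetic frame standing in for the payload of a /tekscan/frame message.
frame = list(range(ROWS * COLS))
matrix = to_pressure_matrix(frame)
```

The explicit length check guards against silently reshaping a frame from a differently sized sensor.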

Topics Ingested into Mosaico

| Topic | Ontology Type | Description |
|---|---|---|
| /allegro_hand_right/joint_states | RobotJoint | Position, velocity, and effort for the 16 joints of the Allegro dexterous hand. |
| /iiwa/joint_states | RobotJoint | Position, velocity, and effort for the 7 joints of the KUKA iiwa arm. |
| /iiwa/eePose | Pose | End-effector Cartesian pose [x, y, z, qx, qy, qz, qw] in world frame. |
| /iiwa/TorqueController/command | JointTorqueCommand* | Torque-space control command sent to the 7 iiwa joints at each control cycle, in Nm. |
| /tekscan/frame | TekscanSensor* | Tactile pressure frame from the Tekscan sensor, represented as a 2944-element array interpretable as a 46×64 pressure matrix. |
| /audio/audio | AudioDataStamped* | Raw mono audio chunks captured during the manipulation trial. |
| /audio/audio_info | AudioInfo* | Audio stream metadata: sample rate (16 kHz), format (S16LE), and codec (MP3). |
| /trialInfo | String | Trial metadata and discrete event markers: the first message contains a YAML block with motion parameters and controller configuration; subsequent messages mark events such as shake start or stop. |

*The custom ontology models defined within the pack module

A Note about the Custom Ontology Models

Some datasets include application-specific data types that do not map cleanly to Mosaico’s built-in ontology models; for example, the TerminateEpisode model is highly specific to the data format defined in the Fractal dataset. Defining custom ontology models lets you represent these data structures precisely, with typed fields, validation rules, and consistent semantics for your own sensors or proprietary message formats.

Mosaico provides a fast path to define and automatically register custom data models. Once registered, they behave like native models and are accepted by the platform without extra integration work.

Extending the Pack

Extending the pack means contributing to the project. The recommended workflow is to fork the repository, add your dataset as a self-contained module following the structure below, wire it into the shared registries, and open a pull request.

The source tree for manipulation datasets and adapters is:

src/mosaico_alchemy/manipulation/
├── adapters/
│   ├── __init__.py        ← built-in registrations
│   ├── registry.py        ← AdapterRegistry definition
│   ├── droid/
│   ├── fractal_rt1/
│   ├── mml/
│   ├── reassemble/
│   └── my_dataset/        ← add your adapter module here
│       ├── __init__.py    ← def register_adapters():... here
│       └── <adapters>
├── datasets/
│   ├── __init__.py        ← built-in registrations
│   ├── registry.py        ← DatasetRegistry definition
│   ├── droid/
│   ├── fractal_rt1/
│   ├── mml/
│   ├── reassemble/
│   └── my_dataset/        ← add your dataset plugin module here
│       ├── __init__.py    ← def register_plugin():... here
│       └── plugin.py      ← class MyDatasetPlugin:... here
...

By following this guide, you will:

  • Select an Execution Backend: decide whether to use a file-backed SequenceDescriptor or a native RosbagSequenceDescriptor.
  • Define Ontologies and Adapters: choose the Mosaico types that model your sensor streams and implement the adapters that translate raw payloads into them.
  • Implement the Dataset Plugin: assemble the ingestion plan that wires your iterators, ontologies, and adapters into a declarative sequence descriptor.
  • Register Your Components: expose your plugin and adapters to the CLI by hooking into the shared registries.

Choose the Execution Backend

Your first architectural decision is how the data should be accessed. If your dataset relies on standard file or database formats such as HDF5, Parquet, JSON, or custom binary files, use the file-backed executor by generating a SequenceDescriptor.

Alternatively, if your dataset consists of native ROS bags and maps naturally to ROS topics, use a RosbagSequenceDescriptor. The plugin then only validates the required topics and delegates the ingestion itself to the ROS bridge.

Apply Ontologies and Implement Adapters

For each data stream, you define how it maps to a Mosaico ontology type. If you cannot reuse an existing SDK ontology, you can define your own custom class. We highly recommend reviewing the Ontology Customization Guide to see exactly how to write and register your own custom data models.

As an example, assume your dataset contains a "video_frame" data stream of JPEG images. You must define how the payload of each message in that stream maps to the standard Mosaico CompressedImage ontology model. This is done by defining an adapter.

adapters/my_dataset/video_frame.py
# ImageFormat is assumed to be importable from the package root, like CompressedImage.
from mosaicolabs import Message, CompressedImage, ImageFormat
from mosaicolabs.packs.manipulation.adapters.base import BaseAdapter

# Adapters are generic on the ontology type they produce.
# Specialising BaseAdapter[CompressedImage] binds the translate() return type
# and restricts this adapter to messages carrying CompressedImage payloads only.
class MyDatasetVideoFrameAdapter(BaseAdapter[CompressedImage]):
    # The global identifier that will be referenced by name in your `TopicDescriptor`.
    adapter_id = "mydataset.video_frame"
    # Define the fields required in the payload, to be validated
    # in the `translate` function.
    _REQUIRED_KEYS: tuple[str, ...] = ("timestamp", "image")

    # The translator factory: an adapter receives the raw payload dictionary
    # and returns an instantiated Mosaico `Message`.
    @classmethod
    def translate(cls, payload: dict) -> Message:
        """
        Translates a raw custom dictionary into a Mosaico Message container.
        """
        # Guard against malformed payloads before any field access:
        # every key listed in _REQUIRED_KEYS must be present.
        cls._validate_payload(payload=payload)

        return Message(
            # Conversion from float seconds into the native
            # Mosaico nanosecond format.
            timestamp_ns=int(payload["timestamp"] * 1e9),
            data=CompressedImage(
                data=payload["image"],
                format=ImageFormat.JPEG,
            ),
        )

An adapter class like this must be defined for each data stream you want to ingest.
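The adapter above relies on two small steps: checking that the payload carries the required keys, and converting float seconds into integer nanoseconds. The standalone sketch below shows one way these steps could look. `validate_payload` and `seconds_to_ns` are hypothetical helpers written for illustration; they are not part of the Mosaico SDK, and the real `_validate_payload` may behave differently.

```python
def validate_payload(payload: dict, required_keys: tuple[str, ...]) -> None:
    """Raise KeyError listing every required key missing from the payload."""
    missing = [key for key in required_keys if key not in payload]
    if missing:
        raise KeyError(f"payload is missing required keys: {missing}")


def seconds_to_ns(timestamp_s: float) -> int:
    """Convert float seconds to integer nanoseconds, rounding instead of truncating."""
    return round(timestamp_s * 1e9)


# Hypothetical payload shaped like the one the video-frame adapter expects.
payload = {"timestamp": 1234.56, "image": b"\xff\xd8..."}
validate_payload(payload, ("timestamp", "image"))
timestamp_ns = seconds_to_ns(payload["timestamp"])
```

Rounding rather than truncating with `int()` avoids off-by-one-nanosecond results when the floating-point product lands just below a whole number.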

Develop the Dataset Plugin

With adapters in place, you implement the dataset plugin class. This class satisfies a straightforward internal DatasetPlugin protocol to detect your dataset format, discover its logical sequences, and assemble the ingestion plan that wires everything together.

Dataset-Plugin Protocol
from pathlib import Path
from typing import Iterable, Protocol

class DatasetPlugin(Protocol):
    # A unique string identifier used in downstream logging and operator prompts.
    dataset_id: str

    # A method to verify if a given folder matches the dataset signature
    # (e.g., checking for `*.h5` files) avoiding expensive full-dataset scans.
    def supports(self, root: Path) -> bool: ...

    # A method to discover the logical sequences contained in the root folder,
    # such as individual robot episodes.
    def discover_sequences(self, root: Path) -> Iterable[Path]: ...

    # The core logic to create an ingestion plan for each sequence,
    # returning an `IngestionDescriptor`.
    def create_ingestion_plan(self, sequence_path: Path) -> IngestionDescriptor: ...
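As an illustration of the first two protocol methods, the sketch below detects a dataset stored as one `.h5` file per episode directly under the root. The class name, the dataset identifier, and the flat directory layout are assumptions; `create_ingestion_plan` is omitted because it depends on the pack's descriptor types.

```python
from pathlib import Path
from typing import Iterable


class H5FolderPlugin:
    """Hypothetical plugin skeleton: one .h5 file per episode directly under the root."""

    dataset_id = "my_dataset"

    def supports(self, root: Path) -> bool:
        # Stop at the first match instead of scanning the whole dataset.
        return root.is_dir() and next(root.glob("*.h5"), None) is not None

    def discover_sequences(self, root: Path) -> Iterable[Path]:
        # Sorted so that ingestion order is deterministic across runs.
        return sorted(root.glob("*.h5"))
```

Keeping `supports` cheap matters because it may be called against every candidate root before any ingestion starts.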

To keep your plugin code clean and maintainable, raw file I/O operations must be completely separated from the orchestration step. You should create dedicated iterator functions following the factory pattern: each function accepts static configuration parameters (file paths, field names) and returns a Callable[[Path], Iterable[dict]]. The runner calls that callable later with the actual sequence path to stream the raw payloads. Each payload dict must include a "timestamp" key expressed in seconds as a float.

datasets/my_dataset/iterators.py
from pathlib import Path
from typing import Callable, Iterable

def iter_video_frames(video_path: str, timestamps_path: str) -> Callable[[Path], Iterable[dict]]:
    def _fn(sequence_path: Path) -> Iterable[dict]:
        # open sequence_path and read the data at video_path / timestamps_path
        yield {"timestamp": 1234.56, "image": b"..."}
    return _fn

def count_video_frames(timestamps_path: str) -> Callable[[Path], int]:
    def _fn(sequence_path: Path) -> int:
        # count the number of frames at timestamps_path
        total_frames = 1  # placeholder: matches the single payload yielded above
        return total_frames
    return _fn
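To make the factory pattern concrete, here is a minimal sketch for a hypothetical sequence stored as a directory containing a JSON Lines file. The file layout and the "t" and "value" field names are assumptions chosen for illustration; only the factory shape and the "timestamp" key in float seconds follow the text above.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def iter_jsonl_records(relative_path: str) -> Callable[[Path], Iterable[dict]]:
    """Factory: bind the file location now, stream payloads once the sequence path is known."""
    def _fn(sequence_path: Path) -> Iterable[dict]:
        with (sequence_path / relative_path).open() as handle:
            for line in handle:
                if line.strip():
                    record = json.loads(line)
                    # Each payload must carry "timestamp" in float seconds.
                    yield {"timestamp": float(record["t"]), "value": record["value"]}
    return _fn


def count_jsonl_records(relative_path: str) -> Callable[[Path], int]:
    """Factory for the matching counter: counts exactly the lines the iterator yields."""
    def _fn(sequence_path: Path) -> int:
        with (sequence_path / relative_path).open() as handle:
            return sum(1 for line in handle if line.strip())
    return _fn
```

Both functions skip blank lines in the same way, so the count stays equal to the number of payloads the iterator produces.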

Structuring the Ingestion Plan

The plugin implements create_ingestion_plan to declare the sequence name, its metadata, and the full list of topics. Each topic references the iterators and adapters you built in the previous steps.

datasets/my_dataset/plugin.py
class MyDatasetPlugin:
    # ....
    def create_ingestion_plan(self, sequence_path: Path) -> SequenceDescriptor:
        return SequenceDescriptor(
            # The unique name of the target sequence being constructed.
            sequence_name=f"{self.dataset_id}_{sequence_path.stem}",
            # The custom metadata of this sequence.
            sequence_metadata={
                "dataset_id": self.dataset_id,
                "ingestion_backend": "file",
            },
            topics=[
                TopicDescriptor(
                    # The topic name that will handle this data stream.
                    topic_name="/camera/front",
                    # The Mosaico ontology type that models this data stream.
                    ontology_type=CompressedImage,
                    # The identifier of the adapter responsible for translating this specific topic.
                    adapter_id=f"{self.dataset_id}.video_frame",
                    # The runner calls this callable later with the sequence path
                    # to obtain the raw payload stream.
                    payload_iter=iter_video_frames("path/to/video", "path/to/timestamps"),
                    # The runner uses this to count the total messages ahead of time for progress
                    # reporting. It must match the number of items `payload_iter` will yield.
                    message_count=count_video_frames("path/to/timestamps"),
                ),
                # ... Other descriptors here
            ],
        )

Register Your Components

Once your dataset plugin, functional iterators, and adapters are implemented, they must be made discoverable to the CLI layer. Both DatasetRegistry and AdapterRegistry are singletons. Plugin and adapter registration are kept intentionally separate: each registry has its own __init__.py that lists its registrations as a flat, explicit call sequence.

Registering the Adapters

Inside your adapter module, define a register_adapters() function that registers every adapter for your dataset into the singleton:

adapters/my_dataset/__init__.py
from mosaico_alchemy.manipulation.adapters.registry import AdapterRegistry
from .video_frame import MyDatasetVideoFrameAdapter
from .joint_state import MyDatasetJointAdapter

def register_adapters() -> None:
    registry = AdapterRegistry()
    registry.register(MyDatasetVideoFrameAdapter)
    registry.register(MyDatasetJointAdapter)

Then call it from adapters/__init__.py, alongside the other built-in datasets:

adapters/__init__.py
from mosaico_alchemy.manipulation.adapters import droid, fractal_rt1, reassemble
from mosaico_alchemy.manipulation.adapters import my_dataset # add this import

droid.register_adapters()
fractal_rt1.register_adapters()
reassemble.register_adapters()

my_dataset.register_adapters() # <- add this call

ROS-based datasets (such as MML) delegate ingestion entirely to the ROS bridge, which resolves topics directly from the bag — no adapter registration is required for them.

Registering the Dataset Plugin

Inside your dataset module, define a register_plugin() function that registers your plugin into the singleton:

datasets/my_dataset/__init__.py
from mosaico_alchemy.manipulation.datasets.registry import DatasetRegistry
from .plugin import MyDatasetPlugin

def register_plugin() -> None:
    DatasetRegistry().register(MyDatasetPlugin())

Then call it from datasets/__init__.py, alongside the other built-in datasets:

datasets/__init__.py
from mosaico_alchemy.manipulation.datasets import droid, fractal_rt1, mml, reassemble
from mosaico_alchemy.manipulation.datasets import my_dataset # add this import

droid.register_plugin()
fractal_rt1.register_plugin()
mml.register_plugin()
reassemble.register_plugin()

my_dataset.register_plugin() # <- add this call

Adding a new dataset to the pack is therefore a two-line change in each __init__.py — one import and one call — with all registration logic contained inside the dataset's own module.

Once registered:

  • CLI — your dataset appears alongside the built-in options in the interactive selection prompt.
  • Runner — auto-detection via registry.resolve(root) will call your plugin's supports method against the dataset root, with no further wiring required.
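For intuition about how a singleton registry and root-based resolution fit together, here is a self-contained sketch. `SketchRegistry` and `AlwaysPlugin` are hypothetical stand-ins written for this example; they are not the pack's DatasetRegistry implementation, whose internals are not shown in this guide.

```python
from pathlib import Path


class SketchRegistry:
    """Illustrative singleton registry; not the pack's DatasetRegistry implementation."""

    _instance = None

    def __new__(cls):
        # Every call to SketchRegistry() returns the same shared instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._plugins = []
        return cls._instance

    def register(self, plugin) -> None:
        self._plugins.append(plugin)

    def resolve(self, root: Path):
        # Return the first registered plugin whose supports() accepts the root.
        for plugin in self._plugins:
            if plugin.supports(root):
                return plugin
        raise LookupError(f"no registered plugin supports {root}")


class AlwaysPlugin:
    """Hypothetical plugin that accepts any root, used only to exercise the registry."""

    dataset_id = "always"

    def supports(self, root: Path) -> bool:
        return True


SketchRegistry().register(AlwaysPlugin())
resolved = SketchRegistry().resolve(Path("/path/to/dataset"))
```

Because the registry is a singleton, the instance created inside a `register_plugin()` function is the same one the runner later queries, which is why registration needs no return value.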