Marimo x Runhouse for Reproducible ML

Use Runhouse with marimo to enable a great development experience and reproducible execution without being limited by the scale of compute.

Paul Yang

ML @ 🏃‍♀️Runhouse🏠

Published January 27, 2025

For DS/ML teams, there is strong demand for new experiments, while the scale and heterogeneity of data continue to grow. Organizations might simultaneously need to experiment with LLM fine-tuning while integrating deep learning recommender systems into user experiences. To support this fast pace of development and experimentation, you need your developer experience to be iterative, debuggable, and reproducible.

This is why marimo and Runhouse partnered. Write and edit your pipelines interactively inside marimo, a next-generation notebook that stores notebooks as pure Python code, is reproducible down to the packages, and is reusable as scripts or apps. Then, scale up execution to any compute, including distributed GPUs, with Runhouse. It’s all just regular Python that can be reproducibly run and deployed as-is without lengthy translation work.
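To make this concrete, here is a minimal sketch of dispatching a local function to remote GPU compute with Runhouse. It assumes Runhouse's `rh.cluster` and `rh.function` constructs; the cluster name, instance type, and provider are illustrative and will vary with your setup and Runhouse version.

```python
import runhouse as rh

def train(epochs: int = 3):
    # Regular Python training code; the same function runs locally or remotely.
    ...

if __name__ == "__main__":
    # Illustrative cluster spec: one A10G GPU launched from AWS's elastic pool.
    gpu = rh.cluster(
        name="dev-a10g",
        instance_type="A10G:1",
        provider="aws",
    ).up_if_not()

    # Send the function to the cluster and call it as if it were local;
    # execution happens remotely and logs stream back.
    remote_train = rh.function(train).to(gpu)
    remote_train(epochs=3)
```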

Traditional Jupyter Notebooks fragment the development lifecycle

Too many teams have model checkpoints that can never be reproduced or improved upon. Treating model checkpoints and notebooks as the artifacts of ML work has been a significant anti-pattern, and the infamous statistic is that just 25% of Jupyter notebooks can be reproduced. Notebooks are correctly criticized for their lack of easy versionability, extremely stateful execution, and simply not being regular Python code.

Compounding the problem and slowing down pushes to production, original research work is frequently done with sampled data on toy compute. Very frequently, methods don’t scale up and results don’t reproduce over full datasets. A research model trained on a single node with sampled data might need to undergo several rounds of iteration before being converted into a pipeline that runs with multiple GPUs or even multiple nodes of GPUs. This process introduces many opportunities for reproducibility failures.

Teams have mostly thrown labor hours, a smattering of vendor solutions, and what is generically termed “MLOps” at this problem. But the best ML platform is an unopinionated one that lets you write regular code and magically executes it for you on powerful compute.

Perfectly reproducible execution, from research to production, for any workload

When using Runhouse with marimo, the usage pattern is to write real Python code locally in marimo and trigger execution interactively, as if you were in an interactive shell, while dispatching the workloads to ephemeral remote clusters that you define in code. Runhouse launches these clusters from your cloud provider’s elastic compute pools or from your existing Kubernetes clusters and manages them for you.
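Because marimo notebooks are stored as plain Python files, the cluster definition and the remote dispatch can live in ordinary cells. The sketch below uses marimo's generated file format (`@app.cell` functions) together with the same illustrative Runhouse calls as above; the cell contents, names, and data path are assumptions, not a prescribed layout.

```python
import marimo

app = marimo.App()

@app.cell
def _():
    import runhouse as rh
    return (rh,)

@app.cell
def _(rh):
    # Define (and lazily launch) an ephemeral GPU cluster in code.
    cluster = rh.cluster(
        name="dev-a10g", instance_type="A10G:1", provider="aws"
    ).up_if_not()
    return (cluster,)

@app.cell
def _(cluster, rh):
    def preprocess(path: str):
        # Regular Python; edit and re-run this cell to redeploy to the cluster.
        ...

    remote_preprocess = rh.function(preprocess).to(cluster)
    remote_preprocess("s3://my-bucket/data.parquet")  # illustrative path
    return

if __name__ == "__main__":
    app.run()
```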

During research, you can interactively edit and modify your code and redeploy it to remote compute for iteration in under 2 seconds simply by running a marimo cell. Your iteration loop is not hindered at all. Critically, this approach makes it trivial to move to production. All you need to do is schedule your marimo notebook to run anywhere Python is available, and an identical process of bringing up powerful compute and dispatching work for execution happens. These notebooks are real, regular Python code that can be managed in version control and governed by best-practice software development processes.
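For example, since the notebook file ends in marimo's standard `if __name__ == "__main__": app.run()` guard, a scheduler or CI job can execute it like any other Python module. The module path below is hypothetical:

```python
import runpy

# Equivalent to running "python pipelines/train_pipeline.py": marimo's
# app.run() executes every cell in dependency order, including the
# Runhouse cluster launch and remote dispatch defined inside the notebook.
runpy.run_module("pipelines.train_pipeline", run_name="__main__")
```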
