Runhouse enables rapid, cost-effective machine learning development. With Runhouse, your ML code executes “serverlessly.” Dispatch Python functions and classes to any of your own cloud compute infrastructure, and call them eagerly as if they were local.
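The core pattern looks roughly like this (a minimal sketch based on Runhouse's documented API; the cluster name, instance type, and provider below are illustrative, and exact signatures may vary by version):

```python
import runhouse as rh

# Launch (or reuse) a GPU box in your own cloud account; names/specs here are illustrative
cluster = rh.cluster(name="rh-a10g", instance_type="A10G:1", provider="aws").up_if_not()

def preprocess(text: str) -> str:
    return text.strip().lower()

# Dispatch the local function to the cluster...
remote_preprocess = rh.function(preprocess).to(cluster)

# ...and call it eagerly, as if it were local
print(remote_preprocess("  Hello, Runhouse!  "))  # -> "hello, runhouse!"
```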
Runhouse is specifically designed for machine learning workloads — including online and offline tasks, training, and inference — where the need for heterogeneous remote compute resources is common, and flexibility is essential to keep development cycles fast and costs low.
Runhouse solves a few major problems for machine learning and AI teams:
Iterability: Developing with Runhouse feels like working locally, even when the code is executing on powerful, multi-node remote hardware. In research, it avoids writing non-standard code in hosted notebooks; in production, it avoids iterating by rebuilding and resubmitting pipelines. The team writes standard Python code locally, and redeploying changes to remote compute takes under two seconds per iteration. The remote filesystem and any unaffected remote objects or functions remain accessible across iterations.
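A sketch of what that iteration loop can look like, reusing the illustrative cluster above (the `cluster.put`/`cluster.get` calls illustrate the cluster's persistent object store; confirm the exact API for your version):

```python
import runhouse as rh

cluster = rh.cluster(name="rh-a10g")  # reuse the cluster from the sketch above

# Objects in the cluster's key-value store survive code redeploys
cluster.put("dataset", [1, 2, 3, 4])

def mean(xs):
    return sum(xs) / len(xs)

remote_mean = rh.function(mean).to(cluster)  # first deploy
print(remote_mean(cluster.get("dataset")))   # 2.5

# Edit the function locally and re-send it; the redeploy takes a couple of
# seconds, and "dataset" is still sitting on the cluster, untouched.
def mean(xs):
    return sum(xs) / max(len(xs), 1)

remote_mean = rh.function(mean).to(cluster)
print(remote_mean(cluster.get("dataset")))
```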
Debuggability: With Runhouse, there is perfect reproducibility between local and scheduled production execution. Research code that works is already production-ready, while any production runs that fail can be debugged locally. The combination of identical execution and fast iteration enables a straightforward, rapid debugging loop.
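Concretely, the reproducibility comes from the fact that the same dispatch code runs unchanged in research and in production (a sketch, again assuming the illustrative cluster from above):

```python
import runhouse as rh

def embed(batch):
    return [hash(x) % 1000 for x in batch]  # stand-in for real model code

# These exact lines run unchanged from a laptop, a CI job, or an orchestrator task
cluster = rh.cluster(name="rh-a10g").up_if_not()
remote_embed = rh.function(embed).to(cluster)
print(remote_embed(["a", "b", "c"]))
```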
Cost: Organizations adopting Runhouse utilize their available compute more efficiently, leading to typical cost savings of 50%. With Runhouse, ephemeral clusters are allocated only when needed and can be launched across multiple regions or clouds based on quota or cost considerations. It’s easy to right-size instances per workload, incorporate spot instances, and share compute, or the services running on it, across tasks.
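For instance, an ephemeral, right-sized cluster might be declared like this (a sketch; `use_spot` and `autostop_mins` follow Runhouse's SkyPilot-backed launcher, but confirm the parameter names for your version):

```python
import runhouse as rh

# A cheap, ephemeral GPU box: spot pricing, auto-terminated after 30 idle minutes
cluster = rh.cluster(
    name="rh-spot-t4",
    instance_type="T4:1",
    provider="aws",        # or "gcp"/"azure", chosen by quota or cost
    use_spot=True,         # spot instance for cost savings
    autostop_mins=30,      # tear down when idle so you stop paying
).up_if_not()
```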
Development at Scale: Adopting powerful, GPU-accelerated hardware or distributed frameworks (Spark, Ray) can be disruptive, because they require all development, debugging, automation, and deployment to occur on their runtime; for instance, users of Ray, Spark, or PyTorch Distributed must develop on the head node. Hosted notebook services often serve as stop-gaps for this issue. Runhouse allows local Python to orchestrate these systems remotely, bringing the development workflow back to standard Python, as in the sketch below.
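A sketch of that pattern (the `num_nodes` argument and the on-cluster `ray.init()` call are illustrative assumptions; newer Runhouse versions also ship dedicated distribution helpers, so check the docs for your version):

```python
import runhouse as rh

# A multi-node cluster; Runhouse handles launch and interconnect setup
cluster = rh.cluster(
    name="rh-ray-pool", instance_type="CPU:8", num_nodes=2, provider="aws"
).up_if_not()

def sum_of_squares(n: int) -> int:
    # Runs on the head node: ordinary Ray code, orchestrated from local Python
    import ray
    ray.init(address="auto")

    @ray.remote
    def square(x):
        return x * x

    return sum(ray.get([square.remote(i) for i in range(n)]))

remote_job = rh.function(sum_of_squares).to(cluster)
print(remote_job(10))  # fans out across the Ray cluster, called like a local function
```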
Infrastructure Management: Runhouse captures infrastructure as code, providing a clear contract between the application and infrastructure, saving ML teams from having to learn the intricacies of networking, security, and DevOps.
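That contract can be made concrete by versioning resource definitions in code, e.g. (a sketch; `.save()` persists a named resource for reuse, assuming your Runhouse account is configured):

```python
import runhouse as rh

# The infra "contract" lives in code rather than in a wiki or someone's head
gpu = rh.cluster(name="team-a100", instance_type="A100:1", provider="gcp")
gpu.save()  # persist the named config so teammates and pipelines can reuse it

# Elsewhere (another script, a scheduled job), reload the same contract by name
gpu = rh.cluster(name="team-a100").up_if_not()
```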
A quick high-level summary of the differences between developing and deploying ML code with and without Runhouse:
Aspect | Without Runhouse | With Runhouse |
---|---|---|
Development / Research | Researchers start in hosted notebooks or SSH’ed into a cluster. | Researchers write normal code locally. |
Research to Production | Research to production happens over the course of days or weeks. | Moving to production is instant. |
Debugging and Updating | Production debugging is challenging. | Easily debug or update pipelines in production. |
You can join the Runhouse Discord, or shoot us a quick note at hello@run.house.