Serverless ML in Your Own Cloud
Launch ephemeral clusters and dispatch your ML pipelines as regular Python. Iterable, debuggable, DSL-free, ready for distributed workloads, and infrastructure-agnostic.
Launch powerful compute
Developers define and launch ephemeral clusters from Kubernetes, elastic compute, bare metal, or a mixture.
import runhouse as rh

# Define a cluster that you want to launch -
# here, we launch a 4 x 4 GPU cluster from EC2
my_cluster = rh.cluster(
    name="rh-a100s",
    instance_type="A100:4",
    num_nodes=4,
    memory="32+",
    provider="aws",
    image=rh_img,  # rh_img is assumed to be an image defined elsewhere
    autostop_mins=60,
).up_if_not()

# Save and reuse the cluster across multiple pipeline
# steps, pipelines, or to ensure reproducibility.
my_cluster.save()

# Later... load clusters by name for reuse
my_cluster = rh.cluster(name="rh-a100s").up_if_not()
Fast and iterable development
Build ML pipelines interactively on powerful remote compute with local-like calls and logs. Dispatch and execute code in under 5 seconds.
# Define your model class with normal code
class MyModelClass:
    def train(self): ...
    def predict(self): ...
    def save(self): ...

# Send your class to remote
RemoteClass = rh.module(MyModelClass).to(my_cluster)

# Instantiate the remote class, then call methods on the instance
remote_model = RemoteClass(name="remote_model")
remote_model.train()
remote_model.save()
Break the barrier between research and production
Identical, fault-tolerant execution everywhere with no "translation" for production. Manage your ML lifecycle with software best practices.
$ git add MyModelClass.py
$ git commit -m "Refactor the train method"
$ git push
$ echo "Develop code, not orchestrator pipelines!"
Easily adopt distributed frameworks
Scale without an infrastructure lift. Automatically set up clusters for Ray, PyTorch, Dask, and other distributed frameworks.
# Run distributed data processing over large datasets
process_data = rh.function(DaskPreprocess).to(rh_8x4_cpus)
process_data.distribute('dask')
process_data(s3_path)

# Execute a multinode PyTorch or Lightning training
trainer = rh.module(PyTorchTraining).to(rh_8x4_gpus)
remote_trainer = trainer().distribute('pytorch')
remote_trainer.train()

# Use Ray Data for inference once the training is done
batch_inference = rh.function(RayBatchInf).to(rh_4x1gpus)
batch_inference.distribute('ray')
batch_inference(s3_path)
Observability and management, out of the box
Define quotas, set autostop, audit resource access, observe execution with persisted logs, track GPU/CPU/memory utilization and more.
# API route to fetch logs for a resource
@router.get(
    "/{uri}/logs",
    response_description="Resource logs retrieved",
)
@send_event
async def resource_logs_preview(...):
    ...

# API route to fetch cluster status and metrics
@router.get(
    "/{uri}/cluster/status",
    response_description="Cluster status retrieved",
)
@send_event
async def load_cluster_status(...):
    ...
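As a rough illustration of what an idle-based setting such as autostop_mins=60 implies, the sketch below decides whether a cluster should be shut down based on its last-activity time. The helper name should_autostop, its parameters, and the "negative means never stop" rule are assumptions made for this example; they are not part of the Runhouse API, and the real mechanism may differ.

```python
from datetime import datetime, timedelta, timezone


def should_autostop(last_active: datetime, now: datetime, autostop_mins: int) -> bool:
    """Return True once the cluster has been idle for at least autostop_mins.

    Hypothetical helper for illustration only; not a Runhouse function.
    A negative autostop_mins is treated here as "never stop" (an assumption).
    """
    if autostop_mins < 0:
        return False
    return now - last_active >= timedelta(minutes=autostop_mins)


# Fixed reference time so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
idle_45 = should_autostop(now - timedelta(minutes=45), now, autostop_mins=60)
idle_90 = should_autostop(now - timedelta(minutes=90), now, autostop_mins=60)
print(idle_45, idle_90)  # False True
```

A cluster idle for 45 minutes stays up under a 60-minute limit, while one idle for 90 minutes is eligible to be stopped.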
Runhouse
Effortlessly program powerful ML systems across arbitrary compute in regular Python.
Works with your stack
Easily integrate within your existing pipelines, code, and development workflows.
$ pip install runhouse
Loved by research and infra teams alike
Runhouse is built for end-to-end ML development. Dispatch work quickly during local development in notebooks or IDEs, but run as-is inside Kubernetes, CI, or your favorite orchestrator. No more push and pray.
Runs inside your own infrastructure
Execution stays inside your cloud(s), with total flexibility to cost-optimize or scale to new providers.
Use Cases
ML that Runs
An ML platform that improves developer experience while increasing development velocity.
Without Runhouse:
Research is launched on siloed compute, sampled data, and notebook code to enable iterative development. Production is reached via a slow translation to orchestrators, and becomes difficult to debug when errors arise.
With Runhouse: Fast Software Development
Code is written and executed identically in research and production. Errors can be debugged on a branch from local IDEs and merged into production using a standard development lifecycle.
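Because the model code is ordinary Python, it can be exercised with ordinary tests before it is merged. The sketch below pairs a minimal stand-in for a model class with the kind of check that could run from a local IDE or in CI. The MeanModel name and its trivial "predict the training mean" logic are assumptions made purely for illustration and do not come from Runhouse.

```python
from statistics import fmean


class MeanModel:
    """Toy model that predicts the mean of the training targets."""

    def __init__(self) -> None:
        self.mean = None  # set by train()

    def train(self, targets: list[float]) -> None:
        if not targets:
            raise ValueError("targets must not be empty")
        self.mean = fmean(targets)

    def predict(self, n: int) -> list[float]:
        if self.mean is None:
            raise RuntimeError("call train() before predict()")
        return [self.mean] * n


def test_mean_model() -> None:
    # The same test runs unchanged on a laptop, on a branch, or in CI.
    model = MeanModel()
    model.train([1.0, 2.0, 3.0])
    assert model.predict(2) == [2.0, 2.0]


test_mean_model()
```

The point of the sketch is the workflow, not the model: a failing assertion surfaces locally, on a branch, before anything reaches production.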
Operationalize your ML as a living stack.
Search & Sharing
Runhouse Den provides you with an interface to explore your ML artifacts. Easily share with your teammates, search through your resources, and view details all in one place.
Observability
View cluster status, automatically persist logs, track GPU/CPU/memory utilization, and enable more efficient debugging for your team. Gain insights with trends and simple dashboards.
Auth & Access Control
Den makes it easy to control access to your services. Provide individual teammates with read or write access, or choose "public" visibility to expose public API endpoints.
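One way to picture the read/write/public model described above is the small sketch below. It is not Den's implementation or API: the Access and Resource names are hypothetical, and two rules are assumed for the example, namely that a write grant also allows reading and that "public" visibility grants read access only.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    READ = 1
    WRITE = 2


@dataclass
class Resource:
    """Hypothetical shared resource with per-user grants (illustration only)."""

    name: str
    public: bool = False
    grants: dict[str, Access] = field(default_factory=dict)

    def can_read(self, user: str) -> bool:
        # Public resources are readable by anyone; otherwise any grant suffices.
        return self.public or user in self.grants

    def can_write(self, user: str) -> bool:
        # Writing needs an explicit write grant, even on public resources.
        return self.grants.get(user) is Access.WRITE


svc = Resource("remote_model", grants={"alice": Access.WRITE, "bob": Access.READ})
print(svc.can_read("bob"), svc.can_write("bob"), svc.can_read("carol"))  # True False False

svc.public = True
print(svc.can_read("carol"), svc.can_write("carol"))  # True False
```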