Serverless ML in Your Own Cloud

Launch ephemeral clusters and dispatch your ML pipelines as regular Python. Iterable, debuggable, DSL-free, ready for distributed execution, and infrastructure-agnostic.

Any cloud & Kubernetes
IDEs, Orchestrators, or CI
Built for distributed workloads
Infra observability & management

Launch powerful compute

Developers define and launch ephemeral clusters from Kubernetes, elastic compute, bare metal, or a mixture.

import runhouse as rh

# Define a cluster that you want to launch -
# here, we launch a 4 x 4 GPU cluster from EC2
my_cluster = rh.cluster(
    name="rh-a100s",
    instance_type="A100:4",
    num_nodes=4,
    memory="32+",
    provider="aws",
    image=rh_img,  # rh_img is assumed to be defined elsewhere
    autostop_mins=60,
).up_if_not()

# Save and reuse the cluster across multiple pipeline
# steps, pipelines, or to ensure reproducibility.
my_cluster.save()

# Later... load clusters by name for reuse
my_cluster = rh.cluster(name="rh-a100s").up_if_not()

Fast and iterable development

Build ML pipelines interactively on powerful remote compute with local-like calls and logs. Dispatch and execute code in under 5 seconds.

# Define your model class with normal code
class MyModelClass:
    def train(self): ...
    def predict(self): ...
    def save(self): ...

# Send your class to remote
RemoteClass = rh.module(MyModelClass).to(my_cluster)

# Instantiate, call an instance of the remote class
RemoteModel = RemoteClass(name='remote_model')
RemoteModel.train()
RemoteModel.save()
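Because the class being dispatched is ordinary Python, it can be exercised locally before it is sent to a cluster. A minimal sketch, assuming a toy mean predictor: `MeanModel`, its method bodies, and the JSON file format are illustrative assumptions, not part of Runhouse.

```python
import json
import os
import tempfile


class MeanModel:
    """Toy model (hypothetical): predicts the mean of the training targets."""

    def __init__(self):
        self.mean = None

    def train(self, targets):
        # "Training" here is just averaging the observed targets.
        self.mean = sum(targets) / len(targets)

    def predict(self, n):
        # Return the learned mean once for each of the n requested inputs.
        if self.mean is None:
            raise RuntimeError("call train() before predict()")
        return [self.mean] * n

    def save(self, path):
        # Persist the single learned parameter as JSON.
        with open(path, "w") as f:
            json.dump({"mean": self.mean}, f)


# Exercise the class locally, exactly as any other Python object.
model = MeanModel()
model.train([1.0, 2.0, 3.0])
predictions = model.predict(2)

save_path = os.path.join(tempfile.mkdtemp(), "mean_model.json")
model.save(save_path)
with open(save_path) as f:
    saved = json.load(f)
```

Following the snippet above, a class like this would be sent to a cluster with `rh.module(MeanModel).to(my_cluster)`, and the remote instance would be called with the same method names.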

Break the barrier between research and production

Identical, fault-tolerant execution everywhere with no "translation" for production. Manage your ML lifecycle with software best practices.

$ git add MyModelClass.py
$ git commit -m "Refactor the train method"
$ git push
$ echo "Develop code, not orchestrator pipelines!"

Easily adopt distributed frameworks

Scale without an infrastructure lift. Automatically set up clusters for Ray, PyTorch, Dask, and other distributed frameworks.

# Run distributed data processing over large datasets
process_data = rh.function(DaskPreprocess).to(rh_8x4_cpus)
process_data.distribute('dask')
process_data(s3_path)

# Execute a multinode PyTorch or Lightning training
trainer = rh.module(PyTorchTraining).to(rh_8x4_gpus)
remote_trainer = trainer().distribute('pytorch')
remote_trainer.train()

# Use Ray Data for inference once the training is done
batch_inference = rh.function(RayBatchInf).to(rh_4x1gpus)
batch_inference.distribute('ray')
batch_inference(s3_path)

Observability and management, out of the box

Define quotas, set autostop, audit resource access, observe execution with persisted logs, track GPU/CPU/memory utilization and more.

# API route to fetch logs for a resource
@router.get(
    "/{uri}/logs",
    response_description="Resource logs retrieved",
)
@send_event
async def resource_logs_preview(...):
    ...

# API route to fetch cluster status and metrics
@router.get(
    "/{uri}/cluster/status",
    response_description="Cluster status retrieved",
)
@send_event
async def load_cluster_status(...):
    ...
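The autostop setting mentioned above can be read as an idle timeout: a cluster is stopped once it has gone a configured number of minutes without activity. A minimal sketch of that check in plain Python follows. The function name, the way activity is tracked, and the treatment of negative values are assumptions for illustration, not Runhouse's implementation.

```python
from datetime import datetime, timedelta


def should_autostop(last_active: datetime, now: datetime, autostop_mins: int) -> bool:
    """Return True once the idle time reaches the autostop window.

    Hypothetical helper: a negative autostop_mins is treated as "never stop".
    """
    if autostop_mins < 0:
        return False
    return now - last_active >= timedelta(minutes=autostop_mins)


# A cluster last used at 12:00 with a 60-minute autostop window.
last_active = datetime(2024, 1, 1, 12, 0)
still_up = should_autostop(last_active, datetime(2024, 1, 1, 12, 30), 60)
stopped = should_autostop(last_active, datetime(2024, 1, 1, 13, 0), 60)
```

With these inputs the cluster is kept at 12:30 and becomes eligible to stop at 13:00, which matches `autostop_mins=60` in the cluster definition earlier on the page.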

Runhouse

Effortlessly program powerful ML systems across arbitrary compute in regular Python.

Works with your stack

Easily integrate within your existing pipelines, code, and development workflows.

$ pip install runhouse

Loved by research and infra teams alike

Runhouse is built for end-to-end ML development. Dispatch work quickly during local development in notebooks or IDEs, then run the same code as-is inside Kubernetes, CI, or your favorite orchestrator. No more push and pray.

Graphic showing a laptop with a Python logo connecting to modules and functions running on cloud compute

Runs inside your own infrastructure

Execution stays inside your cloud(s), with total flexibility to cost-optimize or scale to new providers.

List of cloud provider logos: AWS, Google, Azure, Kubernetes, AWS Lambda, and SageMaker

Use Cases

Training

Offline, Online, Distributed, HPO

Fine-tuning with LoRA

Inference

Online, Batch, LLMs, Multi-step

Call Llama3 on AWS EC2

Composite Systems

RAG, Multi-task, Evaluation

FastAPI RAG app with LanceDB and vLLM

Data Processing

Batch, Online, Data Apps

Parallel GPU Batch Embeddings

ML that Runs

An ML platform that improves developer experience while increasing development velocity.

Line illustration showing researchers and engineers connected to an AI app or code via lines before and after changes

Without Runhouse:

Research is launched on siloed compute, sampled data, and notebook code to enable iterative development. Production is reached via a slow translation to orchestrators, and becomes difficult to debug when errors arise.

Diagram showing a block with "Code development using regular SDLC" above a smaller "Compute" block and an arrow with "Runhouse manages dispatch" between

With Runhouse: Fast Software Development

Code is written and executed identically in research and production. Errors can be debugged on a branch from local IDEs and merged into production using a standard development lifecycle.

Operationalize your ML as a living stack.

Try it out
Screenshot of a search interface with a search bar and listed resources

Search & Sharing

Runhouse Den provides you with an interface to explore your ML artifacts. Easily share with your teammates, search through your resources, and view details all in one place.

Screenshot of a chart showing GPU memory usage

Observability

View cluster status, automatically persist logs, track GPU/CPU/memory utilization, and enable more efficient debugging for your team. Gain insights with trends and simple dashboards.

Screenshot of a resource page and user access list

Auth & Access Control

Den makes it easy to control access to your services. Provide individual teammates with read or write access, or choose "public" visibility to expose public API endpoints.
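The access levels described above (per-teammate read or write, plus "public" visibility) can be pictured with a small permission check. This is a sketch under stated assumptions, not Den's data model: `Resource`, `can_access`, and the rule that public visibility grants read-only access to everyone are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical resource record with per-user grants."""
    name: str
    visibility: str = "private"  # assumed values: "private" or "public"
    grants: dict = field(default_factory=dict)  # user -> "read" or "write"


def can_access(resource: Resource, user: str, action: str) -> bool:
    """Check whether user may perform action ("read" or "write")."""
    level = resource.grants.get(user)
    if level == "write":
        # Assumption: write access implies read access.
        return True
    if action == "read":
        # Assumption: public resources are readable by anyone.
        return level == "read" or resource.visibility == "public"
    return False


service = Resource(name="embedder", grants={"ana": "write", "ben": "read"})
public_service = Resource(name="demo", visibility="public")
```

Here `ana` can read and write `embedder`, `ben` can only read it, and an unlisted user can read `demo` only because it is public.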

Start building on a solid ML foundation.

Book a demo