A Compute Poolhouse for AI and ML
The simplest and most powerful ML platform. Add all of your available compute, and Runhouse will distribute, scale, and optimize your ML workloads.
Simple yet Powerful for Developers
Runhouse makes it extremely easy to deploy research, training, and inference code written in normal Python on elastic compute or Kubernetes.
- Runhouse works with your training loop or inference pipelines as-is, with no decorators, repackaging, or DSLs.
- Compute launch and execution happen entirely in Python, so you can apply standard software DevOps practices to your code and pipelines.
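As a minimal illustration of the "no decorators" point, the function below is plain Python; the commented lines sketch how it could be dispatched to remote compute with `rh.function` (the cluster name and spec here are illustrative assumptions, not taken from this page):

```python
def normalize(batch):
    # Ordinary Python: no decorators, repackaging, or DSLs required.
    return [x / 255.0 for x in batch]

# The same function runs locally...
local_out = normalize([0, 255])  # [0.0, 1.0]

# ...or remotely, by sending it to a cluster as-is (cluster spec is hypothetical):
# import runhouse as rh
# cpu = rh.cluster(name="rh-cpu", instance_type="CPU:2", provider="aws").up_if_not()
# remote_normalize = rh.function(normalize).to(cpu)
# remote_normalize([0, 255])  # same call signature, executed on the cluster
```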
Thoughtful Management and Optimization
Runhouse is the easiest way to move faster, with no additional infrastructure overhead or platform-team lift.
- Scale from single GPU to distributed training with a single line of code.
- Manage compute and execution with full visibility and control, including direct SSH access into the distributed clusters.
- Workloads launch and scale within your own cloud accounts.
Runhouse on GitHub
Runhouse is a flexible framework for building a true ML Platform.
All of Your Compute Sources, Unified
Gather and manage all your sources of ML compute in one place. Developers shouldn't have to think about infra at all, while Platforms teams can set uniform rules across all compute.
```python
import runhouse as rh

# Launch from elastic compute
aws_secret = rh.provider_secret("aws")  # AWS service account
lambda_secret = rh.provider_secret("lambda", values={"api_key": "lambda_key"})  # Lambda Labs API key

# Existing Kubernetes clusters
kube_config = rh.provider_secret(provider="kubernetes", path="~/.kube/config")

# Or VMs
ssh_secret = rh.provider_secret(provider="ssh", name="on_prem_compute")
```
Launch Multi-Node Clusters Programmatically
Define your compute requirements and specify cloud provider, region, required resources, and more.
```python
import runhouse as rh

# Create a multi-node cluster
gpus_per_node = 1
num_nodes = 16

img = rh.Image(name="runhouse-image").install_packages(
    [
        "torch==2.5.1",
        "torchvision==0.20.1",
        "Pillow==11.0.0",
    ],
)

gpu_cluster = rh.cluster(
    name=f"rh-{num_nodes}x{gpus_per_node}-gpu",
    gpus=f"A100:{gpus_per_node}",
    num_nodes=num_nodes,
    use_spot=True,  # Can use spot instances easily
    provider="aws",
    image=img,
).up_if_not()
```
Run PyTorch Distributed
Runhouse is the easiest way to start and robustly execute distributed training. Scale up with just one line of code.
```python
import runhouse as rh

# Write a regular Python class and send it to our cluster
from resnet_training import ResNet152Trainer

remote_trainer_class = rh.module(ResNet152Trainer).to(gpu_cluster)

# Instantiate a remote instance and call .distribute() to set up PyTorch
remote_trainer = remote_trainer_class().distribute(
    distribution="pytorch",
    replicas_per_node=gpus_per_node,
    num_replicas=gpus_per_node * num_nodes,
)
```
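After `.distribute()`, invoking the trainer is an ordinary Python method call that fans out across the replicas. A hedged sketch of what comes next (the `train` method name and its arguments are illustrative assumptions, not confirmed Runhouse API):

```python
# Replica math from the cluster definition above: 16 nodes x 1 GPU each.
gpus_per_node = 1
num_nodes = 16
num_replicas = gpus_per_node * num_nodes  # 16 PyTorch workers in total

# Calling the distributed trainer looks like a normal Python call
# (method name and signature are hypothetical):
# remote_trainer.train(epochs=10, batch_size=256)

# When finished, the cluster can be torn down from Python as well
# (assuming the cluster teardown method):
# gpu_cluster.teardown()
```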
Everything you need to get started with Runhouse today.
See an Example
Learn more about the technical details of Runhouse and try integrating the open-source package into your existing Python code. Here's an example of how to deploy Llama3 to EC2 in just a few lines.
Talk to Donny (our founder)
We've been building ML platforms and open-source libraries like PyTorch for over a decade. We'd love to chat and get your feedback!
Book Time
Get in touch
Whether you'd like to learn more about Runhouse or need a little assistance trying out the product, we're here to help.