A Compute Poolhouse for AI and ML
The simplest and most powerful ML platform. Add all of your available compute, and Runhouse will distribute, scale, and optimize your ML workloads.
Simple yet Powerful for Developers
Runhouse makes it extremely easy to deploy research, training, and inference code written in normal Python on elastic compute or Kubernetes.
- Runhouse works with your training loop or inference pipelines as-is, with no decorators, repackaging, or DSLs.
- Compute launch and execution happen entirely in Python, so you can apply standard software DevOps practices to your code and pipelines.
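As a minimal illustration of the "no decorators" point, the function below is plain Python; the commented lines sketch how it could be dispatched to remote compute with `rh.function` (the cluster name and spec here are illustrative assumptions, not taken from this page):

```python
def normalize(batch):
    # Ordinary Python: no decorators, repackaging, or DSLs required.
    return [x / 255.0 for x in batch]

# The same function runs locally...
local_out = normalize([0, 255])  # [0.0, 1.0]

# ...or remotely, by sending it to a cluster as-is (cluster spec is hypothetical):
# import runhouse as rh
# cpu = rh.cluster(name="rh-cpu", instance_type="CPU:2", provider="aws").up_if_not()
# remote_normalize = rh.function(normalize).to(cpu)
# remote_normalize([0, 255])  # same call signature, executed on the cluster
```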
Thoughtful Management and Optimization
Runhouse is the easiest way to move faster, with no additional infrastructure overhead or platform-team lift.
- Scale from single GPU to distributed training with a single line of code.
- Manage compute and execution with full visibility and control, including direct SSH access into the distributed clusters.
- Workloads launch and scale within your own cloud accounts.
Runhouse on GitHub
Runhouse is a flexible framework for building a true ML Platform.
All of Your Compute Sources, Unified
Gather and manage all your sources of ML compute in one place. Developers shouldn't have to think about infra at all, while Platforms teams can set uniform rules across all compute.
```python
import runhouse as rh

# Launch from elastic compute
aws_secret = rh.provider_secret("aws")  # AWS service account
lambda_secret = rh.provider_secret("lambda", values={"api_key": "lambda_key"})  # Lambda Labs API key

# Existing Kubernetes clusters
kube_config = rh.provider_secret(provider="kubernetes", path="~/.kube/config")

# Or VMs
ssh_secret = rh.provider_secret(provider="ssh", name="on_prem_compute")
```
Launch Multi-Node Clusters Programmatically
Define your compute requirements and specify cloud provider, region, required resources, and more.
```python
import runhouse as rh

# Create a multi-node cluster
gpus_per_node = 1
num_nodes = 16

img = rh.Image(name="runhouse-image").install_packages(
    [
        "torch==2.5.1",
        "torchvision==0.20.1",
        "Pillow==11.0.0",
    ],
)

gpu_cluster = rh.cluster(
    name=f"rh-{num_nodes}x{gpus_per_node}-gpu",
    gpus=f"A100:{gpus_per_node}",
    num_nodes=num_nodes,
    use_spot=True,  # Can use spot instances easily
    provider="aws",
    image=img,
).up_if_not()
```
Run PyTorch Distributed
Runhouse is the easiest way to start and robustly execute distributed training. Scale up with just one line of code.
```python
import runhouse as rh

# Write a regular Python class and send it to our cluster
from resnet_training import ResNet152Trainer

remote_trainer_class = rh.module(ResNet152Trainer).to(gpu_cluster)

# Instantiate a remote instance and call .distribute() to set up PyTorch
remote_trainer = remote_trainer_class().distribute(
    distribution="pytorch",
    replicas_per_node=gpus_per_node,
    num_replicas=gpus_per_node * num_nodes,
)
```
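After `.distribute()`, invoking the trainer is an ordinary Python method call that fans out across the replicas. A hedged sketch of what comes next (the `train` method name and its arguments are illustrative assumptions, not confirmed Runhouse API):

```python
# Replica math from the cluster definition above: 16 nodes x 1 GPU each.
gpus_per_node = 1
num_nodes = 16
num_replicas = gpus_per_node * num_nodes  # 16 PyTorch workers in total

# Calling the distributed trainer looks like a normal Python call
# (method name and signature are hypothetical):
# remote_trainer.train(epochs=10, batch_size=256)

# When finished, the cluster can be torn down from Python as well
# (assuming the cluster teardown method):
# gpu_cluster.teardown()
```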
Everything you need to get started with Runhouse today.
See an Example
Learn more about the technical details of Runhouse and try integrating the open-source package into your existing Python code. Here's an example of how to deploy Llama3 to EC2 in just a few lines.
Talk to Donny (our founder)
We've been building ML platforms and open-source libraries like PyTorch for over a decade. We'd love to chat and get your feedback!
Book Time
Get in touch
Whether you'd like to learn more about Runhouse or need a little assistance trying out the product, we're here to help.