
Quick Start Guide


This tutorial walks through Runhouse setup (installation, hardware setup, and optional login) and goes through an example that demonstrates how to use Runhouse to bridge the gap between local and remote compute, and to create Resources that can be saved, reused, and shared.

Installation

Runhouse can be installed with:

!pip install runhouse

If using Runhouse with a cloud provider, you can additionally install cloud packages (e.g. the right versions of tools like boto, gsutil, etc.):

$ pip install "runhouse[aws]"
$ pip install "runhouse[gcp]"
$ pip install "runhouse[azure]"
$ pip install "runhouse[sagemaker]"
# Or
$ pip install "runhouse[all]"

To import runhouse:

import runhouse as rh
# Optional: to sync over secrets from your Runhouse account
# !runhouse login

Cluster Setup

Runhouse provides APIs and Secrets management to make it easy to interact with your clusters. This can be either an existing, on-prem cluster you have access to, or cloud instances that Runhouse spins up/down for you (through your own cloud account).

Note that Runhouse is NOT managed compute. Everything runs inside your own compute and storage, using your credentials.

Bring-Your-Own Cluster

If you are using an existing, on-prem cluster, no additional setup is needed. Just have your cluster IP address and path to SSH credentials or password ready:

# using private key
cluster = rh.cluster(
    name="cpu-cluster",
    ips=['<ip of the cluster>'],
    ssh_creds={'ssh_user': '<user>', 'ssh_private_key': '<path_to_key>'},
)

# using password
cluster = rh.cluster(
    name="cpu-cluster",
    ips=['<ip of the cluster>'],
    ssh_creds={'ssh_user': '<user>', 'password': '******'},
)

Note

For more information see the Cluster Class section.

On-Demand Cluster

For on-demand clusters through cloud accounts (e.g. AWS, Azure, GCP, LambdaLabs), Runhouse uses SkyPilot for much of the heavy lifting with launching and terminating cloud instances.

To set up your cloud credentials locally to be able to use on-demand cloud clusters, you can either:

  1. Use SkyPilot’s CLI command !sky check, which provides instructions on logging in or setting up your local config file, depending on the provider (further SkyPilot instructions here)

  2. Use Runhouse’s Secrets API to sync your secrets down into the appropriate local config.

# SkyPilot CLI
!sky check

# Runhouse Secrets
# Lambda Labs:
rh.Secrets.save_provider_secrets(secrets={"lambda": {"api_key": "*******"}})

# AWS:
rh.Secrets.save_provider_secrets(secrets={"aws": {"access_key": "******", "secret_key": "*******"}})

# GCP:
!gcloud init
!gcloud auth application-default login
!cp -r /content/.config/* ~/.config/gcloud

# Azure
!az login
!az account set -s <subscription_id>

To check that the provider credentials are properly configured locally, run sky check to confirm that the cloud provider is enabled:

!sky check
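If sky check reports a provider as disabled, the usual cause is a missing or incomplete local credentials file. For AWS, for example, credentials conventionally live in ~/.aws/credentials in this standard form (placeholder values shown):

```ini
[default]
aws_access_key_id = <your_access_key_id>
aws_secret_access_key = <your_secret_access_key>
```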

To create a cluster instance, use the rh.cluster() factory function for an existing cluster, or rh.ondemand_cluster() for an on-demand cluster. We go into more depth on launching the cluster and running a function on it later in this tutorial.

cluster = rh.ondemand_cluster(
    name="cpu-cluster",
    instance_type="CPU:8",
    provider="cheapest",  # options: "AWS", "GCP", "Azure", "Lambda", or "cheapest"
).save()

Note

For more information and hardware setup see the OnDemandCluster Class section.

SageMaker Cluster

Runhouse facilitates easy access to existing or new SageMaker compute. Just provide your SageMaker execution role ARN or have it configured in your local environment.

# Launch a new SageMaker instance and keep it up indefinitely
cluster = rh.sagemaker_cluster(name='sm-cluster', profile="sagemaker").save()

# Running a training job with a provided Estimator
from sagemaker.pytorch import PyTorch

pytorch_estimator = PyTorch(
    entry_point='train.py',
    role='arn:aws:iam::123456789012:role/MySageMakerRole',
    source_dir='/Users/myuser/dev/sagemaker',
    framework_version='1.8.1',
    py_version='py36',
    instance_type='ml.p3.2xlarge',
)
cluster = rh.sagemaker_cluster(name='sagemaker-cluster', estimator=pytorch_estimator).save()

Note

For more information and hardware setup see the SageMakerCluster Class section.

Secrets and Portability

Using Runhouse with only the OSS Python package is perfectly fine, but you can unlock some unique portability features by creating an (always free) account and saving down your secrets and/or resource metadata there.

Think of the OSS-package-only experience as akin to Microsoft Office, while creating an account will make your cloud resources sharable and accessible from anywhere like Google Docs.

For instance, if you previously set up cloud provider credentials in order to launch on-demand clusters, simply call runhouse login or rh.login() and choose which of your secrets you want to sync into your Runhouse account. Then, from any other environment, you can download those secrets and use them immediately, without needing to set up your local credentials again. To delete any local credentials or remove secrets from Runhouse, you can call runhouse logout or rh.logout().

Some notes on security:

  • Our API servers only ever store light metadata about your resources (e.g. folder name, cloud provider, storage bucket, path). All actual data and compute stays inside your own cloud account and never hits our servers.

  • Secrets are stored in Hashicorp Vault (an industry standard for secrets management), never on our API servers, and our APIs simply call into Vault’s APIs.

!runhouse login
# or rh.login()

Getting Started Example

In the following example, we demonstrate Runhouse’s simple but powerful compute APIs to run locally defined functions on a remote cluster launched through Runhouse, bridging the gap between local and remote. We also show how to save, reuse, and share any of your Runhouse Resources.

Please first make sure that you have successfully followed the Installation and Cluster Setup sections above prior to running this example.

import runhouse as rh

Running local functions on remote hardware

First let’s define a simple local function which returns the number of CPUs available.

def num_cpus():
    import multiprocessing
    return f"Num cpus: {multiprocessing.cpu_count()}"

num_cpus()
'Num cpus: 10'

Next, instantiate the cluster that we want to run this function on. This can be either an existing cluster where you pass in an IP address and SSH credentials, or a cluster associated with a supported cloud account (AWS, GCP, Azure, LambdaLabs), where it is automatically launched (and optionally terminated) for you.

# Using an existing, bring-your-own cluster
cluster = rh.cluster(
    name="cpu-cluster",
    ips=['<ip of the cluster>'],
    ssh_creds={'ssh_user': '<user>', 'ssh_private_key': '<path_to_key>'},
)

# Using a Cloud provider
cluster = rh.cluster(
    name="cpu-cluster",
    instance_type="CPU:8",
    provider="cheapest",  # options: "AWS", "GCP", "Azure", "Lambda", or "cheapest"
)

If using a cloud cluster, we can launch the cluster with .up() or .up_if_not().

Note that it may take a few minutes for the cluster to be launched through the cloud provider and for dependencies to be set up.

cluster.up_if_not()

Now that we have our function and remote cluster set up, we’re ready to see how to run this function on our cluster!

We wrap our local function in rh.function and associate this new function with the cluster. Now, whenever we call this new function, just as we would any other Python function, it runs on the cluster instead of locally.

num_cpus_cluster = rh.function(name="num_cpus_cluster", fn=num_cpus).to(system=cluster, reqs=["./"])
INFO | 2023-08-29 03:03:52.826786 | Writing out function function to /Users/caroline/Documents/runhouse/runhouse/docs/notebooks/basics/num_cpus_fn.py. Please make sure the function does not rely on any local variables, including imports (which should be moved inside the function body).
/Users/caroline/Documents/runhouse/runhouse/runhouse/rns/function.py:106: UserWarning: reqs and setup_cmds arguments has been deprecated. Please use env instead.
  warnings.warn(
INFO | 2023-08-29 03:03:52.832445 | Setting up Function on cluster.
INFO | 2023-08-29 03:03:53.271019 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-08-29 03:03:53.546892 | Authentication (publickey) successful!
INFO | 2023-08-29 03:03:53.557504 | Checking server cpu-cluster
INFO | 2023-08-29 03:03:54.942843 | Server cpu-cluster is up.
INFO | 2023-08-29 03:03:54.948097 | Copying package from file:///Users/caroline/Documents/runhouse/runhouse to: cpu-cluster
INFO | 2023-08-29 03:03:56.480770 | Calling env_20230829_030349.install
base servlet: Calling method install on module env_20230829_030349
Installing package: Package: runhouse
Installing Package: runhouse with method reqs.
reqs path: runhouse/requirements.txt
pip installing requirements from runhouse/requirements.txt with: -r runhouse/requirements.txt
Running: /opt/conda/bin/python3.10 -m pip install -r runhouse/requirements.txt
INFO | 2023-08-29 03:03:58.230209 | Time to call env_20230829_030349.install: 1.75 seconds
INFO | 2023-08-29 03:03:58.462054 | Function setup complete.
num_cpus_cluster()
INFO | 2023-08-29 03:04:01.105011 | Calling num_cpus_cluster.call
base servlet: Calling method call on module num_cpus_cluster
INFO | 2023-08-29 03:04:01.384439 | Time to call num_cpus_cluster.call: 0.28 seconds
'Num cpus: 8'
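Remote functions accept arguments just like local calls: whatever you pass to the wrapped function is sent along and passed through to your original function on the cluster. As a purely local sketch (no cluster required), a hypothetical variant of num_cpus that takes a label argument illustrates the call shape:

```python
import multiprocessing

def labeled_num_cpus(label):
    # Hypothetical variant of num_cpus that takes an argument; once wrapped
    # with rh.function and sent to a cluster, calling it with a label would
    # pass that argument through to the remote call unchanged.
    return f"{label}: {multiprocessing.cpu_count()}"

print(labeled_num_cpus("Num cpus"))
```

Locally this prints the CPU count of your machine; wrapped and sent to the cluster, the same call would report the cluster's count.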

Saving, Reusing, and Sharing

Runhouse supports saving down the metadata and configs for resources like clusters and functions, so that you can load them from a different environment or share them with your collaborators.

num_cpus_cluster.save()
<runhouse.resources.function.Function at 0x104634ee0>
num_cpus_cluster.share(
    users=["<email_to_runhouse_account>"],
    access_type="write",
)

Now, you, or whoever you shared it with, can reload this function from another dev environment (such as a different Colab notebook, your local machine, or a cluster), as long as you are logged in to your Runhouse account.

reloaded_function = rh.function(name="num_cpus_cluster")
reloaded_function()
INFO | 2023-08-29 03:04:24.820884 | Checking server cpu-cluster
INFO | 2023-08-29 03:04:25.850301 | Server cpu-cluster is up.
INFO | 2023-08-29 03:04:25.852478 | Calling num_cpus_cluster.call
base servlet: Calling method call on module num_cpus_cluster
INFO | 2023-08-29 03:04:26.127098 | Time to call num_cpus_cluster.call: 0.27 seconds
'Num cpus: 8'

Terminate the Cluster

To terminate the cluster, you can run:

cluster.teardown()
 Terminating cpu-cluster

Summary

In this tutorial, we demonstrated how to use Runhouse to create references to remote clusters, run local functions on those clusters, and save, share, and reuse functions with a Runhouse account.

Runhouse also lets you:

  • Send and save data (folders, blobs, tables) between local, remote, and file storage

  • Send, save, and share dev environments

  • Reload and reuse saved resources (both compute and data) from different environments (with a Runhouse account)

  • … and much more!