
Cluster

A Cluster is a Runhouse primitive used for abstracting a particular hardware configuration. This can be either an on-demand cluster (requires valid cloud credentials, or a local Kube config if launching on Kubernetes), or a BYO (bring-your-own) cluster (requires the cluster's IP address and SSH credentials).

A cluster is assigned a name, through which it can be accessed and reused later on.

Cluster Factory Methods

runhouse.cluster(name: str, host: str | List[str] = None, ssh_creds: Dict | str = None, server_port: int = None, server_host: str = None, server_connection_type: ServerConnectionType | str = None, ssl_keyfile: str = None, ssl_certfile: str = None, domain: str = None, den_auth: bool = None, default_env: Env | str = None, load_from_den: bool = True, dryrun: bool = False, **kwargs) Cluster | OnDemandCluster[source]

Builds an instance of Cluster.

Parameters:
  • name (str) – Name for the cluster, to re-use later on.

  • host (str or List[str], optional) – Hostname (e.g. domain or name in .ssh/config), IP address, or list of IP addresses for the cluster (the first of which is the head node). (Default: None).

  • ssh_creds (dict or str, optional) – SSH credentials, passed as dictionary or the name of an SSHSecret object. Example: ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'} (Default: None).

  • server_port (int, optional) – Port to use for the server. If not provided, will use 80 for a server_connection_type of none, 443 for tls, and 32300 for all other SSH connection types. (Default: None).

  • server_host (str, optional) – Host from which the server listens for traffic (i.e. the --host argument that runhouse start is run with on the cluster). Defaults to "0.0.0.0" unless connecting to the server with an SSH connection, in which case localhost is used. (Default: None).

  • server_connection_type (ServerConnectionType or str, optional) – Type of connection to use for the Runhouse API server. ssh will start the server and connect to it via an SSH tunnel. tls will start the server with HTTPS on port 443 using TLS certs without an SSH tunnel. none will start the server with HTTP without an SSH tunnel. (Default: None).

  • ssl_keyfile (str, optional) – Path to SSL key file to use for launching the API server with HTTPS. (Default: None).

  • ssl_certfile (str, optional) – Path to SSL certificate file to use for launching the API server with HTTPS. (Default: None).

  • domain (str, optional) – Domain name for the cluster. Relevant if enabling HTTPS on the cluster. (Default: None).

  • den_auth (bool, optional) – Whether to use Den authorization on the server. If True, will validate incoming requests with a Runhouse token provided in the auth headers of the request with the format: {"Authorization": "Bearer <token>"}. (Default: None).

  • default_env (Env or str, optional) – Environment that the Runhouse server is started on in the cluster. Used to specify an isolated environment (e.g. conda env) or any setup and requirements prior to starting the Runhouse server. (Default: None)

  • load_from_den (bool) – Whether to try loading the Cluster resource from Den. (Default: True)

  • dryrun (bool) – Whether to create the Cluster if it doesn’t exist, or load a Cluster object as a dryrun. (Default: False)

Returns:

The resulting cluster.

Return type:

Union[Cluster, OnDemandCluster]

Example

>>> # using private key
>>> gpu = rh.cluster(host='<hostname>',
>>>                  ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},
>>>                  name='rh-a10x').save()

>>> # using password
>>> gpu = rh.cluster(host='<hostname>',
>>>                  ssh_creds={'ssh_user': '...', 'password':'*****'},
>>>                  name='rh-a10x').save()

>>> # using the name of an SSHSecret object
>>> gpu = rh.cluster(host='<hostname>',
>>>                  ssh_creds="my_ssh_secret",
>>>                  name='rh-a10x').save()

>>> # Load cluster from above
>>> reloaded_cluster = rh.cluster(name="rh-a10x")
runhouse.ondemand_cluster(name: str, instance_type: str | None = None, num_instances: int | None = None, provider: str | None = None, autostop_mins: int | None = None, use_spot: bool = False, image_id: str | None = None, region: str | None = None, memory: int | str | None = None, disk_size: int | str | None = None, open_ports: int | str | List[int] | None = None, sky_kwargs: Dict = None, server_port: int = None, server_host: int = None, server_connection_type: ServerConnectionType | str = None, ssl_keyfile: str = None, ssl_certfile: str = None, domain: str = None, den_auth: bool = None, default_env: Env | str = None, load_from_den: bool = True, dryrun: bool = False, **kwargs) OnDemandCluster[source]

Builds an instance of OnDemandCluster. Note that image_id, region, memory, disk_size, and open_ports are all passed through to SkyPilot’s Resource constructor.

Parameters:
  • name (str) – Name for the cluster, to re-use later on.

  • instance_type (str, optional) – Type of cloud instance to use for the cluster. This could be a Runhouse built-in type, or your choice of instance type.

  • num_instances (int, optional) – Number of instances to use for the cluster.

  • provider (str, optional) – Cloud provider to use for the cluster.

  • autostop_mins (int, optional) – Number of minutes to keep the cluster up after inactivity, or -1 to keep cluster up indefinitely.

  • use_spot (bool, optional) – Whether or not to use a spot instance.

  • image_id (str, optional) – Custom image ID for the cluster. If using a docker image, please use the following string format: “docker:<registry>/<image>:<tag>”. See user guide for more information on Docker cluster setup.

  • region (str, optional) – The region to use for the cluster.

  • memory (int or str, optional) – Amount of memory to use for the cluster, e.g. “16” or “16+”.

  • disk_size (int or str, optional) – Amount of disk space to use for the cluster, e.g. “100” or “100+”.

  • open_ports (int or str or List[int], optional) – Ports to open in the cluster’s security group. Note that you are responsible for ensuring that the applications listening on these ports are secure.

  • sky_kwargs (dict, optional) – Additional keyword arguments to pass to the SkyPilot Resource or launch APIs. Should be a dict of the form {"resources": {<resources_kwargs>}, "launch": {<launch_kwargs>}}, where resources_kwargs and launch_kwargs will be passed to the SkyPilot Resources API (See SkyPilot docs) and launch API (See SkyPilot docs), respectively. Any arguments which duplicate those passed to the ondemand_cluster factory method will raise an error.

  • server_port (int, optional) – Port to use for the server. If not provided, will use 80 for a server_connection_type of none, 443 for tls, and 32300 for all other SSH connection types.

  • server_host (str, optional) – Host from which the server listens for traffic (i.e. the --host argument that runhouse start is run with on the cluster). Defaults to "0.0.0.0" unless connecting to the server with an SSH connection, in which case localhost is used.

  • server_connection_type (ServerConnectionType or str, optional) – Type of connection to use for the Runhouse API server. ssh will start the server and connect to it via an SSH tunnel. tls will start the server with HTTPS on port 443 using TLS certs without an SSH tunnel. none will start the server with HTTP without an SSH tunnel.

  • ssl_keyfile (str, optional) – Path to SSL key file to use for launching the API server with HTTPS.

  • ssl_certfile (str, optional) – Path to SSL certificate file to use for launching the API server with HTTPS.

  • domain (str, optional) – Domain name for the cluster. Relevant if enabling HTTPS on the cluster.

  • den_auth (bool, optional) – Whether to use Den authorization on the server. If True, will validate incoming requests with a Runhouse token provided in the auth headers of the request with the format: {"Authorization": "Bearer <token>"}. (Default: None).

  • default_env (Env or str, optional) – Environment that the Runhouse server is started on in the cluster. Used to specify an isolated environment (e.g. conda env) or any setup and requirements prior to starting the Runhouse server. (Default: None)

  • load_from_den (bool) – Whether to try loading the Cluster resource from Den. (Default: True)

  • dryrun (bool) – Whether to create the Cluster if it doesn’t exist, or load a Cluster object as a dryrun. (Default: False)

Returns:

The resulting cluster.

Return type:

OnDemandCluster

Example

>>> import runhouse as rh
>>> # On-Demand SkyPilot Cluster (OnDemandCluster)
>>> gpu = rh.ondemand_cluster(name='rh-4-a100s',
>>>                           instance_type='A100:4',
>>>                           provider='gcp',
>>>                           autostop_mins=-1,
>>>                           use_spot=True,
>>>                           image_id='my_ami_string',
>>>                           region='us-east-1',
>>>                           ).save()

>>> # Load cluster from above
>>> reloaded_cluster = rh.ondemand_cluster(name="rh-4-a100s")

Cluster Class

class runhouse.Cluster(name: str | None = None, ips: List[str] = None, creds: Secret = None, default_env: Env = None, server_host: str = None, server_port: int = None, ssh_port: int = None, client_port: int = None, server_connection_type: str = None, ssl_keyfile: str = None, ssl_certfile: str = None, domain: str = None, den_auth: bool = False, dryrun: bool = False, **kwargs)[source]
__init__(name: str | None = None, ips: List[str] = None, creds: Secret = None, default_env: Env = None, server_host: str = None, server_port: int = None, ssh_port: int = None, client_port: int = None, server_connection_type: str = None, ssl_keyfile: str = None, ssl_certfile: str = None, domain: str = None, den_auth: bool = False, dryrun: bool = False, **kwargs)[source]

The Runhouse cluster, or system. This is where you can run Functions or access and transfer data. You can BYO (bring-your-own) a cluster by providing the cluster IP and ssh_creds, or this can be an on-demand cluster that is spun up/down through SkyPilot, using your cloud credentials.

Note

To build a cluster, please use the factory method cluster().

call(module_name: str, method_name: str, *args, stream_logs: bool = True, run_name: str | None = None, remote: bool = False, run_async: bool = False, save: bool = False, **kwargs)[source]

Call a method on a module that is in the cluster’s object store.

Parameters:
  • module_name (str) – Name of the module saved on system.

  • method_name (str) – Name of the method.

  • stream_logs (bool, optional) – Whether to stream logs from the method call. (Default: True)

  • run_name (str, optional) – Name for the run. (Default: None)

  • remote (bool, optional) – Return a remote object from the function, rather than the result proper. (Default: False)

  • run_async (bool, optional) – Run the method asynchronously and return an awaitable. (Default: False)

  • save (bool, optional) – Whether or not to save the call. (Default: False)

  • *args – Positional arguments to pass to the method.

  • **kwargs – Keyword arguments to pass to the method.

Example

>>> cluster.call("my_module", "my_method", arg1, arg2, kwarg1=kwarg1)
clear()[source]

Clear the cluster’s object store.

delete(keys: None | str | List[str])[source]

Delete the given items from the cluster’s object store. To delete all items, use cluster.clear()

Parameters:

keys (str or List[str]) – key or list of keys to delete from the object store.
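
As an illustrative sketch (the key names here are hypothetical):

>>> cluster.delete(["my_key1", "my_key2"])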

delete_configs()[source]

Delete configs for the cluster

disconnect()[source]

Disconnect the RPC tunnel.

Example

>>> cluster.disconnect()
download_cert()[source]

Download certificate from the cluster (Note: user must have access to the cluster)

enable_den_auth(flush: bool = True)[source]

Enable Den auth on the cluster.

Parameters:

flush (bool, optional) – Whether to flush the auth cache. (Default: True)

endpoint(external: bool = False)[source]

Endpoint for the cluster’s Daemon server.

Parameters:

external (bool, optional) – If True, will only return the external url, and will return None otherwise (e.g. if a tunnel is required). If set to False, will either return the external url if it exists, or will set up the connection (based on connection_type) and return the internal url (including the local connected port rather than the server port). If the cluster is not up, returns None. (Default: False)
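
For example (an illustrative sketch; the second call returns None if the server is only reachable through a tunnel):

>>> cluster.endpoint()
>>> cluster.endpoint(external=True)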

classmethod from_config(config: Dict, dryrun: bool = False, _resolve_children: bool = True)[source]

Load or construct resource from config.

Parameters:
  • config (Dict) – Resource config.

  • dryrun (bool, optional) – Whether to construct resource or load as dryrun (Default: False)

get(key: str, default: Any | None = None, remote=False)[source]

Get the result for a given key from the cluster’s object store.

Parameters:
  • key (str) – Key to get from the cluster’s object store.

  • default (Any, optional) – What to return if the key is not found. To raise an error, pass in KeyError. (Default: None)

  • remote (bool, optional) – Whether to get the remote object, rather than the object in full. (Default: False)
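
A minimal sketch, assuming "my_key" was previously put on the cluster:

>>> obj = cluster.get("my_key")
>>> ref = cluster.get("my_key", remote=True)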

install_packages(reqs: List[Package | str], env: Env | str = None)[source]

Install the given packages on the cluster.

Parameters:
  • reqs (List[Package or str]) – List of packages to install on cluster and env.

  • env (Env or str) – Environment to install package on. If left empty, defaults to base environment. (Default: None)

Example

>>> cluster.install_packages(reqs=["accelerate", "diffusers"])
>>> cluster.install_packages(reqs=["accelerate", "diffusers"], env="my_conda_env")
is_connected()[source]

Whether the RPC tunnel is up.

Example

>>> connected = cluster.is_connected()
is_up() bool[source]

Check if the cluster is up.

Example

>>> rh.cluster("rh-cpu").is_up()
keys(env: str | None = None)[source]

List all keys in the cluster’s object store.

Parameters:

env (str, optional) – Env for which to list out the keys.
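
For example (the env name is hypothetical):

>>> cluster.keys()
>>> cluster.keys(env="my_env")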

notebook(persist: bool = False, sync_package_on_close: str | None = None, port_forward: int = 8888)[source]

Tunnel into and launch notebook from the cluster.

Example

>>> rh.cluster("test-cluster").notebook()
on_this_cluster()[source]

Whether this function is being called on the same cluster.

pause_autostop()[source]

Context manager to temporarily pause autostop. Only for OnDemand clusters. There is no autostop for static clusters.

put(key: str, obj: Any, env: str | None = None)[source]

Put the given object on the cluster’s object store at the given key.

Parameters:
  • key (str) – Key to assign the object in the object store.

  • obj (Any) – Object to put in the object store

  • env (str, optional) – Env of the object store to put the object in. (Default: None)
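
An illustrative sketch, assuming my_obj is an object defined locally:

>>> cluster.put("my_key", my_obj)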

put_resource(resource: Resource, state: Dict = None, dryrun: bool = False, env: str | Env = None)[source]

Put the given resource on the cluster’s object store. Returns the key (important if name is not set).

Parameters:
  • resource (Resource) – Resource to put on the cluster’s object store.

  • state (Dict, optional) – Dict of resource attributes to override. (Default: None)

  • dryrun (bool, optional) – Whether to put the resource in dryrun mode or not. (Default: False)

  • env (str, optional) – Env of the object store to put the object in. (Default: None)
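
A rough sketch, assuming my_env is an Env resource defined locally:

>>> key = cluster.put_resource(my_env)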

remove_conda_env(env: str | CondaEnv)[source]

Remove conda env from the cluster.

Parameters:

env (str or Env) – Name of conda env to remove from the cluster, or Env resource representing the environment.

Example

>>> rh.ondemand_cluster("rh-cpu").remove_conda_env("my_conda_env")
rename(old_key: str, new_key: str)[source]

Rename a key in the cluster’s object store.

Parameters:
  • old_key (str) – Original key to rename.

  • new_key (str) – Name to reassign the object.
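
For example (the keys are hypothetical):

>>> cluster.rename("old_key", "new_key")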

restart_server(_rh_install_url: str | None = None, resync_rh: bool | None = None, restart_ray: bool = True, restart_proxy: bool = False)[source]

Restart the RPC server.

Parameters:
  • resync_rh (bool) – Whether to resync runhouse. Specifying False will not sync Runhouse under any circumstance. If it is None, then it will sync if Runhouse is not installed on the cluster or if locally it is installed as editable. (Default: None)

  • restart_ray (bool) – Whether to restart Ray. (Default: True)

  • restart_proxy (bool) – Whether to restart Caddy on the cluster, if configured. (Default: False)

Example

>>> rh.cluster("rh-cpu").restart_server()
rsync(source: str, dest: str, up: bool, node: str | None = None, contents: bool = False, filter_options: str | None = None, stream_logs: bool = False)[source]

Sync the contents of the source directory into the destination.

Parameters:
  • source (str) – The source path.

  • dest (str) – The target path.

  • up (bool) – The direction of the sync. If True, will rsync from local to cluster. If False will rsync from cluster to local.

  • node (Optional[str], optional) – Specific cluster node to rsync to. If not specified will use the address of the cluster’s head node.

  • contents (Optional[bool], optional) – Whether the contents of the source directory or the directory itself should be copied to destination. If True the contents of the source directory are copied to the destination, and the source directory itself is not created at the destination. If False the source directory along with its contents are copied to the destination, creating an additional directory layer at the destination. (Default: False).

  • filter_options (Optional[str], optional) – The filter options for rsync.

  • stream_logs (Optional[bool], optional) – Whether to stream logs to the stdout/stderr. (Default: False).

Note

Ending source with a slash will copy the contents of the directory into dest, while omitting it will copy the directory itself (adding a directory layer).
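
An illustrative sketch (the paths are hypothetical), syncing a local folder up to the cluster and a results folder back down:

>>> cluster.rsync(source="./data", dest="~/data", up=True, contents=True)
>>> cluster.rsync(source="~/results", dest="./results", up=False)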

run(commands: str | List[str], env: Env | str = None, stream_logs: bool = True, require_outputs: bool = True, node: str | None = None, _ssh_mode: str = 'interactive') List[source]

Run a list of shell commands on the cluster.

Parameters:
  • commands (str or List[str]) – Command or list of commands to run on the cluster.

  • env (Env or str, optional) – Env on the cluster to run the command in. If not provided, will be run in the default env. (Default: None)

  • stream_logs (bool, optional) – Whether to stream log output as the command runs. (Default: True)

  • require_outputs (bool, optional) – If True, returns a Tuple (returncode, stdout, stderr). If False, returns just the returncode. (Default: True)

  • node (str, optional) – Node to run the commands on. If not provided, runs on head node. (Default: None)

Example

>>> cpu.run(["pip install numpy"]) >>> cpu.run(["pip install numpy"], env="my_conda_env"]) >>> cpu.run(["python script.py"]) >>> cpu.run(["python script.py"], node="3.89.174.234")
run_python(commands: List[str], env: Env | str = None, stream_logs: bool = True, node: str = None)[source]

Run a list of python commands on the cluster, or a specific cluster node if its IP is provided.

Parameters:
  • commands (List[str]) – List of commands to run.

  • env (Env or str, optional) – Env to run the commands in. (Default: None)

  • stream_logs (bool, optional) – Whether to stream logs. (Default: True)

  • node (str, optional) – Node to run commands on. If not specified, runs on head node. (Default: None)

Example

>>> cpu.run_python(['import numpy', 'print(numpy.__version__)'])
>>> cpu.run_python(["print('hello')"])
>>> cpu.run_python(["print('hello')"], node="3.89.174.234")

Note

Running Python commands with nested quotes can be finicky. If using nested quotes, try to wrap the outer quotes with double quotes (") and the inner quotes with single quotes (').

save(name: str | None = None, overwrite: bool = True, folder: str | None = None)[source]

Overrides the default resource save() method in order to also update the cluster config on the cluster itself.

Parameters:
  • name (str, optional) – Name to save the cluster as, if different from its existing name. (Default: None)

  • overwrite (bool, optional) – Whether to overwrite the existing saved resource, if it exists. (Default: True)

  • folder (str, optional) – Folder to save the config in, if saving locally. If None and saving locally, will be saved in the ~/.rh directory. (Default: None)

property server_address

Address to use in the requests made to the cluster. If creating an SSH tunnel with the cluster, this will be set to localhost, otherwise will use the cluster’s domain (if provided), or its public IP address.

share(users: str | List[str] | None = None, access_level: ResourceAccess | str = ResourceAccess.READ, visibility: ResourceVisibility | str | None = None, notify_users: bool = True, headers: Dict | None = None) Tuple[Dict[str, ResourceAccess], Dict[str, ResourceAccess]][source]

Grant access to the resource for a list of users (or a single user). By default, the user will receive an email notification of access (if they have a Runhouse account) or instructions on creating an account to access the resource. If visibility is set to public, users will not be notified.

Note

You can only grant access to other users if you have write access to the resource.

Parameters:
  • users (Union[str, list], optional) – Single user or list of user emails and / or runhouse account usernames. If none are provided and visibility is set to public, resource will be made publicly available to all users. (Default: None)

  • access_level (ResourceAccess, optional) – Access level to provide for the resource. (Default: read).

  • visibility (ResourceVisibility, optional) – Type of visibility to provide for the shared resource. By default, the visibility is private. (Default: None)

  • notify_users (bool, optional) – Whether to send an email notification to users who have been given access. Note: This is relevant for resources which are not shareable. (Default: True)

  • headers (Dict, optional) – Request headers to provide for the request to RNS. Contains the user’s auth token. Example: {"Authorization": f"Bearer {token}"}

Returns:

added_users:

Users who already have a Runhouse account and have been granted access to the resource.

new_users:

Users who do not have Runhouse accounts and received notifications via their emails.

valid_users:

Set of valid usernames and emails from users parameter.

Return type:

Tuple(Dict, Dict, Set)

Example

>>> # Write access to the resource for these specific users.
>>> # Visibility will be set to private (users can search for and view resource in Den dashboard)
>>> my_resource.share(users=["username1", "user2@gmail.com"], access_level='write')

>>> # Make resource public, with read access to the resource for all users
>>> my_resource.share(visibility='public')
ssh()[source]

SSH into the cluster

Example

>>> rh.cluster("rh-cpu").ssh()
status(send_to_den: bool = False)[source]

Load the status of the Runhouse daemon running on a cluster.

Parameters:

send_to_den (bool, optional) – Whether to send and update the status in Den. Only applies to clusters that are saved to Den. (Default: False)
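
For example (the second call only applies if the cluster is saved to Den):

>>> cluster.status()
>>> cluster.status(send_to_den=True)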

stop_server(stop_ray: bool = True, env: str | Env = None)[source]

Stop the RPC server.

Parameters:
  • stop_ray (bool, optional) – Whether to stop Ray. (Default: True)

  • env (str or Env, optional) – Specified environment to stop the server on. (Default: None)
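
For example (the env name is hypothetical):

>>> cluster.stop_server()
>>> cluster.stop_server(stop_ray=False, env="my_env")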

sync_secrets(providers: List[str] | None = None, env: str | Env = None)[source]

Send secrets for the given providers.

Parameters:
  • providers (List[str] or None, optional) – List of providers to send secrets for. If None, all providers configured in the environment will be sent. (Default: None)

  • env (str, Env, optional) – Env to sync secrets into. (Default: None)

Example

>>> cpu.sync_secrets(providers=["aws", "lambda"])
up_if_not()[source]

Bring up the cluster if it is not up. No-op if cluster is already up. This only applies to on-demand clusters, and has no effect on self-managed clusters.

Example

>>> rh.cluster("rh-cpu").up_if_not()

Cluster Hardware Setup

No additional setup is required. You will just need to have the IP address for the cluster and the path to SSH credentials ready to be used for the cluster initialization.

OnDemandCluster Class

An OnDemandCluster is a cluster that uses SkyPilot functionality underneath to handle various cluster properties.

class runhouse.OnDemandCluster(name, instance_type: str = None, num_instances: int = None, provider: str = None, default_env: Env = None, dryrun: bool = False, autostop_mins: int = None, use_spot: bool = False, image_id: str = None, memory: int | str = None, disk_size: int | str = None, open_ports: int | str | List[int] = None, server_host: int = None, server_port: int = None, server_connection_type: str = None, ssl_keyfile: str = None, ssl_certfile: str = None, domain: str = None, den_auth: bool = False, region: str = None, sky_kwargs: Dict = None, **kwargs)[source]
__init__(name, instance_type: str = None, num_instances: int = None, provider: str = None, default_env: Env = None, dryrun: bool = False, autostop_mins: int = None, use_spot: bool = False, image_id: str = None, memory: int | str = None, disk_size: int | str = None, open_ports: int | str | List[int] = None, server_host: int = None, server_port: int = None, server_connection_type: str = None, ssl_keyfile: str = None, ssl_certfile: str = None, domain: str = None, den_auth: bool = False, region: str = None, sky_kwargs: Dict = None, **kwargs)[source]

On-demand SkyPilot Cluster.

Note

To build a cluster, please use the factory method cluster().

async a_up(capture_output: bool | str = True)[source]

Up the cluster async in another process, so it can be parallelized and logs can be captured sanely.

capture_output: If True, suppress the output of the cluster creation process. If False, print the output normally. If a string, write the output to the file at that path.

accelerators()[source]

Returns the accelerator type, or None if the cluster is CPU-only.

static cluster_ssh_key(path_to_file: Path)[source]

Retrieve SSH key for the cluster.

Parameters:

path_to_file (Path) – Path of the private key associated with the cluster.

Example

>>> ssh_priv_key = rh.ondemand_cluster("rh-cpu").cluster_ssh_key("~/.ssh/id_rsa")
endpoint(external: bool = False)[source]

Endpoint for the cluster’s Daemon server.

Parameters:

external (bool, optional) – If True, will only return the external url, and will return None otherwise (e.g. if a tunnel is required). If set to False, will either return the external url if it exists, or will set up the connection (based on connection_type) and return the internal url (including the local connected port rather than the server port). If the cluster is not up, returns None. (Default: False)

get_instance_type()[source]

Returns instance type of the cluster.

is_up() bool[source]

Whether the cluster is up.

Example

>>> rh.ondemand_cluster("rh-cpu").is_up()
keep_warm(mins: int = -1)[source]

Keep the cluster warm for given number of minutes after inactivity.

Parameters:

mins (int) – Amount of time (in min) to keep the cluster warm after inactivity. If set to -1, keep cluster warm indefinitely. (Default: -1)
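
For example, to keep the cluster warm for an hour of inactivity:

>>> rh.ondemand_cluster("rh-cpu").keep_warm(mins=60)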

num_cpus()[source]

Return the number of CPUs for a CPU cluster.

pause_autostop()[source]

Context manager to temporarily pause autostop.

Example

>>> with rh.ondemand_cluster("rh-cpu").pause_autostop():
>>>     rh.ondemand_cluster("rh-cpu").run(["python train.py"])
ssh(node: str | None = None)[source]

SSH into the cluster.

Parameters:

node – Node to SSH into. If no node is specified, will SSH onto the head node. (Default: None)

Example

>>> rh.ondemand_cluster("rh-cpu").ssh() >>> rh.ondemand_cluster("rh-cpu", node="3.89.174.234").ssh()
teardown()[source]

Teardown cluster.

Example

>>> rh.ondemand_cluster("rh-cpu").teardown()
teardown_and_delete()[source]

Teardown cluster and delete it from configs.

Example

>>> rh.ondemand_cluster("rh-cpu").teardown_and_delete()
up()[source]

Up the cluster.

Example

>>> rh.ondemand_cluster("rh-cpu").up()

OnDemandCluster Hardware Setup

On-Demand clusters use SkyPilot to automatically spin up and down clusters on the cloud. You will need to first set up cloud access on your local machine:

Run sky check to see which cloud providers are enabled, and how to set up cloud credentials for each of the providers.

$ sky check

For a more in-depth tutorial on setting up individual cloud credentials, you can refer to the SkyPilot setup docs.

Specifying a VPC

If you would like to launch an on-demand cluster within a specific VPC, you can specify its name in your local ~/.sky/config.yaml in the following format:

<cloud-provider>:
  vpc: <vpc-name>

See the SkyPilot docs for more details on configuring a VPC.

Cluster Authentication & Verification

Runhouse provides a couple of options to manage the connection to the Runhouse API server running on a cluster.

Server Connection

The below options can be specified with the server_connection_type parameter when initializing a cluster. By default the Runhouse API server will be started on the cluster on port 32300.

  • ssh: Connects to the cluster via an SSH tunnel, by default on port 32300.

  • tls: Connects to the cluster via HTTPS (by default on port 443) using either a provided certificate, or creating a new self-signed certificate just for this cluster. You must open the needed ports in the firewall, such as via the open_ports argument in the OnDemandCluster, or manually in the compute itself or cloud console.

  • none: Does not use any port forwarding or enforce any authentication. Connects to the cluster with HTTP by default on port 80. This is useful when connecting to a cluster within a VPC, or creating a tunnel manually on the side with custom settings.

Note

The tls connection type is the most secure and is recommended for production use if you are not running inside of a VPC. However, be mindful that you must secure the cluster with authentication (see below) if you open it to the public internet.
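
For example, a rough sketch of specifying the connection type when defining a BYO cluster (the name, host, and credential values are placeholders):

cluster = rh.cluster(name="rh-byo-cpu",
                     host="<ip address>",
                     ssh_creds={"ssh_user": "...", "ssh_private_key": "<path_to_key>"},
                     server_connection_type="ssh")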

Server Authentication

If desired, Runhouse provides out-of-the-box authentication via the user’s Runhouse token (generated when logging in and stored locally at ~/.rh/config.yaml). This is crucial if the cluster has ports open to the public internet, as is usually the case when using the tls connection type. You may also set up your own authentication manually inside of your own code, but you should likely still enable Runhouse authentication to ensure that even your non-user-facing endpoints into the server are secured.

When initializing a cluster, you can set the den_auth parameter to True to enable token authentication. Calls to the cluster server can then be made using an auth header with the format: {"Authorization": "Bearer <cluster-token>"}. The Runhouse Python library adds this header to its calls automatically, so your users do not need to worry about it after logging into Runhouse.
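
As a minimal sketch (the cluster name is a placeholder), Den auth can be enabled either when defining the cluster or afterwards using the enable_den_auth() method described above:

cluster = rh.cluster(name="rh-cluster", den_auth=True)
# or, on an already-constructed cluster object:
cluster.enable_den_auth()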

Note

Runhouse never uses your default Runhouse token for anything other than requests made to Runhouse Den. Your token will never be exposed or shared with anyone else.

TLS Certificates

Enabling TLS and Runhouse Den Dashboard Auth for the API server makes it incredibly fast and easy to stand up a microservice with standard token authentication, allowing you to easily share Runhouse resources with collaborators, teams, customers, etc.

Let’s illustrate this with a simple example:

import runhouse as rh

def concat(a: str, b: str):
    return a + b

# Launch a cluster with TLS and Den Auth enabled
cpu = rh.ondemand_cluster(instance_type="m5.xlarge",
                          provider="aws",
                          name="rh-cluster",
                          den_auth=True,
                          open_ports=[443],
                          server_connection_type="tls").up_if_not()

# Remote function stub which lives on the cluster
remote_func = rh.function(concat).to(cpu)

# Save to Runhouse Den
remote_func.save()

# Give read access to the function to another user - this will allow them to call this service
# remotely and view the function metadata in Runhouse Den
remote_func.share("user1@gmail.com", access_level="read")

# This other user (user1) can then call the function remotely from any python environment
res = remote_func("run", "house")
print(res)
# "runhouse"

We can also call the function via an HTTP request, making it easy for other users to call the function with a Runhouse cluster token (Note: this assumes the user has been granted access to the function or write access to the cluster):

$ curl -X GET "https://<DOMAIN>/concat/call?a=run&b=house" -H "Content-Type: application/json" -H "Authorization: Bearer <CLUSTER-TOKEN>"

Caddy

Runhouse gives you the option of using Caddy as a reverse proxy for the Runhouse API server, which is a FastAPI app launched with Uvicorn. Using Caddy provides a safer and more conventional approach: the FastAPI app runs on a higher, non-privileged port (such as 32300, the default Runhouse port), and Caddy acts as a reverse proxy, forwarding requests from the HTTP port (default: 80) or the HTTPS port (default: 443).

Caddy also enables generating and auto-renewing self-signed certificates, making it easy to secure your cluster with HTTPS right out of the box.

Note

Caddy is enabled by default when you launch a cluster with the server_port set to either 80 or 443.

Generating Certs

Runhouse offers two options for enabling TLS/SSL on a cluster with Caddy:

  1. Using existing certs: provide the path to the cert and key files with the ssl_certfile and ssl_keyfile arguments. These certs will be used by Caddy as specified in the Caddyfile on the cluster. If no cert paths are provided and no domain is specified, Runhouse will issue self-signed certificates to use for the cluster. These certs will not be verified by a CA.

  2. Using Caddy to generate CA verified certs: Provide the domain argument. Caddy will then obtain certificates from Let’s Encrypt on-demand when a client connects for the first time.
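
An illustrative sketch of both options (the cluster name, cert paths, and domain are placeholders):

# Option 1: bring your own certs
cluster = rh.cluster(name="rh-cluster",
                     server_connection_type="tls",
                     ssl_certfile="~/ssl/certs/cert.pem",
                     ssl_keyfile="~/ssl/keys/key.pem")

# Option 2: let Caddy obtain CA verified certs for a domain you own
cluster = rh.cluster(name="rh-cluster",
                     server_connection_type="tls",
                     domain="<your domain>")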

Using a Custom Domain

Runhouse supports using custom domains for deploying your apps and services. You can provide the domain ahead of time before launching the cluster by specifying the domain argument:

cluster = rh.cluster(name="rh-serving-cpu", domain="<your domain>", instance_type="m5.xlarge", server_connection_type="tls", open_ports=[443]).up_if_not()

Note

After the cluster is launched, make sure to add the relevant A record to your domain’s DNS settings to point this domain to the cluster’s public IP address.

You’ll also need to ensure the relevant ports (e.g. 443) are open in the security group settings of the cluster. Runhouse will also automatically set up a TLS certificate for the domain via Caddy.

If you have an existing cluster, you can also configure a domain by including the IP and domain when initializing the Runhouse cluster object:

cluster = rh.cluster(name="rh-serving-cpu", ips=["<public IP>"], domain="<your domain>", server_connection_type="tls", open_ports=[443]).up_if_not()

Now we can send modules or functions to our cluster and seamlessly create endpoints which we can then share and call from anywhere.

Let’s take a look at an example of how to deploy a simple LangChain RAG app.

Once the app has been created and sent to the cluster, we can call it via HTTP directly:

import requests

resp = requests.get("https://<domain>/basic_rag_app/invoke?user_prompt=<prompt>")
print(resp.json())

Or via cURL:

$ curl "https://<domain>/basic_rag_app/invoke?user_prompt=<prompt>"