A Cluster is a Runhouse primitive used to abstract a particular hardware configuration. This can be either an on-demand cluster (requires valid cloud credentials or a local kube config if launching on Kubernetes), or a BYO (bring-your-own) cluster (requires an IP address and SSH credentials).
A cluster is assigned a name, through which it can be accessed and reused later on.
Builds an instance of Cluster.
name (str) – Name for the cluster, to re-use later on.
host (str or List[str], optional) – Hostname (e.g. domain or name in .ssh/config), IP address, or list of IP addresses for the cluster (the first of which is the head node). (Default: None).
ssh_creds (dict or str, optional) – SSH credentials, passed as a dictionary or the name of an SSHSecret object. Example: ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'} (Default: None).
server_port (int, optional) – Port to use for the server. If not provided, will use 80 for a server_connection_type of none, 443 for tls, and 32300 for all other SSH connection types.
server_host (str, optional) – Host from which the server listens for traffic (i.e. the --host argument of the runhouse server start command run on the cluster). Defaults to “0.0.0.0” unless connecting to the server with an SSH connection, in which case localhost is used. (Default: None).
server_connection_type (ServerConnectionType or str, optional) – Type of connection to use for the Runhouse API server. ssh will connect to the server via an SSH tunnel. tls will start the server with HTTPS on port 443 using TLS certs, without an SSH tunnel. none will start the server with HTTP, without an SSH tunnel. (Default: None).
launcher (LauncherType or str, optional) – Method for launching the cluster. If set to local, will launch locally via Sky. If set to den, launching will be handled by Runhouse. Currently only relevant for on-demand clusters and Kubernetes clusters. (Default: local).
ssl_keyfile (str, optional) – Path to SSL key file to use for launching the API server with HTTPS. (Default: None).
ssl_certfile (str, optional) – Path to SSL certificate file to use for launching the API server with HTTPS. (Default: None).
domain (str, optional) – Domain name for the cluster. Relevant if enabling HTTPS on the cluster. (Default: None).
den_auth (bool, optional) – Whether to use Den authorization on the server. If True, will validate incoming requests with a Runhouse token provided in the auth headers of the request with the format: {"Authorization": "Bearer <token>"}. (Default: None).
image (Image, optional) – Default image containing setup steps to run during cluster setup. See Image. (Default: None)
load_from_den (bool) – Whether to try loading the Cluster resource from Den. (Default: True)
dryrun (bool) – Whether to create the Cluster if it doesn’t exist, or load a Cluster object as a dryrun. (Default: False)
The resulting cluster.
Union[Cluster, OnDemandCluster]
Example
>>> # using private key
>>> gpu = rh.cluster(host='<hostname>',
>>>                  ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},
>>>                  name='rh-a10x').save()
>>> # using password
>>> gpu = rh.cluster(host='<hostname>',
>>>                  ssh_creds={'ssh_user': '...', 'password':'*****'},
>>>                  name='rh-a10x').save()
>>> # using the name of an SSHSecret object
>>> gpu = rh.cluster(host='<hostname>',
>>>                  ssh_creds="my_ssh_secret",
>>>                  name='rh-a10x').save()
>>> # Load cluster from above
>>> reloaded_cluster = rh.cluster(name="rh-a10x")
Builds an instance of OnDemandCluster. Note that region, memory, disk_size, and open_ports are all passed through to SkyPilot’s Resource constructor.
name (str) – Name for the cluster, to re-use later on.
instance_type (str, optional) – Cloud VM instance type to use for the cluster, e.g. “r5d.xlarge”. Optional, as one may instead choose to specify resource requirements (e.g. memory, disk_size, num_cpus, accelerators).
num_nodes (int, optional) – Number of nodes to use for the cluster.
provider (str, optional) – Cloud provider to use for the cluster.
autostop_mins (int, optional) – Number of minutes to keep the cluster up after inactivity, or -1 to keep the cluster up indefinitely. (Default: 60).
use_spot (bool, optional) – Whether or not to use a spot instance.
region (str, optional) – The region to use for the cluster.
memory (int or str, optional) – Amount of memory to use for the cluster, e.g. “16” or “16+”.
disk_size (int or str, optional) – Amount of disk space to use for the cluster, e.g. “100” or “100+”.
num_cpus (int or str, optional) – Number of CPUs to use for the cluster, e.g. “4” or “4+”.
accelerators (int or str, optional) – Type and number of accelerators to use for the cluster, e.g. “A10:1” or “L4:8”.
open_ports (int or str or List[int], optional) – Ports to open in the cluster’s security group. Note that you are responsible for ensuring that the applications listening on these ports are secure.
sky_kwargs (dict, optional) – Additional keyword arguments to pass to the SkyPilot Resource or launch APIs. Should be a dict of the form {“resources”: {<resources_kwargs>}, “launch”: {<launch_kwargs>}}, where resources_kwargs and launch_kwargs will be passed to the SkyPilot Resources API (See SkyPilot docs) and launch API (See SkyPilot docs), respectively. Any arguments which duplicate those passed to the ondemand_cluster factory method will raise an error.
server_port (int, optional) – Port to use for the server. If not provided, will use 80 for a server_connection_type of none, 443 for tls, and 32300 for all other SSH connection types.
server_host (str, optional) – Host from which the server listens for traffic (i.e. the --host argument of the runhouse server start command run on the cluster). Defaults to “0.0.0.0” unless connecting to the server with an SSH connection, in which case localhost is used.
server_connection_type (ServerConnectionType or str, optional) – Type of connection to use for the Runhouse API server. ssh will connect to the server via an SSH tunnel. tls will start the server with HTTPS on port 443 using TLS certs, without an SSH tunnel. none will start the server with HTTP, without an SSH tunnel.
launcher (LauncherType or str, optional) – Method for launching the cluster. If set to local, will launch locally via Sky. If set to den, launching will be handled by Runhouse. (Default: local).
ssl_keyfile (str, optional) – Path to SSL key file to use for launching the API server with HTTPS.
ssl_certfile (str, optional) – Path to SSL certificate file to use for launching the API server with HTTPS.
domain (str, optional) – Domain name for the cluster. Relevant if enabling HTTPS on the cluster.
den_auth (bool, optional) – Whether to use Den authorization on the server. If True, will validate incoming requests with a Runhouse token provided in the auth headers of the request with the format: {"Authorization": "Bearer <token>"}. (Default: None).
image (Image, optional) – Default image containing setup steps to run during cluster setup. See Image. (Default: None)
load_from_den (bool) – Whether to try loading the Cluster resource from Den. (Default: True)
dryrun (bool) – Whether to create the Cluster if it doesn’t exist, or load a Cluster object as a dryrun. (Default: False)
The resulting cluster.
Example
>>> import runhouse as rh
>>> # On-Demand SkyPilot Cluster (OnDemandCluster)
>>> gpu = rh.ondemand_cluster(name='rh-4-a100s',
>>>                           instance_type='A100:4',
>>>                           provider='gcp',
>>>                           autostop_mins=-1,
>>>                           use_spot=True,
>>>                           region='us-east-1',
>>>                           ).save()
>>> # Load cluster from above
>>> reloaded_cluster = rh.ondemand_cluster(name="rh-4-a100s")
- __init__(name: str | None = None, ips: List[str] = None, creds: Secret = None, default_env: Env = None, server_host: str = None, server_port: int = None, ssh_port: int = None, client_port: int = None, server_connection_type: str = None, ssl_keyfile: str = None, ssl_certfile: str = None, domain: str = None, ssh_properties: Dict = None, den_auth: bool = False, dryrun: bool = False, skip_creds: bool = False, image: Image | None = None, **kwargs)[source]
The Runhouse cluster, or system. This is where you can run functions or access and transfer data. You can BYO (bring-your-own) a cluster by providing the cluster IP and ssh_creds, or this can be an on-demand cluster that is spun up/down through SkyPilot, using your cloud credentials.
Note
To build a cluster, please use the factory method cluster().
Call a method on a module that is in the cluster’s object store.
module_name (str) – Name of the module saved on system.
method_name (str) – Name of the method.
stream_logs (bool, optional) – Whether to stream logs from the method call. (Default: True)
run_name (str, optional) – Name for the run. (Default: None)
remote (bool, optional) – Return a remote object from the function, rather than the result proper. (Default: False)
run_async (bool, optional) – Run the method asynchronously and return an awaitable. (Default: False)
save (bool, optional) – Whether or not to save the call. (Default: False)
*args – Positional arguments to pass to the method.
**kwargs – Keyword arguments to pass to the method.
Example
>>> cluster.call("my_module", "my_method", arg1, arg2, kwarg1=kwarg1)
Clear the cluster’s object store.
Delete the given items from the cluster’s object store. To delete all items, use cluster.clear()
keys (str or List[str]) – key or list of keys to delete from the object store.
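Example (a minimal usage sketch; the key names here are placeholders):
>>> cluster.delete(["my_func", "my_result"])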
Delete configs for the cluster
Disconnect the RPC tunnel.
Example
>>> cluster.disconnect()
Download certificate from the cluster (Note: user must have access to the cluster)
Enable Den auth on the cluster.
flush (bool, optional) – Whether to flush the auth cache. (Default: True)
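Example (a hedged sketch; assumes this method is exposed as enable_den_auth on the cluster object):
>>> cluster.enable_den_auth(flush=True)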
Endpoint for the cluster’s Daemon server.
external (bool, optional) – If True, will only return the external url, and will return None otherwise (e.g. if a tunnel is required). If set to False, will either return the external url if it exists, or will set up the connection (based on connection_type) and return the internal url (including the local connected port rather than the server port). If the cluster is not up, returns None. (Default: False)
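Example (a minimal usage sketch; assumes this is exposed as endpoint on the cluster object):
>>> internal_url = cluster.endpoint()
>>> external_url = cluster.endpoint(external=True)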
Load or construct resource from config.
config (Dict) – Resource config.
dryrun (bool, optional) – Whether to construct resource or load as dryrun (Default: False)
Load existing Resource via its name.
name (str) – Name of the resource to load from name.
load_from_den (bool, optional) – Whether to try loading the module from Den. (Default: True)
dryrun (bool, optional) – Whether to construct the object or load as dryrun. (Default: False)
Get the result for a given key from the cluster’s object store.
key (str) – Key to get from the cluster’s object store.
default (Any, optional) – What to return if the key is not found. To raise an error, pass in KeyError. (Default: None)
remote (bool, optional) – Whether to get the remote object, rather than the object in full. (Default: False)
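Example (a minimal usage sketch; the key name here is a placeholder):
>>> obj = cluster.get("my_key")
>>> obj_ref = cluster.get("my_key", remote=True)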
Install the given packages on the cluster.
reqs (List[Package or str]) – List of packages to install on cluster and env.
node (str, optional) – Cluster node to install the package on. If specified, will use ssh to install the package. (Default: None)
conda_env_name (str, optional) – Name of conda env to install the package in, if relevant. If left empty, defaults to base environment. (Default: None)
Example
>>> cluster.install_packages(reqs=["accelerate", "diffusers"])
>>> cluster.install_packages(reqs=["accelerate", "diffusers"], conda_env_name="my_conda_env")
Whether the RPC tunnel is up.
Example
>>> connected = cluster.is_connected()
Check if the cluster is up.
Example
>>> rh.cluster("rh-cpu").is_up()
List all keys in the cluster’s object store.
process (str, optional) – Process for which to list the keys.
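Example (a minimal usage sketch; the process name here is a placeholder):
>>> cluster.keys()
>>> cluster.keys(process="my_process")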
Loads Runhouse clusters saved in Den and locally via Sky. If filters are provided, only clusters that are matching the filters are returned. If no filters are provided, all running clusters will be returned.
show_all (bool, optional) – Whether to list all clusters saved in Den. Maximum of 200 will be listed. (Default: False).
since (str, optional) – Clusters that were active in the specified time period will be returned. Value can be in seconds, minutes, hours or days.
status (str or ClusterStatus, optional) – Clusters with the provided status will be returned. Options include: running, terminated, initializing, unknown.
force (bool, optional) – Whether to force a status update for all relevant clusters, or load the latest values. (Default: False).
Examples
>>> Cluster.list(since="75s")
>>> Cluster.list(since="3m")
>>> Cluster.list(since="2h", status="running")
>>> Cluster.list(since="7d")
>>> Cluster.list(show_all=True)
List all workers on the cluster.
Tunnel into and launch notebook from the cluster.
Example
>>> rh.cluster("test-cluster").notebook()
Whether this function is being called on the same cluster.
Context manager to temporarily pause autostop. Only for OnDemand clusters. There is no autostop for static clusters.
Put the given object on the cluster’s object store at the given key.
key (str) – Key to assign the object in the object store.
obj (Any) – Object to put in the object store
process (str, optional) – Process of the object store to put the object in. (Default: None)
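Example (a minimal usage sketch; the key and object here are placeholders):
>>> cluster.put("my_key", my_obj)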
Put the given resource on the cluster’s object store. Returns the key (important if name is not set).
resource (Resource) – Resource to put in the object store.
state (Dict, optional) – Dict of resource attributes to override. (Default: None)
dryrun (bool, optional) – Whether to put the resource in dryrun mode or not. (Default: False)
process (str, optional) – Process of the object store to put the object in. (Default: None)
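Example (a hedged sketch; assumes this method is exposed as put_resource and that my_resource is an existing Resource object):
>>> key = cluster.put_resource(my_resource)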
Remove conda env from the cluster.
conda_env_name (str) – Name of conda env to remove from the cluster.
Example
>>> rh.ondemand_cluster("rh-cpu").remove_conda_env("my_conda_env")
Rename a key in the cluster’s object store.
old_key (str) – Original key to rename.
new_key (str) – Name to reassign the object.
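Example (a minimal usage sketch; the key names here are placeholders):
>>> cluster.rename("old_key", "new_key")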
Restart the RPC server.
resync_rh (bool) – Whether to resync Runhouse. If False, will not resync Runhouse onto the cluster. If None, will sync if Runhouse is not installed on the cluster, or if it is installed locally as editable. (Default: None)
restart_ray (bool) – Whether to restart Ray. (Default: True)
restart_proxy (bool) – Whether to restart Caddy on the cluster, if configured. (Default: False)
Example
>>> rh.cluster("rh-cpu").restart_server()
Sync the contents of the source directory into the destination.
source (str) – The source path.
dest (str) – The target path.
up (bool) – The direction of the sync. If True, will rsync from local to cluster. If False, will rsync from cluster to local.
node (Optional[str], optional) – Specific cluster node to rsync to. If not specified will use the address of the cluster’s head node.
contents (Optional[bool], optional) – Whether the contents of the source directory or the directory itself should be copied to the destination. If True, the contents of the source directory are copied to the destination, and the source directory itself is not created at the destination. If False, the source directory along with its contents are copied to the destination, creating an additional directory layer at the destination. (Default: False).
filter_options (Optional[str], optional) – The filter options for rsync.
stream_logs (Optional[bool], optional) – Whether to stream logs to the stdout/stderr. (Default: False).
ignore_existing (Optional[bool], optional) – Whether the rsync should skip updating files that already exist on the destination. (Default: False).
Note
Ending source with a slash will copy the contents of the directory into dest, while omitting it will copy the directory itself (adding a directory layer).
Run bash commands on the cluster through the Runhouse server.
commands (str or List[str]) – Commands to run on the cluster.
node (int, str or None) – Node to run the command on. Node can be an int referring to the node index, a string referring to the node’s IP, or “all” to run on all nodes. If not specified, runs the command on the head node. (Default: None)
process (str or None) – Process to run the command on. (Default: None)
stream_logs (bool) – Whether to stream logs. (Default: True)
require_outputs (bool) – Whether to return outputs in addition to status code. (Default: True)
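Example (a hedged sketch; assumes this method is exposed as run_bash on the cluster object):
>>> cluster.run_bash(["echo hello"])
>>> cluster.run_bash("ls -la", node="all")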
Run bash commands on the cluster over SSH.
commands (str or List[str]) – Commands to run on the cluster.
node (int, str or None) – Node to run the command on. Node can be an int referring to the node index, a string referring to the node’s IP, or “all” to run on all nodes. If not specified, runs the command on the head node. (Default: None)
stream_logs (bool) – Whether to stream logs. (Default: True)
require_outputs (bool) – Whether to return outputs in addition to status code. (Default: True)
conda_env_name (str or None) – Name of conda env to run the command in, if applicable. (Default: None)
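Example (a hedged sketch; assumes this method is exposed as run_bash_over_ssh on the cluster object):
>>> cluster.run_bash_over_ssh(["echo hello"], node="all")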
Run a list of python commands on the cluster, or a specific cluster node if its IP is provided.
commands (List[str]) – List of commands to run.
process (str, optional) – Process to run the commands in. (Default: None)
stream_logs (bool, optional) – Whether to stream logs. (Default: True)
node (str, optional) – Node to run commands on. If not specified, runs on head node. (Default: None)
Example
>>> cpu.run_python(['import numpy', 'print(numpy.__version__)'])
>>> cpu.run_python(["print('hello')"])
>>> cpu.run_python(["print('hello')"], node="3.89.174.234")
Note
Running Python commands with nested quotes can be finicky. If using nested quotes, try to wrap the outer quotes with double quotes (") and the inner quotes with single quotes (').
Overrides the default resource save() method in order to also update the cluster config on the cluster itself.
name (str, optional) – Name to save the cluster as, if different from its existing name. (Default: None)
overwrite (bool, optional) – Whether to overwrite the existing saved resource, if it exists. (Default: True)
folder (str, optional) – Folder to save the config in, if saving locally. If None and saving locally, will be saved in the ~/.rh directory. (Default: None)
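Example (a minimal usage sketch; the new name here is a placeholder):
>>> cluster.save()
>>> cluster.save(name="rh-cluster-copy")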
Address to use in the requests made to the cluster. If creating an SSH tunnel with the cluster, this will be set to localhost, otherwise will use the cluster’s domain (if provided), or its public IP address.
Grant access to the resource for a list of users (or a single user). By default, the user will receive an email notification of access (if they have a Runhouse account) or instructions on creating an account to access the resource. If visibility is set to public, users will not be notified.
Note
You can only grant access to other users if you have write access to the resource.
users (Union[str, list], optional) – Single user or list of user emails and / or runhouse account usernames. If none are provided and visibility is set to public, the resource will be made publicly available to all users. (Default: None)
access_level (ResourceAccess, optional) – Access level to provide for the resource. (Default: read).
visibility (ResourceVisibility, optional) – Type of visibility to provide for the shared resource. By default, the visibility is private. (Default: None)
notify_users (bool, optional) – Whether to send an email notification to users who have been given access. Note: This is relevant for resources which are not shareable. (Default: True)
headers (Dict, optional) – Request headers to provide for the request to Den. Contains the user’s auth token. Example: {"Authorization": f"Bearer {token}"}
Users who already have a Runhouse account and have been granted access to the resource.
Users who do not have Runhouse accounts and received notifications via their emails.
Set of valid usernames and emails from the users parameter.
Tuple(Dict, Dict, Set)
Example
>>> # Write access to the resource for these specific users.
>>> # Visibility will be set to private (users can search for and view resource in Den dashboard)
>>> my_resource.share(users=["username1", "user2@gmail.com"], access_level='write')
>>> # Make resource public, with read access to the resource for all users
>>> my_resource.share(visibility='public')
SSH into the cluster
Example
>>> rh.cluster("rh-cpu").ssh()
Start the RPC server.
resync_rh (bool) – Whether to resync Runhouse. If False, will not resync Runhouse onto the cluster. If None, will sync if Runhouse is not installed on the cluster, or if it is installed locally as editable. (Default: None)
restart_ray (bool) – Whether to restart Ray. (Default: True)
restart_proxy (bool) – Whether to restart Caddy on the cluster, if configured. (Default: False)
Example
>>> rh.cluster("rh-cpu").start_server()
Load the status of the Runhouse daemon running on a cluster.
send_to_den (bool, optional) – Whether to send and update the status in Den. Only applies to clusters that are saved to Den. (Default: False)
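Example (a hedged sketch; assumes this method is exposed as status on the cluster object):
>>> cluster_status = cluster.status()
>>> cluster_status = cluster.status(send_to_den=True)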
Stop the RPC server.
stop_ray (bool, optional) – Whether to stop Ray. (Default: True)
process (str, optional) – Specified process to stop the server on. (Default: None)
cleanup_actors (bool, optional) – Whether to kill all Ray actors. (Default: True)
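Example (a hedged sketch; assumes this method is exposed as stop_server on the cluster object):
>>> cluster.stop_server()
>>> cluster.stop_server(stop_ray=False)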
Send secrets for the given providers.
providers (List[str] or None, optional) – List of providers to send secrets for. If None, all providers configured in the environment will be sent. (Default: None)
process (str, optional) – Process to sync secrets into, if setting env vars. (Default: None)
Example
>>> cpu.sync_secrets(providers=["aws", "lambda"])
Bring up the cluster if it is not up. No-op if cluster is already up. This only applies to on-demand clusters, and has no effect on self-managed clusters.
verbose (bool, optional) – Whether to stream logs from Den if the cluster is being launched. Only relevant if launching via Den. (Default: True)
Example
>>> rh.cluster("rh-cpu").up_if_not()
No additional setup is required. You will just need to have the IP address for the cluster and the path to SSH credentials ready to be used for the cluster initialization.
An OnDemandCluster is a cluster that uses SkyPilot functionality underneath to handle various cluster properties.
- __init__(name, instance_type: str = None, num_nodes: int = None, provider: str = None, default_env: Env = None, dryrun: bool = False, autostop_mins: int = None, use_spot: bool = False, memory: int | str = None, disk_size: int | str = None, num_cpus: int | str = None, accelerators: str = None, open_ports: int | str | List[int] = None, server_host: int = None, server_port: int = None, server_connection_type: str = None, launcher: str = None, ssl_keyfile: str = None, ssl_certfile: str = None, domain: str = None, den_auth: bool = False, region: str = None, sky_kwargs: Dict = None, **kwargs)[source]
On-demand SkyPilot Cluster.
Note
To build a cluster, please use the factory method cluster().
Up the cluster async in another process, so it can be parallelized and logs can be captured sanely.
capture_output: If True, suppress the output of the cluster creation process. If False, print the output normally. If a string, write the output to the file at that path.
Returns the accelerator type, or None if it is a CPU cluster.
Retrieve SSH key for the cluster.
path_to_file (Path) – Path of the private key associated with the cluster.
Example
>>> ssh_priv_key = rh.ondemand_cluster("rh-cpu").cluster_ssh_key("~/.ssh/id_rsa")
Endpoint for the cluster’s Daemon server.
external (bool, optional) – If True, will only return the external url, and will return None otherwise (e.g. if a tunnel is required). If set to False, will either return the external url if it exists, or will set up the connection (based on connection_type) and return the internal url (including the local connected port rather than the server port). If the cluster is not up, returns None. (Default: False)
Returns instance type of the cluster.
Whether the cluster is up.
Example
>>> rh.ondemand_cluster("rh-cpu").is_up()
Keep the cluster warm for given number of minutes after inactivity.
mins (int) – Amount of time (in min) to keep the cluster warm after inactivity. If set to -1, keep cluster warm indefinitely. (Default: -1)
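Example (a hedged sketch; assumes this method is exposed as keep_warm on the OnDemandCluster object):
>>> rh.ondemand_cluster("rh-cpu").keep_warm(mins=30)
>>> rh.ondemand_cluster("rh-cpu").keep_warm(mins=-1)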
Return the number of CPUs for a CPU cluster.
Context manager to temporarily pause autostop.
Example
>>> with rh.ondemand_cluster.pause_autostop():
>>>     rh.ondemand_cluster.run(["python train.py"])
SSH into the cluster.
node – Node to SSH into. If no node is specified, will SSH onto the head node. (Default: None)
Example
>>> rh.ondemand_cluster("rh-cpu").ssh()
>>> rh.ondemand_cluster("rh-cpu").ssh(node="3.89.174.234")
Teardown cluster.
verbose (bool, optional) – Whether to stream logs from Den when the cluster is being downed. Only relevant when tearing down via Den. (Default: True)
Example
>>> rh.ondemand_cluster("rh-cpu").teardown()
Teardown cluster and delete it from configs.
verbose (bool, optional) – Whether to stream logs from Den when the cluster is being downed. Only relevant when tearing down via Den. (Default: True)
Example
>>> rh.ondemand_cluster("rh-cpu").teardown_and_delete()
Up the cluster.
verbose (bool, optional) – Whether to stream logs from Den when the cluster is being launched. Only relevant if launching via Den. (Default: True)
force (bool, optional) – Whether to launch the cluster even if one with the same configs already exists. Only relevant if launching via Den. (Default: False)
Example
>>> rh.ondemand_cluster("rh-cpu").up()
On-Demand clusters use SkyPilot to automatically spin up and down clusters on the cloud. You will need to first set up cloud access on your local machine:
Run sky check to see which cloud providers are enabled, and how to set up cloud credentials for each of the providers.
$ sky check
For a more in-depth tutorial on setting up individual cloud credentials, you can refer to the SkyPilot setup docs.
If you would like to launch an on-demand cluster within a specific VPC, you can specify its name in your local ~/.sky/config.yaml in the following format:
<cloud-provider>:
  vpc: <vpc-name>
See the SkyPilot docs for more details on configuring a VPC.
Runhouse provides a couple of options to manage the connection to the Runhouse API server running on a cluster.
The below options can be specified with the server_connection_type parameter when initializing a cluster. By default, the Runhouse API server will be started on the cluster on port 32300.
ssh: Connects to the cluster via an SSH tunnel, by default on port 32300.
tls: Connects to the cluster via HTTPS (by default on port 443) using either a provided certificate, or creating a new self-signed certificate just for this cluster. You must open the needed ports in the firewall, such as via the open_ports argument in the OnDemandCluster, or manually in the compute itself or cloud console.
none: Does not use any port forwarding or enforce any authentication. Connects to the cluster with HTTP, by default on port 80. This is useful when connecting to a cluster within a VPC, or creating a tunnel manually on the side with custom settings.
Note
The tls connection type is the most secure and is recommended for production use if you are not running inside of a VPC. However, be mindful that you must secure the cluster with authentication (see below) if you open it to the public internet.
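For illustration, a minimal sketch of how these options might be passed when constructing a cluster (the cluster names and instance type below are placeholders):
import runhouse as rh

# SSH tunnel to the server
ssh_cluster = rh.ondemand_cluster(name="rh-ssh", instance_type="m5.xlarge", server_connection_type="ssh")

# HTTPS on port 443 with TLS certs; remember to open the port
tls_cluster = rh.ondemand_cluster(name="rh-tls", instance_type="m5.xlarge", server_connection_type="tls", open_ports=[443])

# Plain HTTP on port 80, e.g. when connecting from within a VPC
http_cluster = rh.ondemand_cluster(name="rh-http", instance_type="m5.xlarge", server_connection_type="none", open_ports=[80])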
If desired, Runhouse provides out-of-the-box authentication via the user’s Runhouse token (generated when logging in and set locally at ~/.rh/config.yaml). This is crucial if the cluster has ports open to the public internet, as would usually be the case when using the tls connection type. You may also set up your own authentication manually inside of your own code, but you should likely still enable Runhouse authentication to ensure that even your non-user-facing endpoints into the server are secured.
When initializing a cluster, you can set the den_auth parameter to True to enable token authentication. Calls to the cluster server can then be made using an auth header with the format: {"Authorization": "Bearer <cluster-token>"}. The Runhouse Python library adds this header to its calls automatically, so your users do not need to worry about it after logging into Runhouse.
Note
Runhouse never uses your default Runhouse token for anything other than requests made to Runhouse Den. Your token will never be exposed or shared with anyone else.
Enabling TLS and Runhouse Den Dashboard Auth for the API server makes it incredibly fast and easy to stand up a microservice with standard token authentication, allowing you to easily share Runhouse resources with collaborators, teams, customers, etc.
Let’s illustrate this with a simple example:
import runhouse as rh

def concat(a: str, b: str):
    return a + b

# Launch a cluster with TLS and Den Auth enabled
cpu = rh.ondemand_cluster(instance_type="m5.xlarge",
                          provider="aws",
                          name="rh-cluster",
                          den_auth=True,
                          open_ports=[443],
                          server_connection_type="tls").up_if_not()

# Remote function stub which lives on the cluster
remote_func = rh.function(concat).to(cpu)

# Save to Runhouse Den
remote_func.save()

# Give read access to the function to another user - this will allow them to call this service remotely
# and view the function metadata in Runhouse Den
remote_func.share("user1@gmail.com", access_level="read")

# This other user (user1) can then call the function remotely from any python environment
res = remote_func("run", "house")
>> print(res)
>> "runhouse"
We can also call the function via an HTTP request, making it easy for other users to call the function with a Runhouse cluster token (Note: this assumes the user has been granted access to the function or write access to the cluster):
$ curl -X GET "https://<DOMAIN>/concat/call?a=run&b=house" -H "Content-Type: application/json" -H "Authorization: Bearer <CLUSTER-TOKEN>"
Runhouse gives you the option of using Caddy as a reverse proxy for the Runhouse API server, which is a FastAPI app launched with Uvicorn. Using Caddy provides a safer and more conventional setup: the FastAPI app runs on a higher, non-privileged port (such as 32300, the default Runhouse port), and Caddy acts as a reverse proxy, forwarding requests from the HTTP port (default: 80) or the HTTPS port (default: 443).
Caddy also enables generating and auto-renewing self-signed certificates, making it easy to secure your cluster with HTTPS right out of the box.
Note
Caddy is enabled by default when you launch a cluster with the server_port set to either 80 or 443.
Generating Certs
Runhouse offers two options for enabling TLS/SSL on a cluster with Caddy:
Using existing certs: provide the path to the cert and key files with the ssl_certfile and ssl_keyfile arguments. These certs will be used by Caddy as specified in the Caddyfile on the cluster. If no cert paths are provided and no domain is specified, Runhouse will issue self-signed certificates to use for the cluster. These certs will not be verified by a CA.
Using Caddy to generate CA verified certs: provide the domain argument. Caddy will then obtain certificates from Let’s Encrypt on-demand when a client connects for the first time.
Runhouse supports using custom domains for deploying your apps and services. You can provide the domain ahead of time, before launching the cluster, by specifying the domain argument:
cluster = rh.cluster(name="rh-serving-cpu",
                     domain="<your domain>",
                     instance_type="m5.xlarge",
                     server_connection_type="tls",
                     open_ports=[443]).up_if_not()
Note
After the cluster is launched, make sure to add the relevant A record to your domain’s DNS settings to point this domain to the cluster’s public IP address.
You’ll also need to ensure the relevant ports (e.g. 443) are open in the security group settings of the cluster. Runhouse will also automatically set up a TLS certificate for the domain via Caddy.
If you have an existing cluster, you can also configure a domain by including the IP and domain when initializing the Runhouse cluster object:
cluster = rh.cluster(name="rh-serving-cpu",
                     ips=["<public IP>"],
                     domain="<your domain>",
                     server_connection_type="tls",
                     open_ports=[443]).up_if_not()
Now we can send modules or functions to our cluster and seamlessly create endpoints which we can then share and call from anywhere.
Let’s take a look at an example of how to deploy a simple LangChain RAG app.
Once the app has been created and sent to the cluster, we can call it via HTTP directly:
import requests

resp = requests.get("https://<domain>/basic_rag_app/invoke?user_prompt=<prompt>")
print(resp.json())
Or via cURL:
$ curl "https://<domain>/basic_rag_app/invoke?user_prompt=<prompt>"