Runhouse integrates with SkyPilot to enable automatic setup of an existing Docker container when you launch your on-demand cluster. When you specify a Docker image for an on-demand cluster, the container is automatically built and set up remotely on the cluster. The Runhouse server will start directly inside the remote container.
NOTE: This guide details the setup and usage for on-demand clusters only. It is not yet supported for static clusters.
One can specify a Docker image through the Runhouse Image class, which is passed into the cluster factory. Call .from_docker(image_id) on the image, passing in the Docker container in the format <registry>/<image>:<tag>.
import runhouse as rh

base_image = rh.Image("base_image").from_docker("nvcr.io/nvidia/pytorch:23.10-py3")

docker_cluster = rh.ondemand_cluster(
    name="pytorch_cluster",
    image=base_image,
    instance_type="CPU:2+",
    provider="aws",
)
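As a rough usage sketch, once the cluster is defined you can bring it up and send a function to it to confirm that the container's PyTorch install is visible (check_torch here is just an illustrative helper, not part of the Runhouse API):

def check_torch():
    import torch
    return torch.__version__

docker_cluster.up_if_not()  # launch the cluster if it isn't already up
remote_check = rh.function(check_torch).to(docker_cluster)  # deploy onto the container
print(remote_check())  # e.g. the PyTorch version baked into the image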
To use a Docker image hosted on a private registry, such as ECR, you need to additionally provide the user, password, and registry server values, as used in docker login -u <user> -p <password> <registry server>.
These values are propagated to SkyPilot at launch time and used to set up the base container on the cluster.
There are two approaches to providing this information:
1. Create a Runhouse Secret as follows, and pass it to the Image along with the Docker image above.
values = {
    "username": <user>,
    "password": <password>,
    "server": <server>,
}
docker_secret = rh.provider_secret("docker", values=values)
base_image = rh.Image("base_image").from_docker(
    "pytorch-training:2.2.0-cpu-py310-ubuntu20.04-ec2", docker_secret=docker_secret
)
2. Directly set your local environment variables, which SkyPilot extracts at launch time. In this case, you do not need to specify the secret when constructing the Runhouse Image.
SKYPILOT_DOCKER_USERNAME: <user>
SKYPILOT_DOCKER_PASSWORD: <password>
SKYPILOT_DOCKER_SERVER: <registry server>
For instance, to use the PyTorch 2.2 ECR Framework provided here, you can set your environment variables as follows:
$ export SKYPILOT_DOCKER_USERNAME=AWS
$ export SKYPILOT_DOCKER_PASSWORD=$(aws ecr get-login-password --region us-east-1)
$ export SKYPILOT_DOCKER_SERVER=763104351884.dkr.ecr.us-east-1.amazonaws.com
base_image = rh.Image("base_image").from_docker("pytorch-training:2.2.0-cpu-py310-ubuntu20.04-ec2")
In either case, we can then construct the cluster with the image as follows:
ecr_cluster = rh.ondemand_cluster(
    name="ecr_pytorch_cluster",
    image=base_image,
    instance_type="CPU:2+",
    provider="aws",
)
You can then launch the Docker cluster with ecr_cluster.up(). If for any reason the docker pull fails on the cluster (for instance, due to incorrect credentials or a permission error), you must first tear down the cluster with ecr_cluster.teardown() or sky stop ecr_pytorch_cluster in the CLI before re-launching the cluster with new credentials, so that they propagate through.
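As a minimal sketch of that flow (the blanket exception handling here is illustrative; it is not the specific error Runhouse raises on a failed pull):

try:
    ecr_cluster.up()
except Exception:
    # If the launch failed partway (e.g. the docker pull was rejected due to
    # stale ECR credentials), tear the cluster down before refreshing the
    # credentials and re-launching, so the new values propagate.
    ecr_cluster.teardown()
    raise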
To SSH directly onto the container, where the Runhouse server is started, you can use runhouse cluster ssh <cluster_name>.
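For the ecr_pytorch_cluster launched above, for example:

$ runhouse cluster ssh ecr_pytorch_cluster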
By default, the remote Docker container, which is set up through SkyPilot, will be named sky_container, and the user will be root.
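For example, from inside the SSH session opened above, you can verify the user (the exact prompt will vary with your image):

$ whoami
root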