This guide demonstrates how to use the same Docker image with your Runhouse cluster, for both:
Production: running functions and code that is pre-installed on the Docker image
Local development: making local edits to your repo, and propagating over those local changes to the cluster for experimentation
Afterwards, we provide a script that shows how to easily set up and toggle between these two settings, using the same cluster setup.
In this example, we are going to be using the DJLServing 0.27.0 with DeepSpeed 0.12.6 Container, which includes HuggingFace Transformers (4.39.0), Diffusers (0.16.0), and Accelerate (0.28.0). We will use the container versions of these packages to demonstrate the pre-packaged production workflow, as well as local editable versions to showcase the local experimentation use cases.
Because we are pulling the Docker image from AWS ECR, we need to provide the corresponding credentials in order to properly pull and set up the image on the cluster. This can be done through a Runhouse Docker secret, or by setting environment variables. Please refer to <Guide: Docker Cluster Setup> for more details.
import subprocess

import runhouse as rh

docker_ecr_creds = {
    "username": "AWS",
    "password": subprocess.run(
        "aws ecr get-login-password --region us-west-1",
        shell=True,
        capture_output=True,
    ).stdout.strip().decode("utf-8"),
    "server": "763104351884.dkr.ecr.us-west-1.amazonaws.com",
}

docker_secret = rh.provider_secret("docker", values=docker_ecr_creds)
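If you'd rather not create a Runhouse secret, the environment-variable route mentioned above works too. A minimal sketch, assuming the SKYPILOT_DOCKER_* variable names used by the underlying SkyPilot launcher for private registries (verify the exact names in <Guide: Docker Cluster Setup>):

import os
import subprocess

# Assumed variable names from SkyPilot's private-registry convention;
# set these before launching the cluster so the image pull can authenticate.
os.environ["SKYPILOT_DOCKER_USERNAME"] = "AWS"
os.environ["SKYPILOT_DOCKER_PASSWORD"] = subprocess.run(
    "aws ecr get-login-password --region us-west-1",
    shell=True,
    capture_output=True,
).stdout.strip().decode("utf-8")
os.environ["SKYPILOT_DOCKER_SERVER"] = "763104351884.dkr.ecr.us-west-1.amazonaws.com"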
Next, construct a Runhouse image, passing in the Docker image ID and secret. Feed this image into the ondemand cluster factory, and up the cluster.
base_image = rh.Image("docker_image").from_docker(
    "djl-inference:0.27.0-deepspeed0.12.6-cu121", docker_secret=docker_secret
)

cluster = rh.ondemand_cluster(
    name="diffusers_docker",
    image=base_image,
    instance_type="g5.8xlarge",
    provider="aws",
)
cluster.up_if_not()
⠴ Preparing SkyPilot runtime (3/3 - runtime) View logs at: ~/sky_logs/sky-2024-12-23-13-56-48-619803/provision.log
✓ Cluster launched: diffusers_docker. View logs at: ~/sky_logs/sky-2024-12-23-13-56-48-619803/provision.log
INFO | 2024-12-23 14:03:39 | runhouse.resources.hardware.launcher_utils:391 | Starting Runhouse server on cluster
INFO | 2024-12-23 14:03:39 | runhouse.resources.hardware.cluster:1247 | Restarting Runhouse API server on diffusers_docker.
INFO: Started server process [2929]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:32300 (Press CTRL+C to quit)
<runhouse.resources.hardware.on_demand_cluster.OnDemandCluster at 0x127ea7730>
The function we’ll be using in our demo is is_transformers_available from diffusers.utils. We’ll first show how to use the base version of this function, which was installed on the box through the cluster setup (e.g. a production setting). Then, we’ll show how to propagate up local changes and run them on the cluster, if your local version differs from the one in the Docker container (e.g. a different package version, or locally edited).
from diffusers.utils import is_transformers_available
The core of the production workflow is that the Docker image already contains the exact packages and versions we want, likely published to the registry by CI/CD. We don’t want to perform any installs or code changes within the image during execution, so we preserve exact reproducibility.
NOTE: By default, Ray and Runhouse are installed on the ondemand cluster during setup (generally attempting to match the versions you have locally), unless we detect that they’re already present. To make sure that no installs occur in production, make sure that Runhouse and Ray are installed in your Docker image.
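One way to sanity-check this is to send a small function to the cluster that reports the versions baked into the image and compare them against your local environment. A minimal sketch (remote_versions is a hypothetical helper, not part of this guide's flow):

def remote_versions():
    # Runs on the cluster, so these imports resolve against the Docker image.
    import ray
    import runhouse

    return {"runhouse": runhouse.__version__, "ray": ray.__version__}

versions_fn = rh.function(remote_versions).to(cluster)
print(versions_fn())  # compare against your local `pip show runhouse ray`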
The function is the is_transformers_available function imported above. When creating the function to run remotely in the production Runhouse env, we pass in the flag sync_local=False to indicate that we want to use the function as it exists on the cluster, without re-syncing anything.
prod_fn = rh.function(is_transformers_available).to(cluster, sync_local=False)
INFO | 2024-12-23 14:04:57 | runhouse.resources.hardware.ssh_tunnel:91 | Running forwarding command: ssh -T -L 32300:localhost:32300 -i ~/.ssh/sky-key -o Port=10022 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ConnectTimeout=30s -o ForwardAgent=yes -o ProxyCommand='ssh -i ~/.ssh/sky-key -o Port=22 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ConnectTimeout=30s -o ForwardAgent=yes -W %h:%p ubuntu@52.24.239.151' root@localhost
INFO | 2024-12-23 14:05:00 | runhouse.resources.module:511 | Sending module is_transformers_available of type <class 'runhouse.resources.functions.function.Function'> to diffusers_docker
Now, simply call the function, and it will run the corresponding function on the cluster. In this case, it returns whether or not transformers is available on the cluster, which it is, as it was part of the Docker image.
prod_fn()
INFO | 2024-12-23 14:05:01 | runhouse.servers.http.http_client:439 | Calling is_transformers_available.call
INFO | 2024-12-23 14:05:06 | runhouse.servers.http.http_client:504 | Time to call is_transformers_available.call: 4.86 seconds
True
If your function requires additional setup, you can also use cluster functionality directly (e.g. setting additional env vars, installing packages, or running commands), or construct isolated processes (see the Process API guide) with specific compute to run the function on.
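For instance, a rough sketch of running the function in an isolated process with pinned compute and its own env vars, loosely based on the Process API guide. The argument names (compute, env_vars) and the process kwarg are assumptions that may differ across Runhouse versions, so check the guide for your version:

# Create (or reuse) a named process on the cluster with a dedicated GPU
# and environment variables, then send the function into that process.
cluster.ensure_process_created(
    "inference_proc",
    compute={"GPU": 1},
    env_vars={"HF_HOME": "/tmp/hf_cache"},  # assumed parameter names
)
prod_fn_isolated = rh.function(is_transformers_available).to(
    cluster, process="inference_proc", sync_local=False
)
prod_fn_isolated()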
Now for the local development and experimentation case. Let’s say we have the HuggingFace diffusers repository cloned and installed as a local editable package, and are making changes to it that we want reflected when we run it on the cluster. We also have a different version of the transformers package installed locally.
Let’s continue using the is_transformers_available function, except this time we’ll change the function to return the version number of the transformers package if it exists, instead of True. This shows that we have transformers==4.44.2 installed locally.
In my local diffusers/src/diffusers/utils/import_utils.py file:
def is_transformers_available():
    try:
        import transformers

        return transformers.__version__
    except ImportError:
        return False
from diffusers.utils import is_transformers_available

is_transformers_available()
'4.44.2'
When Runhouse installs packages on the remote cluster, it will check whether you have a version of the package locally, as well as whether a version of the package already exists on the cluster. If it already exists remotely, by default the remote package will not be overridden, but you can force the local version by passing the parameter force_sync_local=True to cluster.install_packages.
cluster.install_packages(["transformers", "diffusers"], force_sync_local=True)
Now construct a Runhouse function normally and send it to the cluster.
Here, we can leave out the sync_local flag, which defaults to True: the local function will be synced onto the cluster.
dev_fn = rh.function(is_transformers_available).to(cluster)
INFO | 2024-12-23 14:11:05 | runhouse.resources.module:511 | Sending module is_transformers_available of type <class 'runhouse.resources.functions.function.Function'> to diffusers_docker
Now, when we call the function, it returns the version of the transformers library installed, rather than True/False. It matches the locally installed version, showing that both the local diffusers and transformers packages were properly synced and installed on the cluster.
dev_fn()
INFO | 2024-12-23 14:11:19 | runhouse.servers.http.http_client:439 | Calling is_transformers_available.call
INFO | 2024-12-23 14:11:21 | runhouse.servers.http.http_client:504 | Time to call is_transformers_available.call: 2.48 seconds
'4.44.2'
Here, we implement the above as a script that demonstrates the difference between dev and prod. The script can easily be adapted and shared between teammates working on the same repos, with a flag flip to toggle between experimentation and production.
import argparse

import runhouse as rh

from diffusers.utils import is_transformers_available

if __name__ == "__main__":
    # Toggle between production and development with a CLI flag.
    parser = argparse.ArgumentParser()
    parser.add_argument("--prod", action="store_true")
    prod = parser.parse_args().prod

    cluster = rh.ondemand_cluster(...)
    cluster.up_if_not()

    if prod:
        # Production: use the function baked into the Docker image, no syncing.
        remote_fn = rh.function(is_transformers_available).to(cluster, sync_local=False)
    else:
        # Development: force-sync the local editable packages, then sync the function.
        cluster.install_packages(["transformers", "diffusers"], force_sync_local=True)
        remote_fn = rh.function(is_transformers_available).to(cluster)

    remote_fn()
    cluster.teardown()
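With the flag sketched above, run python script.py --prod to hit the reproducible production path, or omit the flag to sync over your local edits while experimenting.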