This guide demonstrates how to use the same Docker image with your Runhouse cluster, for both:
Production: running functions and code that is pre-installed on the Docker image
Local development: making local edits to your repo, and having local changes propagated over to the cluster for experimentation
Afterwards, we provide a script that shows how to easily set up and toggle between these two settings, using the same cluster setup.
In this example, we are going to be using the DJLServing 0.27.0 with DeepSpeed 0.12.6 Container, which includes HuggingFace Tranformers (4.39.0), Diffusers (0.16.0), and Accelerate (0.28.0). We will use both the container version of these packages, as well as local editable versions to showcase both production ready and local experimentation use cases for using the same Docker image.
Runhouse uses SkyPilot under the hood to set up the Docker image on the cluster. Because we are pulling the Docker image from AWS ECR, we first set some environment variables necessary to pull the Docker image.
For more specific details on getting your Docker image set up with Runhouse, please take a look at the Docker Setup Guide.
! export SKYPILOT_DOCKER_USERNAME=AWS ! export SKYPILOT_DOCKER_PASSWORD=$(aws ecr get-login-password --region us-west-1) ! export SKYPILOT_DOCKER_SERVER=763104351884.dkr.ecr.us-west-1.amazonaws.com
Once these variables are set, we can import runhouse and construct an
ondemand cluster, specifying the container image id as follows, and call
cluster.up_if_not()
to launch the cluster with the Docker image
loaded on it.
import runhouse as rh
INFO | 2024-08-01 02:18:48.921683 | Loaded Runhouse config from /Users/caroline/.rh/config.yaml
cluster = rh.ondemand_cluster( name="diffusers_docker", image_id="docker:djl-inference:0.27.0-deepspeed0.12.6-cu121", instance_type="g5.8xlarge", provider="aws", ) cluster.up_if_not()
The function we’ll be using in our demo is is_transformers_available
from diffusers.utils
. We’ll first show what using this function
directly on the box (e.g. a production setting) looks like. After, we’ll
show the case if we had local versions of the repositories, that we’d
modified, and wanted to test out our changes on the cluster.
from diffusers.utils import is_transformers_available
The core of the production workflow is that the Docker image already contains the exact packages and versions we want, probably published into the registry in CI/CD. We don’t want to perform any installs or code changes within the image throughout execution so we can preserve exact reproducibility.
NOTE: By default, Ray and Runhouse are installed on the ondemand cluster during setup time (generally attempting to match the versions you have locally), unless we detect that they’re already present. To make sure that no installs occur in production, please make sure that you have Runhouse and Ray installed in your docker image.
Here, we construct a Runhouse env containing anything you need for running your code, that doesn’t already live on the cluster. For instance, any environment variables or additional packages that you might need installed. Do NOT include the packages already installed on the container that you want pinned to the specific version, in this case diffusers and transformers.
Then send and create the env on the cluster by directly calling
env.to(cluster)
.
prod_env = rh.env(name="prod_env", env_vars={"HF_TOKEN": "****"}) prod_env.to(cluster)
INFO | 2024-08-01 02:19:13.168591 | Port 32300 is already in use. Trying next port.
INFO | 2024-08-01 02:19:13.172968 | Running forwarding command: ssh -T -L 32301:localhost:32300 -i ~/.ssh/sky-key -o Port=10022 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ConnectTimeout=30s -o ForwardAgent=yes -o ProxyCommand='ssh -T -L 32301:localhost:32300 -i ~/.ssh/sky-key -o Port=22 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ConnectTimeout=30s -o ForwardAgent=yes -W %h:%p ubuntu@3.142.171.243' root@localhost
INFO | 2024-08-01 02:19:16.685047 | Calling prod_env._set_env_vars
----------------
diffusers_docker
----------------
prod_env env: Calling method _set_env_vars on module prod_env
INFO | 2024-08-01 02:19:17.273890 | Time to call prod_env._set_env_vars: 0.59 seconds
INFO | 2024-08-01 02:19:17.350932 | Calling prod_env.install
prod_env env: Calling method install on module prod_env
INFO | 2024-08-01 02:19:17.929387 | Time to call prod_env.install: 0.58 seconds
<runhouse.resources.envs.env.Env at 0x133a6eb60>
The function is the is_transformers_available
function imported
above. When creating the function to run remotely on the production
Runhouse env, we pass in the name of the Runhouse env. By passing in
the env name, rather than the object, it simply signals that we want to
use the env that already lives on the cluster, without re-syncing over
anything.
prod_fn = rh.function(is_transformers_available).to(cluster, env=prod_env.name)
INFO | 2024-08-01 02:19:22.140840 | Sending module is_transformers_available of type <class 'runhouse.resources.functions.function.Function'> to diffusers_docker
Now, simply call the function, and it will detect the corresponding function on the cluster to run. In this case, it returns whether or not transformers is available on the cluster, which it is, as it was part of the Docker image.
prod_fn()
INFO | 2024-08-01 02:19:27.817880 | Calling is_transformers_available.call
prod_env env: Calling method call on module is_transformers_available
INFO | 2024-08-01 02:19:31.554237 | Time to call is_transformers_available.call: 3.74 seconds
True
Now for the local development and experimentation case. Let’s say we have the HuggingFace diffusers and transformers repositories cloned and installed as a local editable package, and are making changes to it that we want reflected when we run it on the cluster.
Let’s continue using the is_transformers_available
function, except
this time we’ll change the function to return the version number of the
transformers package if it exists, instead of True.
In my local diffusers/src/diffusers/utils/import_utils.py file:
def is_transformers_available: try: import transformers return transformers.__version__ except ImportError: return False
from diffusers.utils import is_transformers_available is_transformers_available()
'4.44.0.dev0'
In this case, because we want to use our local diffusers package, as well as our local transformers package and version, we include these as requirements inside our Runhouse env. There is no need to preemptively send over the env, as now we can directly pass in the env object when we define the function, to sync over the local changes.
dev_env = rh.env(name="dev_env", env_vars={"HF_TOKEN": "****"}, reqs=["diffusers", "transformers"])
Define a Runhouse function normally, passing in the function, and
sending it to the cluster. Here, we simply pass in the dev_env
object into the env argument. This will ensure that the folder that this
function is locally found in, along with any requirements in the env
requirements is synced over to the cluster properly. Even though the
container already contains its own version of these packages,
requirements that can be found locally, such as our local modified
diffusers and transformers (v 4.44.0.dev0) repositories will be synced
to the cluster.
dev_fn = rh.function(is_transformers_available).to(cluster, env=dev_env)
INFO | 2024-08-01 02:34:20.997084 | Copying package from file:///Users/caroline/Documents/diffusers to: diffusers_docker
INFO | 2024-08-01 02:34:24.924803 | Copying package from file:///Users/caroline/Documents/transformers to: diffusers_docker
INFO | 2024-08-01 02:34:31.626250 | Calling dev_env._set_env_vars
dev_env env: Calling method _set_env_vars on module dev_env
INFO | 2024-08-01 02:34:32.324740 | Time to call dev_env._set_env_vars: 0.7 seconds
INFO | 2024-08-01 02:34:32.444053 | Calling dev_env.install
dev_env env: Calling method install on module dev_env
Installing Package: diffusers with method pip.
Running via install_method pip: python3 -m pip install /root/diffusers
Installing Package: transformers with method pip.
Running via install_method pip: python3 -m pip install /root/transformers
INFO | 2024-08-01 02:34:56.084695 | Time to call dev_env.install: 23.64 seconds
INFO | 2024-08-01 02:34:56.239915 | Sending module is_transformers_available of type <class 'runhouse.resources.functions.function.Function'> to diffusers_docker
Now, we call the function
dev_fn()
INFO | 2024-08-01 02:35:01.303550 | Calling is_transformers_available.call
dev_env env: Calling method call on module is_transformers_available
INFO | 2024-08-01 02:35:02.946712 | Time to call is_transformers_available.call: 1.64 seconds
'4.44.0.dev0'
Here, we implement the above as a script that can be used to toggle between dev and prod. The script can easily be adapted and shared between teammates developing and working with the same repos, with a flag or variable flip to differentiate between experimentation and production branches.
from diffusers.utils import is_transformers_available if __name__ == "__main__": cluster = rh.ondemand_cluster(...) cluster.up_if_not() if prod: env = rh.env(name="prod_env_name", env_vars={...}, ...) env.to(cluster) remote_fn = rh.function(is_transformers_available).to(cluster, env=env.name) else: env = rh.env(name="dev_env_name", reqs=["diffusers", "trasnformers"], ...) remote_fn = rh.function(is_transformers_available).to(cluster, env=env) remote_fn()
To summarize the core differences between local experimentation and production workflow:
Local Development: Include local packages to sync in the reqs
field of the env
that the function is associated with.
Production Workflow: Do not include production packages that are
part of the Docker image in the reqs
field of the env
. Send the
env
to the cluster prior to defining the function, and then pass in
the env name rather than the env object for the function. Also, include
Runhouse and Ray on the image to pin those for production as well.