The Quickest AWS SageMaker Deployment in Existence
Deploy and Run Your Code on SageMaker in 10 Minutes with Runhouse
CTO @ 🏃‍♀️Runhouse🏠
Engineer @ 🏃‍♀️Runhouse🏠
We're excited to announce AWS SageMaker support in Runhouse, aimed at unlocking SageMaker's unique infrastructure advantages without the typical onboarding lift and at improving ergonomics for existing SageMaker users. As usual, the essence of our approach is granting you accessibility and debuggability without the onramp or restrictions of conforming your code to infra-specific APIs. We'll dive into detailed use cases and code samples, but first, let's discuss who this is for and why we've made it such a high priority.
The examples referenced in this post are publicly available in this Github repo.
Why AWS SageMaker? Why now?
SageMaker has a complex history, and frankly wasn't a compute platform we expected to prioritize adding to Runhouse this early. But we've found that some little-known value drivers in SageMaker make it a compelling option for lean ML teams, and we saw an opportunity to significantly improve the experience for existing SageMaker users.
Rather than a single tool, SageMaker is an out-of-the-box "Machine Learning Platform" offered by AWS, reminiscent of the ML Platforms publicized by Meta, Uber, Spotify, and others circa 2018. It's often picked up as a default option for new ML teams bootstrapping their infra or enterprise teams aligning on a centralized platform. But in 2023, with a slew of competing options, it's unclear to most whether they should be using SageMaker despite a wealth of guides and blog posts about it. It's a complex suite of products, and considering its reputation for having a ~6 month onramp (which is now optional, stay tuned!), you'll want to look before you leap. We've also found that many startups, even those with ML infrastructure experts, aren't aware of some killer features of SageMaker which could dramatically improve their stack.
SageMaker's more well-known, high-level competitive value drivers (i.e., not diving into each of its subcomponents) lean organizational:
- Centralization - Time-tested offerings for nearly every piece of the ML platform within AWS. This is pretty simple. If you have credits, discounts, strict vendor constraints, or just aren't interested in entertaining many options for each piece of the stack, here's everything in one place.
- Admin controls and defaults - Enterprise teams can finely control how expensive resources are used, with usage limits and auto-stop by default in most places. With SageMaker, teams are far less likely to leave a GPU up for months by accident, and you can terminate an accidentally long-running GPU notebook more confidently than a random instance in EC2.
However, SageMaker also offers unique infrastructural value, which is far less well known.
- Scalable semi-serverless orchestrated compute - SageMaker's compute model can be seen almost like a big shared Kubernetes cluster, but you get it without the management overhead or separate cost of a managed solution like EKS or ECS (though you still pay for SageMaker itself). You benefit from tapping into a large pool of compute so parallelism is nearly always available, rather than relying on a scheduler to queue your jobs, which adds latency, or an autoscaler to provision new instances, which adds failures and management overhead. Suppose you have a training service which typically handles 1-2 jobs at once and all of a sudden receives 4 requests: SageMaker would launch them in parallel without issue (see the sketch after this list), whereas launching them one by one in EC2 or on a two-node Kubernetes cluster would not be fun. Like a container orchestrator, SageMaker can also launch jobs from Docker images rather than machine images, which is far easier and cheaper to manage.
- Better GPU availability - GPUs are pooled in AWS SageMaker separately from EC2, and anecdotally, we've observed that they're more available in SageMaker. This makes sense: the compute model of SageMaker is a large shared pool with ephemeral compute being released back into the pool constantly, whereas EC2 VMs tend to be longer lived.
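To make that burst scenario concrete, here's a minimal sketch of handling it with Runhouse on SageMaker. The `train_model` function and cluster names are hypothetical; the `rh.sagemaker_cluster` and `rh.function` calls are the same ones used in the example later in this post:

```python
import runhouse as rh
from concurrent.futures import ThreadPoolExecutor


def train_model(config):
    ...  # your existing training code, unchanged


def launch_job(i):
    # Each job gets its own ephemeral SageMaker instance from the shared pool,
    # so four requests simply mean four instances launched side by side
    cluster = rh.sagemaker_cluster(
        name=f"rh-sagemaker-train-{i}",
        instance_type="ml.g5.4xlarge",
        profile="sagemaker",
    ).up_if_not()
    remote_train = rh.function(train_model).to(cluster)
    return remote_train({"run_id": i})


# Four training requests arrive at once; launch them in parallel
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(launch_job, range(4)))
```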
These benefits are attractive, but come with complexity. SageMaker takes the approach of offering the complete cast of highly-specialized characters you'd find in ML platforms: a notebook service, a model registry, an orchestrator, an inference service, and much more (see this AI Infrastructure Alliance Landscape for a complete picture). All this with relatively prescriptive APIs reminiscent of what you'd find offered by an internal ML Platform team. These give the confidence of a system stress-tested at scale, but also make the onramp a 6-9 month ordeal of translating code and navigating complex behavior within the systems themselves.
SageMaker Onboarding in 10 Minutes with rh.SageMakerCluster
Runhouse's SageMakerCluster (rh.SageMakerCluster) is an abstraction in front of SageMaker that allows you, like other Runhouse compute abstractions, to dispatch arbitrary code or data to SageMaker compute through a simple and ergonomic API. This saves you the need to migrate or conform your existing code to the SageMaker APIs - a task that not only takes time, but also leads to code duplication and forking if you also use any other infrastructure. It's open-source code using the SageMaker APIs locally with your own API keys and SageMaker setup, and doesn't require special permissions, enablements, or external vendors. If you're already a SageMaker user, you can use SageMakerCluster immediately. It runs on top of SageMaker Training, primarily because this is the most flexible form of compute in SageMaker, but you can run inference, training, preprocessing, HPO, or any other arbitrary Python on the SageMakerCluster. Just send over rh.functions or rh.modules like you would to a static rh.cluster or on_demand_cluster.
Simple SageMaker Inference Example
All the code below can be found in this Github repo containing some common use cases for SageMaker and examples implementing them with Runhouse. In this post, we'll look at a simple inference service, and explore more complex examples in subsequent posts.
Before you begin, run:

```
pip install runhouse[sagemaker]
```

and see our quick SageMaker Hardware Setup walkthrough to confirm your environment is set up properly.
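For reference, the `profile="sagemaker"` argument used below assumes an AWS profile with access to a SageMaker execution role. A hypothetical `~/.aws/config` entry might look like this (the role ARN and region are placeholders; the hardware setup walkthrough covers the specifics):

```ini
# Hypothetical ~/.aws/config profile; substitute your own role and region
[profile sagemaker]
region = us-east-1
role_arn = arn:aws:iam::123456789012:role/MySageMakerExecutionRole
source_profile = default
```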
Here's all the code you need to deploy a simple Stable Diffusion inference service on a GPU in AWS SageMaker with Runhouse:
```python
import runhouse as rh
from diffusers import StableDiffusionPipeline


def sd_generate_image(prompt):
    model = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-base"
    ).to("cuda")
    return model(prompt).images[0]


if __name__ == "__main__":
    sm_gpu_cluster = rh.sagemaker_cluster(
        name="rh-sagemaker-gpu",
        instance_type="ml.g5.4xlarge",
        profile="sagemaker",
    ).up_if_not().save()

    # Create a Stable Diffusion microservice running on a SageMaker GPU
    sd_generate = rh.function(sd_generate_image).to(sm_gpu_cluster)

    # Call the service with a prompt, which will run remotely on the GPU
    img = sd_generate("A hot dog made out of matcha.")
    img.show()
```
Explore further: Source code
This code launches our desired SageMaker GPU instance, creates a Runhouse function object which wraps our existing code, and sends that function to the GPU as a microservice. Now whenever we call the local sd_generate function, it makes an HTTP call to the microservice running on the GPU, passing the prompt text and receiving a PIL image. Let's walk through it line by line:
```python
sm_gpu_cluster = rh.sagemaker_cluster(
    name="rh-sagemaker-gpu",
    instance_type="ml.g5.4xlarge",
    profile="sagemaker",
).up_if_not().save()
```
The first thing we do is create a cluster object, which represents a new SageMaker instance based on the instance type and other specs provided. Runhouse gives you the flexibility to configure the SageMaker compute according to your use case. In this case, since we're standing up an inference service that we'd like to keep running indefinitely, we can use the default autostop_mins=-1. For the full list of configuration options, see the cluster factory documentation.
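By contrast, if you were standing up a short-lived training box, a sketch might look like the following (the name, instance type, and timeout are illustrative; `autostop_mins` is the same option discussed above):

```python
# Hypothetical training box that releases itself back to the pool
# after 60 minutes of inactivity
sm_train_cluster = rh.sagemaker_cluster(
    name="rh-sagemaker-train",
    instance_type="ml.p3.2xlarge",
    profile="sagemaker",
    autostop_mins=60,
).up_if_not()
```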
We then launch the compute (if it is not already running), create a new SSM session between your local machine and the SageMaker instance, and then create an SSH tunnel on top of the SSM session (see our documentation for more info on configuring SSM). Once the connection is made, you can SSH directly onto the instance (as easy as ssh rh-sagemaker-gpu) and make requests to a lightweight HTTP server which Runhouse starts on the instance.
```python
# Create a Stable Diffusion microservice running on a SageMaker GPU
sd_generate = rh.function(sd_generate_image).to(sm_gpu_cluster)
```
We then create a Runhouse function (or microservice), which handles receiving the prompt, calling the model, and returning the output image. We call .to() on the function to deploy it to the SageMakerCluster.
```python
# Call the service with a prompt, which will run remotely on the GPU
img = sd_generate("A hot dog made out of matcha.")
img.show()
```
After sending our function to SageMaker, we get back a Python callable object. This behaves exactly as we would expect it to if we were calling it locally, accepting the same inputs, producing the same outputs, and streaming stdout and logs back to be printed locally.
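Since `sd_generate` is now just a Python callable, we can use it like any local function; for example (the prompts and filenames below are arbitrary):

```python
# Each call runs remotely on the SageMaker GPU and returns a PIL image
prompts = ["A watercolor fox in the snow.", "A neon city skyline at dusk."]
for i, prompt in enumerate(prompts):
    img = sd_generate(prompt)
    img.save(f"generated_{i}.png")
```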
Advantages of Using Runhouse with SageMaker
If you're considering standing up your own Kubernetes or ECS cluster for ML, or you'd like to access SageMaker's GPU availability, you should take 10 minutes to try SageMaker with Runhouse.
Access to SageMakerâs unique infra without migrating your existing code
Runhouse allows you to onboard to SageMaker in a matter of minutes, not months. You'll retain the ability to run on any other compute (now or as your stack evolves) by leaving your code infra-agnostic, and you can interact with the compute from notebooks, IDEs, research scripts, pipeline DAGs, or any Python interpreter. Plus, you can integrate and adopt the complex world of SageMaker progressively over time as needed. We don't hide the underlying SageMaker APIs if you want to reach deeper, such as using your own estimator (sketched below).
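For example, here's a hedged sketch of bringing your own SageMaker estimator, assuming the `estimator` argument described in the cluster factory documentation (the entry point, IAM role, and framework versions below are placeholders):

```python
import runhouse as rh
from sagemaker.pytorch import PyTorch

# Hypothetical estimator; substitute your own script, IAM role, and versions
estimator = PyTorch(
    entry_point="train.py",
    role="MySageMakerExecutionRole",
    framework_version="1.13.1",
    py_version="py39",
    instance_count=1,
    instance_type="ml.g5.4xlarge",
)

sm_train_cluster = rh.sagemaker_cluster(
    name="rh-sagemaker-estimator",
    estimator=estimator,
).up_if_not()
```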
Superuser SageMaker usage out of the box
We work directly with AWS to ensure that weâre delivering best-practice usage of the SageMaker APIs. If you already use SageMaker, Runhouse requires no additional permissions or setup. Any user with permission to run SageMaker Training can use Runhouse SageMakerCluster. We can support other SageMaker compute types too! Let us know if you need another kind.
Better debuggability
Normally it's difficult to iterate on and debug SageMaker code because you need to submit your jobs for execution and essentially wait to see logs on the other side. Runhouse allows you to SSH into your SageMaker box and send arbitrary files or CLI commands up to the cluster, and it streams stdout and logs back to you in real time, so you can debug interactively (see the sketch below). By default, it keeps your cluster warm through 30 minutes of inactivity, instead of shutting it down immediately.
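As a sketch of that interactive loop, using the command-running helpers on the Runhouse cluster object (treat the exact calls as illustrative):

```python
# Run a shell command on the SageMaker instance; stdout streams back locally
sm_gpu_cluster.run(["nvidia-smi"])

# Run a quick Python snippet remotely to inspect the environment
sm_gpu_cluster.run_python(["import torch", "print(torch.cuda.is_available())"])
```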
Making ML Infra Fast and Homey
At Runhouse, we believe that your code should command your infra, and not vice versa. If this resonates with you, please drop us a Github star.
To explore Runhouse SageMakerCluster further:
- Take a look at the SageMakerCluster API documentation
- Try running the tutorial code: 🏃‍♀️Runhouse🏠 & SageMaker
- Raise questions and feedback in our Discord or file a Github issue
If you're interested in chatting about how to integrate rh.SageMakerCluster into your existing stack, book time with us. We may be able to offer competitive programs through AWS for you to test it.
In subsequent posts we'll walk through more advanced use cases with SageMaker pipelines, including training and hyperparameter tuning, and how to set them up with Runhouse.
Stay up to speed 🏃‍♀️💨
Subscribe to our newsletter to receive updates about upcoming Runhouse features and announcements.