What is Amazon SageMaker? What Are the Alternatives to SageMaker?
AWS SageMaker is a comprehensive ML platform. Who should use it, why, and why not? We cover the platform's benefits and components in detail, as well as its drawbacks.
ML @ 🏃♀️Runhouse🏠
What is Amazon SageMaker? What Are the Pros and Cons?
SageMaker is the flagship ML development product of Amazon Web Services (AWS), and can be considered an “ML platform in a box.” It streamlines the development and deployment of new models through a collection of tools and services, all of which are necessary to stand up a full ML platform.
First, let’s briefly walk through SageMaker’s pros and cons at a high level, before diving deeper into the details of the various platform components.
Why should I use SageMaker?
Simplicity
SageMaker is a fully managed, complete environment for the whole ML workflow. You can use tools within the AWS toolset for everything from notebooks to data prep to inference and model serving. When teams go from singleton ML projects to building out an ML platform, it’s often helpful to have all the available pieces in one place. The main competition for “all-in-one” solutions comes from the other hyperscalers and perhaps Databricks, but the choice of platform is frequently made based on where other data already lives.
Integration with Existing Stack and Compliance
For teams that are already deeply integrated into the AWS ecosystem, it makes sense to continue building the ML platform within AWS. SageMaker comes with expected enterprise features like data encryption, network isolation, and access controls. Adding third-party vendors or additional SaaS tooling for specific tasks may not make sense from a security perspective if those vendors do not operate entirely within the organization’s firewall boundaries. Even when SaaS tools do offer deployments within an organization’s own cloud, setup and management, from user administration through software configuration, often take a long time to smooth out.
Managed Infrastructure
SageMaker automatically provisions compute resources from a pool of ML infrastructure, and also supports features like auto-scaling for model inference. It is well-integrated with the rest of AWS services, so storage in S3 buckets for notebooks, for instance, is fairly automatic. Alongside that, SageMaker offers a range of ready-to-go infrastructure, from pre-trained models to pre-configured ML containers, which would otherwise be expensive and tedious for a small ML infrastructure team to build and maintain.
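To make the auto-scaling point concrete, here is a sketch of the request payloads you would hand to AWS's Application Auto Scaling API to scale a SageMaker endpoint variant on traffic. The endpoint and variant names are hypothetical placeholders, and nothing is sent to AWS; the `boto3` calls are shown only in comments.

```python
# Sketch: request payloads for auto-scaling a SageMaker endpoint variant
# via Application Auto Scaling. "my-endpoint" / "AllTraffic" are
# hypothetical names; no AWS call is made here.

endpoint_name = "my-endpoint"   # hypothetical endpoint name
variant_name = "AllTraffic"     # SageMaker's default variant name

resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"

scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 4,
}

# Target-tracking policy: scale so each instance serves ~100 invocations/min.
scaling_policy = {
    "PolicyName": "invocations-per-instance",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 100.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
}

# With AWS credentials configured, you would apply these with:
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**scalable_target)
#   aas.put_scaling_policy(**scaling_policy)
print(resource_id)
```

The upside of the managed approach is exactly this: two declarative payloads replace the scaling loops you would otherwise run yourself.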
Why should I not use SageMaker?
Compute Cost
SageMaker imposes compute costs on teams in two main ways. First, SageMaker instances are marked up, often significantly, compared to the equivalent underlying EC2 instances they run on. For teams at the early stages of ML, who favor flexibility and total freedom from infrastructure management, this might be an acceptable cost. But as teams develop and run more and heavier workloads, these markups start to add up. Second, SageMaker does not offer the full EC2 or EKS instance catalog, which leads to “wrong-sizing” of boxes for the tasks they execute. In a more advanced setting, teams might even want to go across clusters and clouds to access compute elsewhere, especially if GPU compute is easier to access or cheaper in another cloud. Being able to mix-and-match clouds within a single pipeline can make a huge difference to large cloud bills.
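To see how a markup compounds, here is a back-of-the-envelope calculation. The prices and the markup percentage below are purely illustrative assumptions, not actual AWS pricing; the point is the shape of the math, not the exact numbers.

```python
# Illustrative only: hypothetical prices, not actual AWS pricing.
ec2_hourly = 3.06          # assumed EC2 on-demand rate for a GPU box
sagemaker_markup = 1.20    # assumed ~20% SageMaker premium

sagemaker_hourly = ec2_hourly * sagemaker_markup
hours_per_month = 8 * 22   # one training box, business hours only
monthly_premium = (sagemaker_hourly - ec2_hourly) * hours_per_month

print(f"extra cost per box per month: ${monthly_premium:.2f}")
```

Multiply that single-box premium by a fleet of training and inference instances running around the clock, and the markup becomes a meaningful line item.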
Complexity and Learning Curve
SageMaker is a large set of tools, and learning how to use each one, and then how to use them together, ends up being a significant lift. Data scientists who have not previously used the platform cannot reasonably be expected to know how to use it the way they know how to use Python generically. And given that complexity, once you adopt the platform, it can be hard to leave. Workflows are not expressed in normal Python code or other portable formats, but in a set of highly specific AWS managed services. Lock-in is good for vendors, not teams.
Flexibility
SageMaker makes things “easy,” but as is often the trade-off with software that has been made “easy,” it is also harder to debug, adjust, and customize. You are bound to its opinionated views on what workflows should look like, what logging and tracking should be, and so on. Rather than the infinite flexibility of code, or the fine-grained control that managing your own infrastructure brings, you are taking the benefits of all-in-one along with its costs to flexibility.
Key SageMaker Components
Amazon SageMaker consists of several components, each serving a specific purpose within the ML workflow. Here’s a closer look at these components:
1. SageMaker Studio
SageMaker Studio is an integrated development environment (IDE) tailored for general data science work. It offers a web-based interface where users can write code, track experiments, visualize data, and manage models all in one place. The studio also attempts to be shareable across multiple team members, has pre-configured images to accelerate development, and has advanced user access controls.
2. SageMaker Notebooks
SageMaker Notebooks are Jupyter notebooks that can be used alongside SageMaker Studio, and represent a more traditional Jupyter experience embedded directly in ML cloud infrastructure. Users who don’t want the added complexity of Studio can still rely on a simple notebook interface.
3. SageMaker Training
SageMaker Training is a fully-managed service that enables users to train ML models at scale. It provides optimized compute, chosen from a range of instance types including distributed methods. It can automatically tune hyperparameters, and can train models based on built-in algorithms or custom algorithms.
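A training job boils down to a single declarative request. Below is a sketch of that request in the low-level `boto3` shape; every name, ARN, image URI, and S3 path is a hypothetical placeholder, and nothing is sent to AWS here.

```python
# Sketch of a SageMaker training job request (boto3 create_training_job shape).
# All names, ARNs, and S3 paths are hypothetical placeholders.
training_job_request = {
    "TrainingJobName": "example-xgb-job",
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    "AlgorithmSpecification": {
        # Built-in or custom training container image (placeholder URI).
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-train:latest",
        "TrainingInputMode": "File",
    },
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/train/",  # placeholder path
            "S3DataDistributionType": "FullyReplicated",
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",  # note the ml.* namespace, not plain EC2 types
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    "HyperParameters": {"max_depth": "6", "eta": "0.3"},  # values are strings by API contract
}

# With credentials configured, you would launch it with:
#   boto3.client("sagemaker").create_training_job(**training_job_request)
print(training_job_request["TrainingJobName"])
```

Note the `ml.*` instance type namespace: this is the SageMaker-specific catalog discussed in the compute-cost section above, distinct from (and priced differently than) the raw EC2 instance types.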
4. SageMaker Processing
SageMaker Processing includes ML-specific data processing tasks, and generally allows users to preprocess, postprocess, and evaluate ML models with ease. It supports various data processing tasks, including data transformation, validation, and feature engineering.
5. SageMaker Canvas
SageMaker Canvas aims to be a no-code solution, letting users train and deploy models without the specialized training that data science usually requires. Increasingly, generative AI work can be done in Canvas as well.
6. SageMaker Experiments
SageMaker Experiments helps users track, organize, and compare ML experiments. It automatically records metadata, including hyperparameters, input data, and model artifacts. The goal is to make experiments easier to reproduce and analyze, and it competes with open-source experiment-tracking tools such as MLflow.
7. SageMaker Debugger
SageMaker Debugger provides real-time insights into model training, allowing users to detect issues early in the process. It offers built-in rules and customizable alerts to monitor metrics and identify anomalies as well.
8. SageMaker Model Monitor
SageMaker Model Monitor continuously monitors the performance of deployed models in production. It detects data drift and other anomalies. It competes with many of the open source tools that are available to monitor models.
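To ground what “detecting data drift” means, here is a toy illustration of the underlying idea (this is our own simplified sketch, not SageMaker Model Monitor's actual algorithm): compare a live window of a feature against its training-time baseline and flag the feature when its mean shifts by more than a few baseline standard deviations.

```python
# Toy drift check (not SageMaker's implementation): flag a feature whose
# live mean deviates from the baseline mean by > z_threshold baseline stdevs.
import statistics

def drifted(baseline, live, z_threshold=3.0):
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)
    live_mean = statistics.mean(live)
    return abs(live_mean - base_mean) > z_threshold * base_std

# Hypothetical feature values captured at training time vs. in production.
baseline = [10.0, 10.5, 9.8, 10.2, 9.9, 10.1]
stable_window = [10.0, 10.3, 9.7]     # traffic that looks like training data
shifted_window = [14.0, 14.5, 13.8]   # traffic that has clearly moved

print(drifted(baseline, stable_window))
print(drifted(baseline, shifted_window))
```

Production systems use more robust statistics than a mean-shift check, but the contract is the same: a baseline captured at training time, a sliding window of live data, and an alert when the two diverge.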
9. More
We did not cover every component of the SageMaker platform here. You can find more detail in the official documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
Conclusion
Amazon SageMaker is a comprehensive, fully-managed service that simplifies standing up a machine learning platform for many companies. It contains everything you might need to get started, from managed notebooks to managed training to managed inference. However, as teams get more sophisticated, it can be beneficial to adopt best-practice ML platform patterns instead. Indeed, most ML teams at Amazon are not building on SageMaker, but are likely sitting on more finely managed compute. One of the main frameworks powering these new platforms at advanced tech companies is Ray, recently adopted by many tech innovators: Spotify, Pinterest, Instacart, OpenAI, and Uber, to name just a few.
Runhouse is built to deliver the benefits of SageMaker without the costs. Using Runhouse as a foundational framework for your ML platform delivers all the benefits of an easy-to-use platform while building on the same best-practice principles we’ve observed at top-tier ML practices. Runhouse is:
- Open source
- Normal Python code: no domain-specific language and no vendor lock-in
- Easy to stand up, working as a universal runtime on any infrastructure and compute
- Flexible about compute, able to run across multiple regions, clusters, and even cloud providers
- Built for sharing and collaboration, with functions and modules that are reusable and trackable from central management
Stay up to speed 🏃♀️📩
Subscribe to our newsletter to receive updates about upcoming Runhouse features and announcements.