StableDiffusion XL: How to Host Your Own Image Generation AI
A complete guide to hosting your own image generation AI with StableDiffusion. This post walks through setting up StableDiffusion XL on AWS, covering the hardware requirements, software dependencies, and configuration needed to launch your own image AI.
Paul Yang
ML @ 🏃♀️Runhouse🏠
July 29, 2024
Setup
First, we need to set up our local environment to make sure we can deploy the model to AWS. You will need:
- AWS credentials with permission to launch a cluster (You can also use GCP or any other cloud of your choice)
- A Hugging Face token, which lets you download the model
```shell
$ pip install "runhouse[aws]" Pillow
$ aws configure
$ sky check
$ export HF_TOKEN=<your huggingface token>
```
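Before launching anything, it can help to confirm the token is actually visible to your environment. A minimal sketch (the `hf_token_available` helper is just illustrative, not part of Runhouse or Hugging Face):

```python
import os


def hf_token_available(env=os.environ) -> bool:
    # True if a Hugging Face token is present in the given environment mapping.
    return bool(env.get("HF_TOKEN"))


if __name__ == "__main__":
    print("HF_TOKEN set:", hf_token_available())
```

If this prints `False`, re-run the `export HF_TOKEN=...` step in the same shell before continuing.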
Code
The following code is short and simple, and deploys StableDiffusion XL to an AWS cloud machine.
- The g5.8xlarge we specify here costs ~$2.40/hour on demand, which makes it reasonable for research purposes.
- We first define a model class, which will download the model, load it into memory, and run inference.
- Then we send that model class to remote compute using `.get_or_to()`, which makes the code callable locally while it runs on the remote machine. The first time this script runs, it will take a few minutes because the model needs to download first.
- Then we can call `.generate()` and display the image locally, even though execution happened on the remote cluster.
```python
import base64
import os
from io import BytesIO

import runhouse as rh
from PIL import Image


# Define a class that will hold the model and allow us to send prompts to it.
class StableDiffusionXLPipeline(rh.Module):
    def __init__(
        self,
        model_id: str = "stabilityai/stable-diffusion-xl-base-1.0",
        model_dir: str = "sdxl",
    ):
        super().__init__()
        self.model_dir = model_dir
        self.model_id = model_id
        self.pipeline = None

    def _model_loaded_on_disk(self):
        return (
            self.model_dir
            and os.path.isdir(self.model_dir)
            and len(os.listdir(self.model_dir)) > 0
        )

    def _load_pipeline(self):
        import torch
        from diffusers import DiffusionPipeline
        from huggingface_hub import snapshot_download

        if not self._model_loaded_on_disk():
            # Download the model from the Hugging Face Hub to a local directory,
            # excluding symlinks and "hidden" files like .DS_Store, .gitignore, etc.
            snapshot_download(
                self.model_id,
                local_dir=self.model_dir,
                local_dir_use_symlinks=False,
                allow_patterns=["[!.]*.*"],
            )

        # Load the local model into the pipeline and move it to the GPU.
        self.pipeline = DiffusionPipeline.from_pretrained(
            self.model_dir, torch_dtype=torch.float16
        )
        self.pipeline.to("cuda")

    def generate(self, input_prompt: str, output_format: str = "JPEG", **parameters):
        # Lazily load the pipeline on the first call.
        if not self.pipeline:
            self._load_pipeline()

        generated_images = self.pipeline(input_prompt, **parameters)["images"]

        # Postprocess: convert each image into a base64 string so it can be
        # sent back to the local machine.
        encoded_images = []
        for image in generated_images:
            buffered = BytesIO()
            image.save(buffered, format=output_format)
            encoded_images.append(base64.b64encode(buffered.getvalue()).decode())
        return encoded_images


def decode_base64_image(image_string):
    base64_image = base64.b64decode(image_string)
    buffer = BytesIO(base64_image)
    return Image.open(buffer)
```

Now, we define the main function that will run locally when we run this script, and set up our Runhouse module on a remote cluster.
First, we create a cluster with the desired instance type and provider. Our `instance_type` here is `g5.8xlarge`, an AWS instance type costing $2.40/hr on demand as of 7/29/2024.

```python
if __name__ == "__main__":
    cluster = rh.cluster(
        name="rh-g5",
        instance_type="g5.8xlarge",
        provider="aws",
    ).up_if_not()
```

Next, we define the environment for our module. This includes the required dependencies that need to be installed on the remote machine, as well as any secrets that need to be synced up from local to remote. Passing `huggingface` to the `secrets` parameter will load the Hugging Face token we set up earlier.

```python
    env = rh.env(
        name="sdxl_inference",
        reqs=[
            "diffusers==0.21.4",
            "huggingface_hub",
            "torch",
            "transformers==4.31.0",
            "accelerate==0.21.0",
        ],
        secrets=["huggingface"],  # Needed to download the model
    )
```

Finally, we define our module and run it on the remote cluster. We construct it normally and then call `get_or_to` to run it on the remote cluster. Using `get_or_to` allows us to load the existing Module by the name `sdxl` if it was already put on the cluster. If we want to update the module each time we run this script, we can use `to` instead of `get_or_to`.

```python
    model = StableDiffusionXLPipeline().get_or_to(cluster, env=env, name="sdxl")
```

We can call the `generate` method on the model class instance as if it were running locally.

```python
    prompt = "A woman runs through a large, grassy field towards a house."
    response = model.generate(
        prompt,
        num_inference_steps=25,
        negative_prompt="disfigured, ugly, deformed",
    )

    for gen_img in response:
        img = decode_base64_image(gen_img)
        img.show()
```
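If you would rather write the generated images to disk than open a viewer window, the base64 strings returned by `generate` can be decoded with the standard library alone. A small sketch (the `save_base64_images` helper is hypothetical, not part of the script above):

```python
import base64
from pathlib import Path


def save_base64_images(encoded_images, out_dir="outputs", fmt="jpeg"):
    # Decode base64-encoded image strings (as returned by `generate`)
    # and write each one to a file in `out_dir`. Returns the written paths.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, encoded in enumerate(encoded_images):
        path = out / f"image_{i}.{fmt}"
        path.write_bytes(base64.b64decode(encoded))
        paths.append(path)
    return paths
```

This avoids the Pillow dependency on the receiving end entirely, since the bytes are already a complete JPEG once decoded.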