How to Run and Host Flux.1 Image Generation (on your own cloud)

Flux.1 is a newly released generative AI model that creates high-quality images from text prompts. We will show you how to easily host the model on your own infrastructure, using AWS in this example.

Paul Yang

ML @ 🏃‍♀️Runhouse🏠

Published September 20, 2024
[Image: Flux.1 homepage, "a new era of creation"]

What Is Flux.1 and Why Should I Use It?

Flux.1 is a newly released family of image-generation models from Black Forest Labs. It was made (in)famous as the generative AI model powering image generation on Twitter/X.

Flux.1 comes in three tiers: Pro, which is hosted-only, and Dev and Schnell ("schnell" is German for "fast"), both of which have been open-sourced. All three models are among the highest-quality image generators available, especially given their model sizes.

The key to Flux.1 is that Black Forest Labs has solved many of the issues that plagued earlier generations of image-generation models. In reported usage compared to Stable Diffusion, you see:

  • Much greater adherence to prompts
  • Ability to handle text in images
  • Hands! And other fine details; the first generation of image models notoriously struggled to draw hands.

Is it the best image model? As of publication, it is likely the highest-quality general-purpose option without fine-tuning. But the model space changes quickly, and Runhouse gives you the ability to quickly deploy and experiment with new models within your own infrastructure.

Deployment Example

Setup and Dependencies

Access to compute: we use AWS as the example here, but you can easily change the compute provider to GCP, Azure, or really any other cloud (Lambda Labs, an existing VM, etc.). You first need to configure your AWS credentials by running `aws configure` outside of Python.

For dependencies, you will only need to install the Runhouse package locally. All other packages will be installed on the remote environment.

$ pip install "runhouse[aws]"
$ aws configure
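If you prefer a different cloud, install the matching Runhouse extra and authenticate with that provider's CLI instead. For example, for GCP (assuming the `gcp` extra is available in your Runhouse version):

$ pip install "runhouse[gcp]"
$ gcloud auth application-default login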

Write and Run Python

Runhouse lets you write regular Python code and send it as a module to remote compute for execution.
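At a high level, the pattern looks like the following minimal sketch. The `Echo` class, cluster name, and instance type here are illustrative placeholders, not part of the Flux example, which follows below:

import runhouse as rh

# A hypothetical module, purely to illustrate the send-to-remote pattern.
class Echo:
    def run(self, text: str):
        return f"remote says: {text}"

if __name__ == "__main__":
    cluster = rh.cluster(name="rh-demo", instance_type="CPU:2+", provider="aws").up_if_not()
    RemoteEcho = rh.module(Echo).to(cluster, name="Echo")  # sync the class to the cluster
    remote_echo = RemoteEcho(name="echo")                  # instantiate it on the cluster
    print(remote_echo.run("hello"))                        # runs remotely, result returns locally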

We first define a FluxPipeline class. This will take a few minutes on the first run, since the model must be downloaded and loaded, but results should generate very quickly afterwards. The class has:

  • A method to load the model
  • And a `generate` method that produces results from a user prompt.

Then, we define the code that will run locally below.

  • Bring up a cluster and define the requirements for the worker environment that will run the module
  • Send the previously defined module to the cluster with .to()
  • Call the `generate` method on a user prompt

The code that runs "locally" can be run from any setting, and you can call `remote_flux.generate(prompt)` as if it were a local function, even though it executes entirely in the cloud.

import runhouse as rh

# First, we define a class that will hold the model and allow us to send prompts to it.
# We'll later wrap this with `rh.module`. This is a Runhouse class that allows you to
# run code in your class on a remote machine.
#
# Learn more at run.house/docs/tutorials/api-modules.


class FluxPipeline:
    def __init__(
        self,
        model_id: str = "black-forest-labs/FLUX.1-schnell",  # Schnell is smaller and faster, while Dev is more powerful but slower
    ):
        super().__init__()
        self.model_id = model_id
        self.pipeline = None

    def _load_pipeline(self):
        import torch
        from diffusers import FluxPipeline

        if not self.pipeline:
            self.pipeline = FluxPipeline.from_pretrained(
                self.model_id, torch_dtype=torch.bfloat16, use_safetensors=True
            )
            self.pipeline.enable_sequential_cpu_offload()  # Optimizes memory usage so the model fits and runs inference on an A10, which has 24GB of memory

    def generate(self, input_prompt: str, **parameters):
        import torch

        torch.cuda.empty_cache()

        if not self.pipeline:
            self._load_pipeline()

        image = self.pipeline(
            input_prompt,
            guidance_scale=0.0,
            num_inference_steps=4,
            max_sequence_length=256,
            generator=torch.Generator("cpu").manual_seed(0),
        ).images[0]

        return image


# Now, we define the main function that will run locally when we run this script, and
# set up our Runhouse module on a remote cluster. First, we create a cluster with the
# desired instance type and provider. Our `instance_type` here is defined as
# `g5.8xlarge`, which is an AWS instance type. We can alternatively specify an
# accelerator type and count, such as `A10G:1`, and any instance type with those
# specifications will be used.
if __name__ == "__main__":
    cluster = rh.cluster(
        name="rh-g5",
        instance_type="g5.8xlarge",
        provider="aws",
    ).up_if_not()

    # Next, we define the environment for our module. This includes the required
    # dependencies that need to be installed on the remote machine, as well as any
    # secrets that need to be synced up from local to remote.
    env = rh.env(
        name="flux_inference",
        reqs=[
            "diffusers",
            "torch",
            "transformers[sentencepiece]",
            "accelerate",
        ],
    )

    # Finally, we define our module and run it on the remote cluster. We construct it
    # normally and then call `to` to run it on the remote cluster. Alternatively, we
    # could first check for an existing instance on the cluster by calling
    # `cluster.get(name="flux")`. This would return the remote model after an initial run.
    RemoteFlux = rh.module(FluxPipeline).to(cluster, env=env, name="FluxPipeline")
    remote_flux = RemoteFlux(
        name="flux"
    )  # This has now been set up as a service on the remote cluster and can be used for inference.

    # Calling our remote function
    #
    # We can call the `generate` method on the model class instance as if it were
    # running locally. This will run the function on the remote cluster and return the
    # response to our local machine automatically. Further calls will also run on the
    # remote machine, and maintain state that was updated between calls.
    prompt = "A woman runs through a large, grassy field towards a house."
    response = remote_flux.generate(prompt)
    response.save("flux-schnell.png")
    response.show()
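Because the module is now a named service on the cluster, a later session can reconnect to it instead of re-deploying. Here is a minimal sketch, assuming the script above has already run; it builds on the `cluster.get(name="flux")` call mentioned in the script's comments, and the exact signature (including the `remote` flag) may vary by Runhouse version:

import runhouse as rh

# Reconnect to the cluster and the already-deployed "flux" instance.
cluster = rh.cluster(name="rh-g5").up_if_not()
remote_flux = cluster.get("flux", remote=True)  # assumption: returns a handle to the existing remote instance
image = remote_flux.generate("A lighthouse on a rocky coast at dawn.")
image.save("flux-schnell-2.png")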
