Components of a Compound AI System

Discover key AI components for building intelligent applications, from language models for planning to classifiers, retrieval models, and response generators. Learn how models like OpenAI o1, BERT, and CLIP enhance AI agents.

Paul Yang

ML @ 🏃‍♀️Runhouse🏠

Published February 26, 2025

AI applications and agents are compositions of one or more model inferences, alongside the software engineering and user design required to make the system useful. Here we briefly review the components you might reach for and name the underlying models worth seeking out. Many are language models built on the Transformer architecture, operating at vastly different scales of size and complexity: from BERT, released by Google in 2018 (and its modern variants since), to the latest frontier models available with either open or closed weights.

Language Model for Planning: Agents frequently require a detailed plan to guide downstream execution.

  • Role: Large language models are capable of performing the complex reasoning necessary to generate these plans.
  • Benefits: By leveraging their contextual understanding, these models can produce actionable, step-by-step plans that align well with subsequent tasks.
  • Example: An agent planning a travel itinerary might generate a sequence of tasks—such as researching flights, comparing prices, and booking tickets—ensuring that each step is logically connected.
  • Models: GPT-4, OpenAI o1/o3, Llama 3.1 405B, Claude Sonnet, DeepSeek R1, Gemini 2.0
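Once a planner model returns a plan, downstream code usually needs it in a structured form. A minimal sketch, assuming the planner was prompted to emit a numbered list (the plan text below is an invented example, not a real model response):

```python
import re

def parse_plan(plan_text: str) -> list[str]:
    """Extract ordered steps from a numbered plan emitted by a planner LLM."""
    steps = []
    for line in plan_text.splitlines():
        match = re.match(r"\s*\d+[.)]\s+(.*)", line)
        if match:
            steps.append(match.group(1).strip())
    return steps

# Hypothetical planner output for the travel-itinerary example above.
plan_text = """\
1. Research flights from Boston to Lisbon for the target dates.
2. Compare prices across at least three carriers.
3. Book the cheapest refundable ticket."""

steps = parse_plan(plan_text)
```

In practice you might instead ask the model for JSON output and validate it against a schema; the parsing step exists either way.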

Classifier for Determining User Intent or Tool Choice: At various decision points, the AI system must select the appropriate action, path, or tool from a predefined set of options.

  • Role: A classifier, built using either traditional ML techniques or language models (small or large), determines which option is most suitable based on the current context.
  • Benefits: This targeted decision-making enhances overall system accuracy by ensuring the correct tool is used at each step. It also allows customized input prompts and multi-step pathways for each traversable branch of the user journey.
  • Example: An agent might choose between a translation API or a summarization tool depending on the specifics of the user's request and the context in which it appears.
  • Models: BERT-type classifiers, XGBoost, simple regular expressions
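At the simple end of the spectrum the source list mentions, routing can literally be regular expressions. A hedged sketch (the route names and patterns are illustrative; a BERT-style classifier would replace the lookup in production):

```python
import re

# Ordered (pattern, tool) pairs: first match wins.
INTENT_ROUTES = [
    (re.compile(r"\btranslate\b|\binto (french|spanish|german)\b", re.I),
     "translation_api"),
    (re.compile(r"\bsummar(y|ize|ise)\b|\btl;?dr\b", re.I),
     "summarization_tool"),
]

def route(query: str, default: str = "general_llm") -> str:
    """Pick a tool for the query, falling back to a general LLM path."""
    for pattern, tool in INTENT_ROUTES:
        if pattern.search(query):
            return tool
    return default
```

The same interface (query in, route name out) holds whether the implementation is a regex table, XGBoost over engineered features, or a fine-tuned small language model.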

Embedding Model for Retrieval: Multimodal data and documents are represented and stored as vectors, allowing for efficient and meaningful retrieval.

  • Role: When queries or requests are received, the system calculates similarity scores between the query and stored vectors to identify the most relevant data.
  • Benefits: This approach augments response generation by providing contextually pertinent information, even from vast datasets. This has been marketed as “adding knowledge” or “adding long-term memory” to LLM applications.
  • Example: In response to a user's legal query, the embedding model can retrieve case text and statutes relevant to the question, helping to tailor a more accurate response grounded in fact.
  • Models: BERT-type embedding models, Universal Sentence Encoder (USE), CLIP (for multimodal data), Dense Passage Retrieval (DPR)
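The core retrieval operation is a similarity search over vectors. A minimal sketch using cosine similarity over toy 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions, and a vector database would replace the linear scan):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy document store: id -> pretend embedding.
doc_store = {
    "case_law_123": [0.9, 0.1, 0.0],
    "statute_45":   [0.7, 0.2, 0.1],
    "recipe_blog":  [0.0, 0.1, 0.9],
}

def retrieve(query_vec: list[float], store: dict, k: int = 2) -> list[str]:
    """Return the k document ids most similar to the query vector."""
    ranked = sorted(store.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# A query embedding close to the legal documents, far from the recipe.
top = retrieve([0.8, 0.2, 0.05], doc_store)
```

The "long-term memory" framing in the bullet above is exactly this: embed documents once, then pull back the nearest ones at query time.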

Re-ranker: A specialized model that works in tandem with retrieval systems to refine the list of candidate documents.

  • Role: It evaluates and reorders the retrieved items to highlight the most relevant pieces of information.
  • Benefits: By filtering out less pertinent data, the re-ranker ensures that only the most useful context is provided for subsequent processing or response generation.
  • Example: After an initial retrieval of several related documents, the re-ranker might narrow the list down to the top three that most directly address the query. Alternatively, you might keep the top-N relevant documents until a context-window limit is hit.
  • Models: Cross-encoder models
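The re-rank-then-budget pattern from the example above can be sketched as follows. The scoring function here is a crude term-overlap stand-in (a real cross-encoder jointly encodes the query and document and outputs a learned relevance score); the context budget is measured in characters for simplicity, where a real system would count tokens:

```python
def score(query: str, doc: str) -> float:
    """Stand-in relevance score; a real cross-encoder replaces this."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def rerank(query: str, docs: list[str], max_chars: int = 60) -> list[str]:
    """Reorder by relevance, then keep top docs until the budget is hit."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    kept, used = [], 0
    for doc in ranked:
        if used + len(doc) > max_chars:
            break
        kept.append(doc)
        used += len(doc)
    return kept

docs = [
    "our refund policy covers delayed flights fully",
    "the airline serves meals on long flights",
    "company picnic scheduled for friday",
]
kept = rerank("refund policy for delayed flights", docs)
```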

Large Language Model for Response Generation: This is what we most commonly associate with the generative AI boom. A large language model generates human-readable text, code, or API calls based on the processed input.

  • Role: It converts intermediate outputs into a final answer or iteratively refines its response during execution.
  • Benefits: The model’s robust generative capabilities ensure that the final output is coherent, contextually accurate, and user-friendly.
  • Example: In a chatbot application, the language model synthesizes a comprehensive answer by combining retrieved data with additional contextual information.
  • Models: Large and small LLMs, from Claude Sonnet, o3, and DeepSeek R1 down to fine-tuned 7B-parameter Llama models
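The "combining retrieved data with additional contextual information" step from the chatbot example is, concretely, prompt assembly. A hedged sketch (the template wording and the sample documents are invented; the resulting string is what you would send to your generator model's inference endpoint):

```python
PROMPT_TEMPLATE = """\
Answer the user's question using only the context below.
If the context is insufficient, say so.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Stitch retrieved documents and the user question into one prompt."""
    context = "\n---\n".join(retrieved_docs)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days of purchase.",
     "Store credit is issued after 30 days."],
)
```

Instructing the model to answer only from the provided context is the standard guardrail that ties generation back to the retrieval components above.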

Reviewers: Models specifically tuned to assess and verify the quality of generated content.

  • Role: They detect issues such as hallucinations, toxic language, inconsistencies with context, safety concerns, and deviations from brand standards.
  • Benefits: These reviewers serve as a quality control layer, ensuring that the final output is reliable, safe, and aligns with organizational values.
  • Example: Before delivering a response to the user, a reviewer model might flag or correct content that appears misleading or inappropriate, thereby maintaining high-quality interactions.
  • Models: BERT-type models fine-tuned for content moderation and factuality checking
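A reviewer typically sits between generation and delivery, returning a verdict plus a list of issues. A minimal sketch in which keyword checks and a numbers-must-appear-in-context heuristic stand in for the fine-tuned moderation and factuality models named above (the blocklist phrases are invented examples):

```python
import re

# Illustrative policy blocklist; a tuned moderation model replaces this.
BLOCKLIST = {"guaranteed returns", "miracle cure"}

def review(draft: str, context: str) -> dict:
    """Flag policy violations and figures unsupported by retrieved context."""
    issues = []
    lowered = draft.lower()
    for phrase in BLOCKLIST:
        if phrase in lowered:
            issues.append(f"policy: '{phrase}'")
    # Crude hallucination heuristic: numbers absent from the context.
    for num in re.findall(r"\d+", draft):
        if num not in context:
            issues.append(f"unsupported figure: {num}")
    return {"approved": not issues, "issues": issues}

bad = review("This miracle cure yields 80% gains", "Trial showed modest effects.")
ok = review("The trial showed modest effects.", "Trial showed modest effects.")
```

The system can then regenerate, edit, or escalate to a human depending on which issues the reviewer reports.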
