
Will LLMs Replace ML Training? (No, here's why)

Despite the hype around LLMs, we contend that even very large and sophisticated LLMs cannot replace ML training on your own first-party data.

Paul Yang

ML @ šŸƒā€ā™€ļøRunhousešŸ 

October 28, 2024

We're giving LLMs all of our mindshare

When considering mindshare versus value creation, an outsized amount of energy is being devoted to generative AI and large language models (LLMs). As the hype cycle deepens, many organizations are focusing on LLM pilots, while optimistic VCs whisper that LLMs could eventually replace ML training altogether. But chasing LLM applications first carries a cost: it delays investment in the teams and infrastructure needed to launch novel ML projects and train models on proprietary data.

But why invest the energy to train your own models if you can use LLMs for everything?

  • Fraud detection will be asking GPT-7 ā€œhereā€™s some information about a transaction, is it fraudulent?ā€
  • Self-driving cars will be asking Llama7-multimodal ā€œanswer quickly, is this a human child?ā€
  • Recommender systems will be asking Claude Rock nā€™ Roll ā€œhereā€™s several products, which one will my customer pick?ā€

The temptation to invest in LLM solutions is obvious: A pre-trained model for everything is easy, whereas traditional enterprise ML is hard. Model training requires a well-prepared data and compute infrastructure, along with a skilled engineering team to develop business-driving models. Even then, it might take five failed experiments before finding one extremely valuable model.

We donā€™t intend to write a contrarian take or argue that LLMs create no value. However, we do want to address head-on where we expect value to be created, in both the near term and the long term:

  1. In the next two years, deep learning and even tree-based models trained on tabular data will drive value creation for enterprise ML and create your competitive edge, whereas LLM solutions will converge on commodity productivity tools that you must adopt as table stakes.
  2. In the long term, LLMs will never become sophisticated enough to replace models created through recurring, custom training on first-party data.

This article will argue why you must mobilize your data, launch experiments, and build a robust engineering platform for ML projects.

Productivity vs. Product

First, it is critical to distinguish between the remarkable ability of LLMs to enhance productivity and their capacity to solve novel use cases and drive product improvements. Enhancing productivity means accelerating and amplifying human output, something tools like ChatGPT have been enabling for nearly two years. By contrast, improving a product refers to activities that directly impact the bottom line, such as increasing revenue (e.g., boosting conversion rates) or reducing costs (e.g., minimizing fraud). The two are linked, since increased productivity frees up capacity for more product experiments, but they are not the same thing.

Most of the value creation from LLMs today is concentrated in productivity solutions (and perhaps for NVIDIA shareholders). LLM use cases are primarily focused on tasks like information synthesis, drafting emails or text, and automatic code generation. While teams should certainly adopt LLM-powered tools to enhance productivity, there is no strategic advantage in doing so beyond keeping pace with industry standards. Purchasing GitHub Copilot for your developers is akin to using Microsoft Officeā€”it's a baseline requirement to improve productivity, but it won't provide a competitive edge on its own.

McKinsey reports that the incremental value driven by generative AI (an estimated 15-40% incremental gain across use cases) is primarily in productivity

On the other hand, custom ML training has a proven track record of creating competitive advantages. Take recommender systems, for example, which have generated billions of dollars in measurable value for tech companies over the past decade. For a retailer, implementing a recommender system can easily lead to a multi-percentage point increase in conversion rates, translating to tens of millions in additional annual revenue. These systems are well-researched, widely deployed, and have a clear return on investment. Itā€™s not as obvious what kind of LLM project could deliver such a direct and significant improvement to the bottom line.

Nearly every interaction with Uber, from rides to Uber Eats, is powered by ML - Uber Engineering blog

This is a somewhat reductive framework, but the key distinction between pre-trained LLMs and enterprise ML is the mobilization of your own data. Productivity solutions solve generic problems shared across many organizations, while enterprise ML solves problems specific to your organization.

Pre-trained models are like pre-compiled programs

We can compare LLMs to pre-compiled software programs: an LLM is a model built on someone elseā€™s data, just as program binaries are compiled from someone elseā€™s code. We often use pre-compiled programs as-isā€”no one rewrites their own database software from scratch when many people need the same basic queries to access and manipulate data. However, no one would seriously claim that all enterprise software development will eventually consist solely of writing SQL commands to Postgres. You still need to write your own code to achieve custom functionality and compile it for your specific requirements.

There is a broader strategic risk in allowing LLMs to overshadow first-party model development. The companies that lead in almost every domain are those that have leveraged their own data to gain an edge over competitors. This is a relatively uncontroversial assertion, supported by examples ranging from Facebookā€™s and Googleā€™s dominance in advertising to Amazonā€™s advanced logistics network, Netflixā€™s content relevance, and Capital Oneā€™s rise through superior risk pricing for subprime customers. Conversely, you cannot build a successful bank using only external, off-the-shelf risk scores (like a FICO score pulled from Experian), and you pay a premium to Google for ads precisely because Google owns the data.

Itā€™s intuitive that generic solutions will underperform tailored ones. A model trained on global fraud data would be less effective for a rural Minnesota credit union than one trained on the specific characteristics of a local customer base. There is no practical way to use an LLM trained on the entirety of human knowledge to accurately predict fraud at a local gas station, or for any other specialized ML application.
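To make that concrete, here is a minimal sketch of what training on first-party data can look like for a fraud problem, using scikit-learnā€™s gradient-boosted trees. The file path, feature names, and label column are illustrative assumptions, not a prescription for any particular dataset.

```python
# Minimal sketch: a fraud model trained on your own labeled transactions,
# rather than prompting a general-purpose LLM per transaction. The path,
# features, and "is_fraud" label are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

df = pd.read_parquet("transactions.parquet")  # your first-party data
features = ["amount", "hour_of_day", "days_since_last_txn", "txn_count_7d"]
X, y = df[features], df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Gradient-boosted trees remain a strong default for tabular problems like this.
model = HistGradientBoostingClassifier()
model.fit(X_train, y_train)

# Fraud is rare, so evaluate with precision-recall rather than raw accuracy.
scores = model.predict_proba(X_test)[:, 1]
print("Average precision:", average_precision_score(y_test, scores))
```

The value comes from the data: a model like this learns the local patterns of your own customer base, which is exactly what a generic, externally trained scorer has never seen.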

To achieve meaningful results, you need to target specific outcomes tailored to your business (your customer conversions, your fraud events) using datasets that only you possess. Many enterprises have already spent the past five years building the data lakes required to power new, custom models; now the task is to mobilize that data to improve your product. Notice, too, that companies like Facebook, Microsoft, and Google are not shrinking their ML engineering teams now that they have invested in AI research labs. They train (or fund) foundational LLMs as a largely independent workstream from the enterprise ML work that powers their products.

You need experimentation at scale

If we accept that pre-trained models are insufficient, then how do we achieve custom development? As previously mentioned, enterprise ML development is challenging, which is why many vendors in the ML space have long tried to offer easy solutions. AutoML began with the promise that "anyone can train ML models with the click of a button," but it has since scaled back to more modest goals like automated hyperparameter tuning. There is no shortcut that allows you to ignore your infrastructure needs.

Instead, you should view ML platforms through the lens of enablement ā€“ providing the capability to launch new ML projects as seamlessly as your product or marketing teams can experiment with features or campaigns. Product managers regularly propose features that fail, and marketers test numerous ad and messaging variations. ML development follows a similar pattern of many failed experiments but with significant returns on the successful ones. This is precisely why committing to a six-month pilot for a large LLM project is so risky; you are compounding the costs of failure when the success rate is inherently low.

We have thought carefully about defining requirements for an ML platform (which we describe in detail here). In short:

  • There should be broad developer access to the right hardware and data ā€“ researchers shouldn't be limited to working on data samples with constrained compute, as this often fails to replicate production outcomes over full datasets.
  • Development and deployment should follow a software development lifecycle (at software development speeds). The code should be iterative, debuggable, and easily testable in any environment that supports Python, as in the sketch below.
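As one minimal sketch of that second requirement, independent of any particular dispatch or orchestration tool, the training logic can live in a plain, importable Python function: the same code is unit-testable against a small local sample and can then be pointed at the full dataset on remote hardware without rewriting. The paths, columns, and hyperparameters here are hypothetical.

```python
# Sketch: training logic as a plain Python function so it stays iterative,
# debuggable, and testable in any Python environment. The paths and the
# "label" column are illustrative placeholders.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier


def train(data_path: str, learning_rate: float = 0.1) -> HistGradientBoostingClassifier:
    df = pd.read_parquet(data_path)
    X, y = df.drop(columns=["label"]), df["label"]
    model = HistGradientBoostingClassifier(learning_rate=learning_rate)
    model.fit(X, y)
    return model


if __name__ == "__main__":
    # Locally: iterate and debug against a small sample in seconds.
    train("data/sample.parquet")
    # On the platform: dispatch the same function, unchanged, to larger
    # hardware over the full dataset (e.g. a path in object storage).
```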

By the way, LLMs are just ML, and experimenting with LLMs is just one portion of your overall ML experimentation roadmap. Embeddings and the transformer architecture have had a disruptive impact on traditional ML tasks, radically improving the quality of classification models, named entity recognition, and more. Many of the best production deployments of custom LLMs rely on the same ML infrastructure that deep learning has always relied on.
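As one concrete illustration of that overlap, here is a minimal sketch that uses a pre-trained transformer purely as a feature extractor for a traditional classifier. It assumes the sentence-transformers and scikit-learn packages; the ticket texts and labels are toy placeholders for your own labeled data.

```python
# Sketch: a pre-trained embedding model feeding a classic classifier trained
# on your own labels. Texts and labels are toy placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

texts = [
    "card declined at checkout",
    "payment failed twice today",
    "how do I reset my password",
    "can't log in to my account",
]
labels = ["payments", "payments", "account", "account"]

# An off-the-shelf transformer turns text into dense features...
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(texts)

# ...and a traditional linear model trained on *your* labels does the rest.
clf = LogisticRegression().fit(X, labels)
print(clf.predict(encoder.encode(["my payment keeps getting rejected"])))
```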

Now is the right time to invest in a platform

We want to carve a nuanced path between the two extremes of dismissing LLMs outright and buying fully into the hype cycle. While transformers and LLMs represent a significant leap in machine learning, their impact should be seen as comparable to past innovations in the field. Overemphasizing this single sub-field at the expense of broader ML capabilities is limiting. Without investing in scalable ML development infrastructure, teams will struggle to harness the full potential of their data.

The path forward lies in building a robust ML platform that empowers experimentation and integrates both custom ML approaches and LLMs. The real winners will be those who effectively blend pre-trained models with bespoke solutions, creating unique competitive advantages.

Remember: your data is your moat, and rapid experimentation is your engine. Donā€™t let the LLM hype overshadow these essential components. There are many ways to build a powerful ML platform, and Runhouse was designed to address this very challenge; weā€™d love to chat, so reach out to us at hello@run.house.
