Emergent AI
LLMs as a commodity

Why we're there and why that's great

Ilan Man
August 27, 2025

What a quirk of irony that the most expensive and potentially disruptive human invention is just¹ a commodity!

You guessed it - LLMs, those job-stealing (sort of), workforce-disrupting (soon I’m sure), superintelligent (any day now), technical behemoths² are swappable, replaceable, and their utility is plateauing for common use cases.

In some sense, like water, the electrical grid, or agricultural production, being a commodity, while unglamorous, isn’t a bad thing. On the contrary, it’s the most important thing. But while LLMs are taking the paradox of value to new extremes, I didn’t expect it to happen so fast. We’re not yet 3 years into this journey, and are already seeing examples of this commoditization³.

I’m not going to address last week’s top trending topic in AI, which is the impending AI bubble (or lack of bubble, depending on what time of day you ask someone). I’m trying to avoid chasing headlines.⁴ One thing is for sure: bubble or not, we aren’t going back to the pre-ChatGPT days (our brains won’t allow it!).

GPT-5 wasn’t a thud

I mean, sure, it didn’t live up to expectations. But there’s nothing - nothing - in AI these days that will live up to expectations.

Okay, so OpenAI released weird charts, had a bunch of angry users who didn’t like getting their favorite models taken away, and it marginally beat other models⁵ on the benchmarks. Is that a thud? Maybe - but how much does that matter?

Source: https://firstpagesage.com/reports/top-generative-ai-chatbots/

Despite an awkward rollout, I see the most popular AI product, by a wide margin, becoming even easier to navigate. They got rid of that ridiculous drop-down with a dozen different models, which was impossible to understand and quite frustrating (Am I using the right model for this task? Do I need to start experimenting now? Did I accidentally pick the expensive model?), and replaced it with one interface that will (in principle) figure out which model is best to use, and then pick that one.

And yes, that routing feature won’t work for all use cases since it’s also AI. It’ll need to improve over time. And maybe our prompts need to get better in order to help the AI route better. I’m not as cynical as others who believe OpenAI is routing to cheaper models to save money. I mean, if a cheaper model can get me results 95% as good as an expensive model, then go for it. Anyway, cutting corners at the expense of a great user experience is not a winning strategy (despite the current trend to light money on fire).

A vast majority of ChatGPT users want fast, relevant, and useful answers to their mostly basic questions.⁶ Some people want cooler images. Others like to converse with the chatbot. Exactly 0% are routinely solving problems that these LLMs are benchmarked against.⁷

Most customers aren’t yearning for a smarter LLM - we’re at diminishing marginal returns for a majority of tasks. We want products that are easy to use and have quick feedback loops so we can learn. Products don’t even need to make us feel good for us to be hooked on them! GPT-5 gives (many of) us that frictionless, easy-to-navigate feeling we’re looking for out of a commodity (despite being short of AGI).

A personal example

AI coding assistants are a huge game changer for me. I’ve experimented with GitHub Copilot, Roo Code, Claude Code, and Cursor. Most serious developers I know use Claude Code. Many at enterprise companies are forced to use Copilot because of the Microsoft lock-in. I like Cursor, even though it’s probably not technically the best out there. Why? Simple: because I’m used to it and it’s good enough. It’s a commodity. I’ve used a few of the others, and they’re all the same to me: VS Code extensions or forks, more or less, with a few marginally different features. I don’t want to experiment with all of them to see which fits my use case the best.

I much prefer something that is 90% as good as the top-of-the-line coding assistant if it means I can ignore it and focus on my work. If Cursor starts to increase its pricing or stops adding new features to keep parity with the rest, then sure, I’ll look at a competitor.

But until then, get out of my way so I can do what I do, and don’t make me work hard to figure your product out. This is true of Cursor (for me) and GPT-5.

This is where we’re going.

And that’s the goal, isn’t it?

There are many idioms to express this idea - “owning the rails” or “picks and shovels”. They’re typically infrastructure-related, but essentially, the profitable business move is to own the means of production.⁸

And these companies are doing just that. It’s just not very sexy. I mean, spending and making billions is, but releasing a product feature that isn’t splashy but makes your life a little better is…kinda boring. In a headline sense. But as I’ve said before, I’m okay with a little boring. We don’t need world-changing upgrades every 3 months. Sometimes tweaking at the margins (and in the meantime getting stickier) is great for consumers, and ultimately the business.

So let’s focus less on the headlines and benchmarks and provocative (read: clickbait-y) statements, and get back to basic product design, UI/UX, and delivering what customers actually want: easy-to-use products that get out of the way and enable folks to do what they do.

¹ I’ve been trying hard to remove the word “just” from my vernacular, but it feels appropriate here.

² To be sure, they are computational and energy behemoths, but technically? If you believe in the Bitter Lesson’s thesis, the magic is not some elegant formula, or a mapping of complex systems. No, the magic is brute force search.

³ Did you know there’s a difference between commodification and commoditization, and it’s not simply a language / regional preference? TIL.

⁴ That and waterfalls.

⁵ I love the title of this article “The Best AI in August 2025”. Can’t wait for September 2025’s list!

⁶ Sorry, your questions asking for a travel itinerary for your family of 4 are not as challenging for the AI to get mostly right (as right as any human can) as you think they are. Prompt better.

⁷ Yeah, I understand benchmarks are sometimes useful (and provocatively named), but they’re often headline-grabbing and a pissing contest amongst model developers to see who can score higher. This isn’t where the value lies!

⁸ Socialist undertones applied to the most capitalist thing ever notwithstanding…
