This. IIUC to serve an LLM is to perform an O(n^2) computation on the model weights for every single character of user input. These models are 40+GB so that means I need to provision about 40GB RAM per concurrent user and perform hundreds of TB worth of computations per query.
How much would I have to charge for this? Are there any products where the users would actually get enough value out of it to pay what it costs?
Compare to the cost of a user session in a normal database-backed web app. Even if that session fans out thousands of backend RPCs across a hundred services, each of those calls executes in milliseconds and requires only a fraction of the LLM's RAM. So I can support thousands of concurrent users per node instead of one.
> IIUC to serve an LLM is to perform an O(n^2) computation on the model weights for every single character of user input.
The computations are not O(n^2) in terms of model weights (parameters), but linear. If it were quadratic, the number would be ludicrously large. Like, "it'll take thousands of years to process a single token" large.
(The classic transformers are quadratic on the context length, but that's a much smaller number. And it seems pretty obvious from the increases in context lengths that this is no longer the case in frontier models.)
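A back-of-the-envelope sketch of what "linear in parameters" means, with made-up but plausible dimensions for a model in the ~40 GB class (the ~2-FLOPs-per-parameter figure is the usual rule of thumb for a dense transformer forward pass):

    # Rough per-token compute for a dense transformer -- illustrative numbers only.
    n_params = 20e9    # ~20B parameters ~= 40 GB of weights at fp16
    n_layers = 48      # made-up but plausible depth for a model this size
    d_model  = 6144    # made-up but plausible hidden size
    ctx_len  = 8192    # tokens of context in play

    # Rule of thumb: ~2 FLOPs per parameter per generated token for the matmuls,
    # i.e. linear in the parameter count.
    matmul_flops = 2 * n_params                    # ~4e10 FLOPs/token

    # The quadratic term is attention over the context, not over the weights:
    # roughly 4 * n_layers * ctx_len * d_model FLOPs per generated token.
    attn_flops = 4 * n_layers * ctx_len * d_model  # ~1e10 FLOPs/token

    print(f"matmul: {matmul_flops:.1e} FLOPs/token, attention: {attn_flops:.1e} FLOPs/token")

Both terms come out in the tens of GFLOPs per token. A cost quadratic in the parameter count would instead scale like n_params^2, around ten orders of magnitude more.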
> These models are 40+GB so that means I need to provision about 40GB RAM per concurrent user
The parameters are static, not mutated during the query. That memory can be shared between the concurrent users. The non-shared per-query memory usage is vastly smaller.
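A rough sketch of that split, again with made-up but plausible dimensions (the actual per-user footprint depends on layer count, KV heads, context length, and cache precision):

    # Shared weight memory vs. per-user KV-cache memory -- illustrative numbers.
    weight_bytes = 40e9   # the ~40 GB of parameters, loaded once and shared

    n_layers     = 40     # made-up but plausible values for a model this size
    n_kv_heads   = 8
    head_dim     = 128
    ctx_len      = 8192   # tokens of context actually in use
    bytes_per_el = 2      # fp16/bf16 cache entries

    # K and V caches: 2 tensors per layer, each of shape [ctx_len, n_kv_heads * head_dim]
    kv_bytes_per_user = 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_el

    print(f"shared weights: {weight_bytes / 1e9:.0f} GB")
    print(f"per-user KV cache at full context: {kv_bytes_per_user / 1e9:.2f} GB")

That works out to roughly 1.3 GB per user at an 8K context, and proportionally less for shorter contexts. The 40 GB of weights is paid once per GPU, and serving stacks batch many users' tokens through that single copy rather than duplicating it per session.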
> How much would I have to charge for this?
Empirically, as little as 0.00001 cents per token.
For context, the Bing search API costs 2.5 cents per query.
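Taking both quoted figures at face value, a quick comparison for an illustrative 1,000-token response:

    # Compare the quoted per-token LLM price with the quoted Bing API price.
    llm_cents_per_token = 0.00001   # figure quoted above
    tokens_per_response = 1_000     # illustrative response length

    llm_cents_per_query  = llm_cents_per_token * tokens_per_response  # 0.01 cents
    bing_cents_per_query = 2.5                                        # figure quoted above

    print(f"LLM: {llm_cents_per_query} cents/query, Bing: {bing_cents_per_query} cents/query")
    print(f"ratio: {bing_cents_per_query / llm_cents_per_query:.0f}x")

At those prices a 1,000-token completion costs about 0.01 cents, roughly 250x less than a single Bing search call.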
Interesting. There's obviously been a precipitous drop in the sticker price, but has there really been a concomitant efficiency increase? It's hard to believe that the sticker price these companies are charging has anything to do with reality, given how massively they're subsidized (free Azure compute, billions upon billions in cash, etc.). Is this efficiency trend real? Do you know of any data demonstrating it?
I have personal anecdotal evidence that they're getting more efficient: I've had the same 64GB M2 laptop for three years now. Back in March 2023 it could just about run LLaMA 1, a rubbish model. Today I'm running Mistral Small 3 on the same hardware and it's giving me a March-2023-GPT-4-era experience and using just 12GB of RAM.
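That 12 GB figure lines up with straightforward quantization arithmetic: Mistral Small 3 is a ~24B-parameter model, and the sketch below (which ignores the KV cache and runtime overhead) shows what different weight precisions imply:

    # Why a ~24B-parameter model can fit in ~12 GB of RAM: weights vs. bits per weight.
    n_params = 24e9   # Mistral Small 3 parameter count, approximately

    for bits in (16, 8, 4):
        gb = n_params * bits / 8 / 1e9
        print(f"{bits:>2}-bit weights: ~{gb:.0f} GB")

    # 16-bit: ~48 GB, 8-bit: ~24 GB, 4-bit: ~12 GB -- a ~4-bit quantization
    # is what makes the 12 GB figure plausible on a 64 GB laptop.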
People who I trust in this space have consistently and credibly talked about these constant efficiency gains. I don't think this is a case of selling compute for less than it costs to run.
People are comparing the current rush to the investments made in the early days of the internet while (purposely?) forgetting how expensive access to it was back then. I'm not saying that AI companies should be turning a profit today, but I don't see or hear that AI usage is becoming essential in any way or form.
Yeah, that's the big problem. The Internet (e-commerce, specifically) is an obviously good idea. Technologies which facilitate it are profitable because they participate in an ecosystem which is self-sustaining. Brick-and-mortar businesses have something to gain by investing in an online presence. As far as I can tell, there's nothing similar with AI. The speech-to-text technology in my phone that I'm using right now to write this post is cool, but it's not a killer app in the same way that an online shopping cart is.