I didn’t say the model is going to run on one chip, of course. A 70B model needs ~300 chips (for the weights alone, in fp8, as they run it; KV cache not included), and a 670B model would need ~3000 chips. Racks or not, it’s very hard to set up a cluster like that for a single model. There are reasons they still don’t serve a Llama 405B model.
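For anyone who wants to sanity-check that figure, here’s a rough sketch of the arithmetic, assuming ~230 MB of on-chip SRAM per Groq LPU (the commonly cited spec) and fp8 weights at one byte per parameter; the exact numbers will shift with the real hardware details:

```python
import math

# Rough sketch of the chip-count arithmetic. Assumption: ~230 MB of on-chip
# SRAM per Groq LPU (the commonly cited spec), weights in fp8 (1 byte/param),
# no KV cache or activations counted.
SRAM_PER_CHIP_GB = 0.230

def chips_for_weights(params_billion: float, bytes_per_param: float = 1.0) -> int:
    """Chips needed just to hold the model weights in on-chip SRAM."""
    weight_gb = params_billion * bytes_per_param
    return math.ceil(weight_gb / SRAM_PER_CHIP_GB)

print(chips_for_weights(70))   # ~305 chips for a 70B model
print(chips_for_weights(670))  # ~2914 chips for a ~670B model
```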
The "reasons" are most likely because it's not cost-effective as what is effective at this point a tech demo, that first becomes cheap to run if you're actually going to use a decent portion of the capacity for a single model.
How many servers in one rack? Let's say 42. How many chips in one server? Let's say 8. That's 336 cards per rack, enough for the fp8 weights of a 70B model (and maybe the KV cache, if your requests aren't too long, but probably not). You need 10 (!) racks just to hold the weights of one (!) DeepSeek model. And a massive amount of complexity arises from operating that many nodes.
During the short period when Groq hardware was available on the market, it cost about $20K per card. That's $60M (!) for one DeepSeek model. You need an absolutely crazy amount of load to justify those costs, and, most likely, a massive number of additional nodes to handle the KV cache for those requests.
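A quick back-of-envelope along the same lines, using the illustrative numbers above (8 chips per server, 42 servers per rack, $20K per card — all assumptions from this thread, not official figures):

```python
import math

# Back-of-envelope rack count and hardware cost, using the illustrative numbers
# from the comment above: 8 chips/server, 42 servers/rack, $20K/card.
CHIPS_PER_SERVER = 8
SERVERS_PER_RACK = 42
CARD_PRICE_USD = 20_000

chips_needed = 2914  # fp8 weights of a ~670B model, from the earlier estimate

chips_per_rack = CHIPS_PER_SERVER * SERVERS_PER_RACK   # 336
racks = math.ceil(chips_needed / chips_per_rack)       # 9 racks for weights alone
hardware_cost = chips_needed * CARD_PRICE_USD          # ~$58M, i.e. on the order of $60M

print(f"{racks} racks, ${hardware_cost / 1e6:.0f}M in cards")
# Any KV-cache headroom pushes this toward ~10 racks, as noted above.
```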
Yes, you need a crazy amount of load for them to make sense. But when you see providers building out whole data centres at a cost of billions, that's their market.
This is a market where several large Nvidia customers are designing their own chips (e.g. Meta, Amazon, Google) because they're at a scale where it makes sense to try.
Whether it's a market that lets Groq be successful remains to be seen.