Hacker News

Although Llama 4 is too big for mere mortals to run without many caveats, the economics of calling a dedicated-hosted Llama 4 are more interesting than expected.

$0.11 per 1M tokens, a 10-million-token context window (not yet implemented on Groq), and faster inference due to fewer activated parameters allow for specific applications that were not cost-feasible with GPT-4o/Claude 3.7 Sonnet. That all depends on whether the quality of Llama 4 is as advertised, of course, particularly around that 10M context window.
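To make the economics concrete, here is a back-of-envelope sketch. The $0.11/1M figure comes from the comment above; the $5.00/1M "frontier model" rate and the workload size are illustrative assumptions, not quoted prices.

```python
def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD of processing `tokens` tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million

# Hypothetical workload: summarizing 1,000 documents of ~50k tokens each.
total_tokens = 1_000 * 50_000  # 50M tokens

llama4_cost = cost_usd(total_tokens, 0.11)    # $0.11/1M (quoted above)
frontier_cost = cost_usd(total_tokens, 5.00)  # assumed frontier-model rate

print(f"Llama 4: ${llama4_cost:.2f} vs frontier: ${frontier_cost:.2f}")
# → Llama 4: $5.50 vs frontier: $250.00
```

At that gap, batch jobs over large corpora that were previously a four-figure line item drop to pocket change, which is the kind of application the parent comment is pointing at.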


It's possible that we'll see smaller Llama 4-based models in the future, though, similar to Llama 3.2 1B, which was released later than the other Llama 3.x models.

Yeah, I too am looking forward to their small text-only models at 3B and 1B.

> Llama 4 is too big for mere mortals to run without many caveats

The AMD MI300X had day-zero support for running it via vLLM, and it's easy enough to rent them at decent prices.
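For anyone curious what that looks like in practice, a minimal launch sketch follows. The Hugging Face model id and the GPU count are assumptions and may differ on your setup; on MI300X you need the ROCm build of vLLM rather than the default CUDA one.

```shell
# Install vLLM (for MI300X, use the ROCm build per vLLM's install docs).
pip install vllm

# Serve Llama 4 Scout behind an OpenAI-compatible API on port 8000.
# --tensor-parallel-size shards the model across 8 GPUs on one node.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --tensor-parallel-size 8
```

Once it's up, any OpenAI-compatible client can point at http://localhost:8000/v1 and use it like a hosted API.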
