
Consider that the Chinese might be misrepresenting their costs. A newsletter was implying that they might do so to undermine the justification for the sanctions.

Agree that the AI bubble should pop though, and the sooner, the better.




Their model is open and they published a paper describing it: https://arxiv.org/pdf/2412.19437. They can't be far off, or it would be noticed.

Even if they are heavily government-subsidized for energy and hardware, I don't see how the cost of training in the US would be more than double.


They express their cost in terms of GPU hours, then convert that to USD at market GPU rental rates, so it isn't affected by subsidies. It's possible they lied about the GPU hours, but if that were the case an expert should be able to show it by working out how many FLOPs the training requires from the number of tokens they say they used, and comparing that against the FLOPs the GPUs they say they used could have delivered.


Total training FLOPs can be deduced from the model architecture (which they can't hide since they released the weights) and the number of tokens they trained on. With total training FLOPs and GPU hours you can calculate MFU, and the MFU of their DeepSeek-V3 run comes out around 40%, which sounds right; both Google and Meta have reported higher MFU. So the GPU hours should be correct. The only thing they could have lied about is how many tokens they trained on. DeepSeek reported 14T, which is similar to what Meta did, so nothing crazy there.

tl;dr: the numbers check out, and the savings come from the model architecture innovations they made.
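Rough back-of-envelope version of that check, using the headline figures from the V3 report (~37B activated parameters, ~14.8T tokens, ~2.79M H800 GPU hours) and assuming ~990 TFLOPS BF16 dense peak per H800; the 6*N*D rule undercounts attention FLOPs somewhat, so the true utilization lands between this estimate and the ~40% above:

    # Sanity check of DeepSeek-V3's reported training cost (Python).
    # Reported figures from the V3 paper; peak_flops is an assumption.
    active_params = 37e9     # activated parameters per token (MoE)
    tokens        = 14.8e12  # reported pre-training tokens
    gpu_hours     = 2.788e6  # reported total H800 GPU hours
    peak_flops    = 989e12   # assumed H800 BF16 dense peak, FLOP/s

    required  = 6 * active_params * tokens     # 6*N*D approximation, ~3.3e24 FLOPs
    available = gpu_hours * 3600 * peak_flops  # ~9.9e24 FLOPs
    print(f"implied MFU: {required / available:.0%}")  # ~33%

If the reported GPU hours or token count were wildly off, this ratio would come out either physically impossible (over 100%) or implausibly low, and it doesn't.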


Doesn't that say they based it on Llama? So it's not really bottom-up training, since the cost of Llama is surely not part of their quoted figure.


I did a quick search for "llama" and didn't find any place where they outright state that they just fine-tuned some Llama weights.

Is it possible that they based their model architecture on the Llama architecture, rather than fine-tuning already-trained Llama weights? In that case, they'd still have to do "bottom-up" training.


People on the internet can lie. Especially when such a lie could cause the Nasdaq to dip multiple percentage points.

Not saying they are lying, but there are incentives.


It's much easier to identify the incentives of the people who just lost a lot of money, the ones who were betting that it was their money that was going to make artificial intelligence intelligent.

Everyone’s already begun trying this recipe in-house. Either it works with much less compute, or it doesn’t.

For instance, HKUST just did an experiment where small, weak base models trained with DeepSeek's method beat stronger small base models trained with much more costly RL methods. That alone seems like enough to upend the low-end model niche, things like Haiku and 4o-mini.

Be really skeptical about why people who stand to make tons of money from the realization that it was all a mirage, and that the real thing can now be had far cheaper, would instead spend so much effort shouting about it and undercutting their own profitability.


Hugging Face is reproducing it live on their blog…


Let's wait for reproduction first




