The compute scheduling part of the paper is also very good, particularly the way they balanced the load to keep compute and communication in check.
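To make that concrete, here's the general shape of compute/communication overlap as a toy PyTorch sketch (my own illustration, not the paper's actual scheduler): start an asynchronous collective, do independent work while it's in flight, and only wait on the handle when the result is needed.

```python
# Toy sketch of overlapping communication with compute (not DeepSeek's scheduler).
import torch
import torch.distributed as dist

# Single-process group just so the example runs; real setups use many ranks.
dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)

grads = torch.randn(1024)
handle = dist.all_reduce(grads, async_op=True)   # communication starts here

# Independent local compute proceeds while the all-reduce is in flight.
local_work = torch.randn(1024, 1024) @ torch.randn(1024, 1024)

handle.wait()   # block only at the point the reduced gradients are actually needed
dist.destroy_process_group()
```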
There is also a lot of thought put into all the tiny optimizations that reduce memory usage, and into using FP8 effectively without significant loss of precision or dynamic range.
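For anyone curious what "FP8 without losing dynamic range" can look like, here's a minimal block-scaling sketch (my own illustration under the assumption of per-block scales, not DeepSeek's kernels; `quantize_fp8_blockwise` is a made-up helper): each block of values gets its own scale so its maximum maps near the FP8 E4M3 limit, which keeps blocks of very different magnitudes representable.

```python
# Minimal block-wise FP8 scaling sketch. Requires PyTorch >= 2.1 for float8 dtypes.
import torch

FP8_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quantize_fp8_blockwise(x: torch.Tensor, block: int = 128):
    """Quantize a 1-D tensor to FP8 with one scale per block of `block` values."""
    x = x.reshape(-1, block)                        # (num_blocks, block)
    scale = x.abs().amax(dim=1, keepdim=True) / FP8_MAX
    scale = scale.clamp(min=1e-12)                  # avoid divide-by-zero on all-zero blocks
    q = (x / scale).to(torch.float8_e4m3fn)         # each block now fits FP8's range
    return q, scale

def dequantize_fp8_blockwise(q: torch.Tensor, scale: torch.Tensor):
    return (q.to(torch.float32) * scale).reshape(-1)

x = torch.randn(1024) * torch.logspace(-3, 3, 1024)  # values spanning a wide dynamic range
q, s = quantize_fp8_blockwise(x)
max_err = (dequantize_fp8_blockwise(q, s) - x).abs().max()
```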
None of the techniques by themselves are really mind-blowing, but the whole of it is very well done.
When everyone kind of ignores performance because compute is cheap and speed will double anyway in 18 months (note: that hasn't been true for 15 years), the willingness to optimize is almost a secret weapon. The first 50% or so is usually not even difficult, because there is so much low-hanging fruit, and in most environments there's a lot of helpful tooling to measure exactly which parts are slow.
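As a toy example of that tooling, even stock Python ships a profiler that points straight at the hot spots (the function names below are made up):

```python
# "Measure first": cProfile already tells you which functions eat the time.
import cProfile
import pstats

def slow_preprocessing(data):
    return [x ** 2 for x in data for _ in range(100)]   # deliberately wasteful

def fast_model_step(features):
    return sum(features)

def pipeline():
    data = list(range(10_000))
    features = slow_preprocessing(data)
    return fast_model_step(features)

cProfile.run("pipeline()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)
```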
Compute has been more than doubling because people have been spending silly money on it. How long ago would a proposal for a $10M ML cluster have been considered surreal by any funding agency? Certainly less than 10 years ago. Now people are talking about spending billions and billions.
When people are talking about $100M-$1B frontier model training runs, then obviously efficiency matters!
Sure, training costs will go down over time, but if you are only using 10% of the compute of your competition (TFA: DeepSeek vs LLaMa), then you could be saving hundreds of millions per training run!
I was more stating the perception that compute is cheap than the fact that compute is cheap - often enough it isn't! But carelessness about performance happens, well, by default really.
At my org this is a crazy problem. Before I arrived, people would throw all kinds of compute at problems. They still do. When you've got AWS over there ready to gobble up whatever tasks you've got, and the org is willing to pay, things get really sloppy.
It's also a science-based organization like OpenAI. Very intelligent people, but they aren't programmers first.
I think the AI megacorps' plan was always SaaS. Their focus was never on self-hosting, so optimization was useless: their customers would pay for unoptimized services whether they wanted to or not.
Making AI practical for self-hosting was the real disruption of DeepSeek.
The secret is basically to use RL to train a model that will generate synthetic data, and then use that synthetic dataset to fine-tune a pretrained model. The secret is synthetic data imo: https://medium.com/thoughts-on-machine-learning/the-laymans-...
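Roughly, the two-stage recipe being described looks something like this (all names here are hypothetical placeholders, not actual DeepSeek code): an RL-tuned "teacher" generates synthetic traces, and a pretrained "student" is fine-tuned on them.

```python
# Hypothetical sketch of the RL-teacher -> synthetic data -> SFT pipeline.
from typing import Callable, List

def generate_synthetic_dataset(teacher: Callable[[str], str],
                               prompts: List[str]) -> List[dict]:
    """Stage 1: sample completions from the RL-trained teacher model."""
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]

def fine_tune(student_weights: dict, dataset: List[dict]) -> dict:
    """Stage 2: ordinary supervised fine-tuning on the synthetic pairs.
    (Stubbed out here; in practice this is a standard SFT training loop.)"""
    for example in dataset:
        pass  # compute loss on example["prompt"] -> example["completion"], update weights
    return student_weights

# Toy stand-ins so the sketch runs end to end.
teacher = lambda prompt: f"<reasoning about: {prompt}>"
prompts = ["What is 7 * 8?", "Sort [3, 1, 2]"]
synthetic = generate_synthetic_dataset(teacher, prompts)
student = fine_tune({"layer.0.weight": None}, synthetic)
```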
The DeepSeekV3 paper is really a good read: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSee...