This issue is raised and addressed ad nauseam on HN, but here goes: It doesn't m...

int_19h · 2025-01-20T23:33:02 1737415982

It's not distillation from o1 for the reasons that you have cited, but it's also no secret that ChatGPT (and Claude) are used to generate a lot of synthetic data to train other models, so it's reasonable to take this as evidence for the same wrt DeepSeek.

Of course it's also silly to assume that just because they did it that way, they don't have the know-how to do it from scratch if need be. But why would you do it from scratch when there is a readily available shortcut? Their goal is to get the best bang for the buck right now, not appease nerds on HN.

orbital-decay · 2025-01-21T11:58:51 1737460731

> but it's also no secret that ChatGPT (and Claude) are used to generate a lot of synthetic data to train other models

Is it true? The main part of training any modern model is finetuning, and by sending prompts to your competitors en masse to generate your dataset you're essentially giving up your know-how. Anthropic themselves do it on early snapshots of their own models, I don't see a problem believing DeepSeek when they claim to have trained v3 on early R1's outputs.

luma · 2025-01-20T23:11:54 1737414714

So how is it then that none of the other models behave in this way? Why is it just Deepseek?

orbital-decay · 2025-01-21T12:02:41 1737460961

Because they're being trained to answer this particular question. In other contexts it wasn't prepared for, Sonnet v2 readily refers to "OpenAI policy" or "Reddit Anti-Evil Operations Team". That's just dataset contamination.