
In the context in which you said it, it matters a lot.

>> The idea that they used o1's outputs for their distillation further shows that models like o1 are necessary.

> Hmm, I think the narrative of the rise of LLMs is that once the output of humans has been distilled by the model, the human isn't necessary.

If DeepSeek was produced through distillation (term of art) of o1, then the cost of producing DeepSeek is strictly higher than the cost of producing o1, and the cost of producing o1 can't be avoided.

Continuing this argument: if the premise is true, then DeepSeek can't be significantly improved without first producing a very expensive hypothetical o1-next model from which to distill better knowledge.

That is the argument that is being made. Please avoid shallow dismissals.

Edit: just to be clear, I doubt that DeepSeek was produced via distillation (term of art) of o1, since that would require access to o1's weights. It may have used some of o1's outputs to fine-tune the model, which would still mean that the cost of training DeepSeek is strictly higher than the cost of training o1.
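The distinction being argued — distillation proper versus fine-tuning on outputs — can be sketched with toy numbers. All values below are illustrative; nothing here comes from o1 or DeepSeek:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical logits over a tiny 4-token vocabulary.
teacher_logits = np.array([2.0, 1.0, 0.5, 0.1])
student_logits = np.array([1.5, 1.2, 0.3, 0.2])

p_teacher = softmax(teacher_logits)
p_student = softmax(student_logits)

# Distillation (term of art): KL(teacher || student) over the teacher's
# full soft distribution -- this requires access to the teacher's
# logits, i.e. its weights or at least full logprobs.
distill_loss = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))

# Fine-tuning on outputs: plain cross-entropy on the single token the
# teacher actually emitted -- this requires only the teacher's visible
# output, which any API user can collect.
emitted_token = int(np.argmax(p_teacher))
sft_loss = -np.log(p_student[emitted_token])
```

The first loss needs the teacher's internal distribution; the second needs only its sampled text, which is why the two scenarios have very different access requirements.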

> just to be clear, I doubt that deepseek was produced via distillation

Yeah, your technical point is kind of ridiculous here: in all my uses of distillation (and in the comment I quoted), "distillation" is used in an informal sense, and there's no allegation that DeepSeek was in possession of OpenAI's model weights, which is what your "distillation (term of art)" would require.


I’m not sure why folks don’t speculate that China is able to obtain copies of OpenAI's weights.

Seems reasonable that they would be investing heavily in placing state assets within OpenAI so they can copy the models.


Because it feeds conspiracy theories and because there's no evidence for it? Also, let's talk about DeepSeek in particular, not "China".

Looking back on the article, it is indeed using "distillation" as a special/"term of art", but not using it correctly. I.e., it's not actually speculating that DeepSeek obtained OpenAI's weights and distilled them down, but rather that DeepSeek used OpenAI's answers/outputs as training data (which is a different method/"term of art").



