In the paper, DeepSeek just says they have ~800k responses that they used for the cold-start data on R1, and they are very vague about how they got them:
> To collect such data, we have explored several approaches: using few-shot prompting with a long CoT as an example, directly prompting models to generate detailed answers with reflection and verification, gathering DeepSeek-R1-Zero outputs in a readable format, and refining the results through post-processing by human annotators.
My surface-level reading of these two sections is that the 800k samples come from R1-Zero (i.e. "the above RL training") and V3:
> We curate reasoning prompts and generate reasoning trajectories by performing rejection sampling from the checkpoint from the above RL training. In the previous stage, we only included data that could be evaluated using rule-based rewards. However, in this stage, we expand the dataset by incorporating additional data, some of which use a generative reward model by feeding the ground-truth and model predictions into DeepSeek-V3 for judgment.

> For non-reasoning data, such as writing, factual QA, self-cognition, and translation, we adopt the DeepSeek-V3 pipeline and reuse portions of the SFT dataset of DeepSeek-V3. For certain non-reasoning tasks, we call DeepSeek-V3 to generate a potential chain-of-thought before answering the question by prompting.
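If you squint, that curation step is just rejection sampling with two kinds of verifier. A minimal sketch of the shape of it, where every name (`sample_model`, `rule_based_reward`, `judge_with_v3`) is hypothetical, since DeepSeek hasn't released this pipeline:

```python
# Hedged sketch of the rejection-sampling curation the paper describes.
# All names are hypothetical; this is the shape, not DeepSeek's actual code.
def curate_reasoning_sft(prompts, sample_model, rule_based_reward,
                         judge_with_v3, k=16):
    """Sample k trajectories per prompt from the RL checkpoint and keep the
    ones that pass either a rule-based check (math/code with verifiable
    answers) or a DeepSeek-V3 judgment against the ground truth."""
    kept = []
    for prompt, ground_truth, verifiable in prompts:
        for _ in range(k):
            trajectory = sample_model(prompt)
            ok = (rule_based_reward(trajectory, ground_truth) if verifiable
                  else judge_with_v3(trajectory, ground_truth))
            if ok:
                kept.append((prompt, trajectory))
    return kept
```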
The non-reasoning portion of the DeepSeek-V3 dataset is described as:
> For non-reasoning data, such as creative writing, role-play, and simple question answering, we utilize DeepSeek-V2.5 to generate responses and enlist human annotators to verify the accuracy and correctness of the data.
I think if we were to take them at their word on all this, it would imply there is no specific OpenAI data in their pipeline (other than perhaps their pretraining corpus containing some incidental ChatGPT outputs that are posted on the web). I guess it's unclear where they got the "reasoning prompts" and corresponding answers, so you could sneak in some OpenAI data there?
That's what I am gathering as well. Where is OpenAI going to get substantial proof to claim that their outputs were used?
You mean the reasoning prompts and answers for SFT from V3? No idea. For that matter, we have no idea where OpenAI got its data from either. If they open this can of worms, their own can of worms will be opened as well.
Just feels like such an odd play lol. If they could organically generate leads/traffic that I'd be willing to get extorted over, then surely they would also have the means to start a marketing agency that I'd be willing to pay far more for?
I vaguely recall this being part of a tit-for-tat between China and anti-China hawks in the West. There have been movements to restrict Chinese access to FOSS, because forking FOSS lowers Chinese dependence on the West, along with (ironic) accusations that the "authoritarian" Chinese are limiting access to Western tech products. I thought there was some sort of legislative or judicial outcome that came out of it, but no luck with a quick google.
-----
U.S. restriction on Chinese use of open-source microchip tech would be hard to enforce - October 13, 2023
> U.S. lawmakers are pressuring the administration of President Joseph Biden to place restrictions on RISC-V to prevent China from benefiting from the technology as it attempts to develop its semiconductor industry.
China’s Use of Foreign Open-Source Software, and How to Counter It - April 2, 2024
> Democratic governments also need to reassess which products should not be made open-source because they’re at risk of being weaponized by malign actors.
Whatever the US did, Europe would do. Anybody in the US or Europe working on a FOSS project with Chinese contributors that they're friendly with? Has anything happened recently?
TianYancha is a corporate data aggregation website, it has nothing to do with FOSS. Your post is such a clumsy attempt to steer the conversation into Anti-Americanism/Westernism. Like really blatant lol.
> Okay, so those are the problems. What’s the solution?
> If you need to perform a case mapping on a string, you can use LCMapStringEx with LCMAP_LOWERCASE or LCMAP_UPPERCASE, possibly with other flags like LCMAP_LINGUISTIC_CASING. If you use the International Components for Unicode (ICU) library, you can use u_strToUpper and u_strToLower.
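For the curious, here's roughly what the LCMapStringEx route looks like when called from Python via ctypes: a Windows-only sketch with minimal error handling (the flag values are from winnls.h; passing None for the locale means LOCALE_NAME_USER_DEFAULT):

```python
# Hedged sketch: calling LCMapStringEx through ctypes (Windows only).
import ctypes

LCMAP_LOWERCASE = 0x00000100          # winnls.h flag values
LCMAP_UPPERCASE = 0x00000200
LCMAP_LINGUISTIC_CASING = 0x01000000

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

def win_case_map(s, flags, locale=None):
    # First call with cchDest=0 asks for the required destination length.
    n = kernel32.LCMapStringEx(locale, flags, s, len(s), None, 0,
                               None, None, 0)
    if n == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    dest = ctypes.create_unicode_buffer(n)
    kernel32.LCMapStringEx(locale, flags, s, len(s), dest, n, None, None, 0)
    return dest[:n]

print(win_case_map("straße", LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING))
```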
Devaluing the new currency by adding lesser metals will also devalue existing currency that is "pure": once you can't trust the value of the currency anymore, the value of the existing pool of money drops too.
It's at a smaller scale, but you can see it with counterfeit currency today. Cash-heavy businesses have to absorb whatever counterfeits they accept, so if, say, 1% of incoming bills turn out to be fakes they have to throw out, they are really valuing your dollar at $0.99.
There are ways to gauge the confidence of an LLM (token probabilities over the response, or generating multiple outputs and checking their consistency), but yeah, that's outside the LLM itself. You could feed the info back to the LLM as a status/message, I suppose.
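Both tricks are easy to wire up from outside. A sketch with the OpenAI Python client (assumptions: any API exposing per-token logprobs would do, and the model name is just a placeholder):

```python
# Hedged sketch: two external ways to gauge an LLM's confidence.
import math
from collections import Counter
from openai import OpenAI

client = OpenAI()

def mean_token_prob(prompt, model="gpt-4o-mini"):
    """Geometric mean of per-token probabilities over one response."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        logprobs=True,
    )
    lps = [t.logprob for t in resp.choices[0].logprobs.content]
    return math.exp(sum(lps) / len(lps))

def self_consistency(prompt, n=5, model="gpt-4o-mini"):
    """Sample n answers; return the fraction agreeing with the majority."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        n=n,
        temperature=1.0,
    )
    answers = [c.message.content.strip() for c in resp.choices]
    _, count = Counter(answers).most_common(1)[0]
    return count / n
```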
The idea of hooking LLMs back up to themselves, i.e. giving them token prob information somehow or even giving them control over the settings they use to prompt themselves is AWESOME and I cannot believe that no one has seriously done this yet.
I've done it in some Jupyter notebooks and the results are really neat. With a tiny bit of extra code, the LLM can be made to generate a context "timer": a delay it waits before prompting itself to respond again, which creates a proper conversational agent system (i.e. not the walkie-talkie systems of today).
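Not their notebook code, but the loop described might look something like this (a hedged sketch; `generate` is any text-generation callable you supply):

```python
# Hedged sketch of a self-prompting loop with a model-chosen wait time.
import re
import time

def converse(generate, max_turns=10):
    """Run a self-prompting conversation. The model is asked to end each
    reply with a line like 'WAIT: 3', the seconds to pause before it
    prompts itself again."""
    history = ["(system) You may end each message with 'WAIT: <seconds>'."]
    for _ in range(max_turns):
        reply = generate("\n".join(history))
        history.append(reply)
        m = re.search(r"WAIT:\s*(\d+)", reply)
        time.sleep(int(m.group(1)) if m else 1)  # default 1s pause
```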
They claim the system has 90% accuracy, so they would have to actually kill about 10% more people than these numbers to offset the 10% error rate: between 610,500 and 814,000. The whole Gaza Strip had about 2 million people before the current siege.
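Strictly speaking, offsetting a 10% error rate means dividing by 0.9 rather than multiplying by 1.1, which nudges the totals slightly higher than the quoted range:

$$
N_{\text{total}} = \frac{N_{\text{targets}}}{0.9} \approx 1.11\,N_{\text{targets}}
$$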