
Can you provide a link to the comment?

R1's technical report (https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSee...) says the prompt template used for training is "<think> reasoning process here </think> <answer> answer here </answer>. User: prompt. Assistant:". That format strongly suggests that the text between the <think> tags becomes the "reasoning" and the text between the <answer> tags becomes the "answer" shown in the web app and API (https://api-docs.deepseek.com/guides/reasoning_model). I see no reason why DeepSeek would do it any other way, short of some post-generation filtering.
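Here's a minimal sketch of how that serving-side split could work, assuming the raw completion really does carry both tags (split_r1_output is my own illustrative helper, not DeepSeek's actual code):

    import re

    # Split a raw R1-style completion into reasoning and answer,
    # assuming the <think>...</think> <answer>...</answer> format
    # from the R1 report. Returns None for a missing tag pair.
    def split_r1_output(completion):
        think = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
        answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        reasoning = think.group(1).strip() if think else None
        final = answer.group(1).strip() if answer else None
        return reasoning, final

    raw = "<think> Wait, let me re-check that step... </think> <answer> 42 </answer>"
    reasoning, final = split_r1_output(raw)
    # reasoning -> "Wait, let me re-check that step..."; final -> "42"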

Plus, table 3 of the R1 technical report contains an example of R1's chain of thought, and its style (going back and re-evaluating the problem) resembles the CoT I actually got in the web app.
