
DeepSeek is the state of the art right now in terms of performance and output. It's really fast. The way it "explains" how it's thinking is remarkable.


DeepSeek is great because: 1) you can run the model locally, 2) the research was openly shared, and 3) the reasoning tokens are open. It is not, in my experience, state of the art. In all of my side-by-side comparisons so far in real-world applications, DeepSeek V3 and R1 vs. 4o and o1, the OpenAI models have always performed better. OpenAI's models are also more consistent, glitching out maybe 1 in 10,000 times, whereas DeepSeek's models glitch out maybe 1 in 20. OpenAI's models also handle edge cases better and have a better overall grasp of user intent. I've had DeepSeek's models consistently misinterpret prompts, or confuse data in the prompt with instructions. Both of those are serious problems that make DeepSeek useless for real-world applications. At least not without finetuning them, which then requires using those huge 600B parameter models locally.
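For what it's worth, here is a minimal sketch of how such a side-by-side comparison might be run. Both providers expose an OpenAI-compatible chat-completions endpoint, so the same prompt can be sent to each; the prompt, environment variable names, and temperature setting below are illustrative assumptions, not something from the comment above.

    # Minimal side-by-side prompt comparison (illustrative sketch).
    # Assumes the `openai` Python package and keys in DEEPSEEK_API_KEY / OPENAI_API_KEY.
    import os
    from openai import OpenAI

    PROMPT = "Summarize the following text in one sentence:\n\n<document text here>"

    clients = {
        # DeepSeek exposes an OpenAI-compatible endpoint; "deepseek-chat" maps to V3.
        "deepseek-v3": (OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                               base_url="https://api.deepseek.com"), "deepseek-chat"),
        # Default OpenAI endpoint; "gpt-4o" is the 4o model referenced above.
        "gpt-4o": (OpenAI(api_key=os.environ["OPENAI_API_KEY"]), "gpt-4o"),
    }

    for name, (client, model) in clients.items():
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
            temperature=0,  # keep runs comparable across providers
        )
        print(f"--- {name} ---\n{resp.choices[0].message.content}\n")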

So it is by no means state of the art. Gemini Flash 2.0 also performs better than DeepSeek V3 in all my comparisons thus far. But Gemini Flash 2.0 isn't robust or reliable either.

But as a piece of research, and a cool toy to play with, I think DeepSeek is great.


I watched it successfully complete fairly complicated tasks like "write a snake game in Python" and "write Tetris in Python." And the way it did it, showing all of its internal reasoning steps, is something I've never seen before.

Watch here: https://www.youtube.com/watch?v=by9PUlqtJlM


> which then requires using those huge 600B parameter models locally.

Are you running the smaller models locally? It doesn't seem fair to compare those against 4o and o1 behind OpenAI's APIs. A rough sketch of what running a distilled model locally might look like is below.
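As a reference point, this is roughly what running one of the smaller distilled R1 models locally looks like with Hugging Face transformers. The model ID is the published Qwen-7B distill of R1; the prompt and generation settings are just illustrative assumptions.

    # Rough local-inference sketch for a distilled R1 model (not the full ~600B model).
    # Assumes `transformers`, `torch`, and `accelerate` are installed and a GPU is available.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

    messages = [{"role": "user", "content": "Write a snake game in Python."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # The distilled models emit their chain-of-thought before the final answer.
    output = model.generate(inputs, max_new_tokens=1024)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))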




