Hacker News

Hopefully this is a big improvement from o1.

o1 has been very disappointing after spending sufficient time with Claude Sonnet 3.5. It's like it actively tries to gaslight me and thinks it knows more than I do. It's too stubborn and confidently goes off on tangents, suggesting big changes to parts of the code that aren't the issue. Claude tends to be way better at putting the pieces together in its not-quite-mental-model, so to speak.

I told o1 that a suggestion it gave me didn't work and it said "if it's still 'doesn't work' in your setup..." with "doesn't work" in quotes like it was doubting me... I've canceled my ChatGPT subscription and, when I really need to use it, just go with GPT-4o instead.



I've also noticed that with ChatGPT.

That said, I often run into a sort of opposite issue with Claude. It's very good at making me feel like a genius. Sometimes I'll suggest trying a specific strategy or trying to define a concept on my own, and Claude enthusiastically agrees and takes us down a 2-3 hour rabbit hole that ends up being quite a waste of time for me to backtrack out of.

I'll then run a post-mortem through ChatGPT, and very often it points out the issue in my thinking very quickly.

That said, I keep coming back to Sonnet 3.5 for reasons I can't perfectly articulate. Perhaps because I like how it fluffs my ego lol. ChatGPT, on the other hand, feels a bit more brash. I do wonder if I should be using o1 as my daily driver.

I also don't have enough experience with o1 to determine if it would take me down dead ends as well.


Really interesting point you make about Claude. I’ve experienced the same. What is interesting is that sometimes I’ll question it and say “would it not be better to do it this way?” and all of a sudden Claude U-turns and says “yes, great idea, that’s actually a much better approach”, which leaves me thinking: are you just stroking my ego? If it’s a better approach, then why didn’t you suggest it?

However, I have suggested worse approaches on purpose, and sometimes Claude does pick them up as less than optimal.


It's a little sycophantic.

But the difference is that it actually asks questions. And also that it actually rolls with what you ask it to do. Other models are stubborn and loopy.


I agree with this, but o1 will also confidently take you down rabbit holes. You'll just feel worse about it lol, and when you ask Claude for a post-mortem, it too will quickly find the answer you missed.

The truth is these models are very stochastic; you have to try new chats whenever you even moderately suspect you're going awry.


I keep coming back to try these models. o1, Sonnet, o3-mini.

None of them can produce correct Drizzle code to save their lives. It is just straight up not possible. It seems they don't even consider TypeScript errors... they are always calling methods that simply don't exist.



