Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

My experience with cursor and sonnet is that it is relatively good at first tries, but completely misses the plot during corrections.

"My attempt at solving the problem contains a test that fails? No problem, let me mock the function I'm testing, so that, rather than actually run, it returns the expected value!"

It keeps doing that kind of shenanigans, applying modifications that solve the newly appearing problem while screwing the original attempt's goal.

I usually get much better results from regular chatgpt copying and pasting, the trouble being that it is a major pain to handle the context window manually by pasting relevant info and reminding what I think is being forgotten.



Claude makes a lot of crappy change suggestions, but when you ask "is that a good suggestion?" it's pretty good at judging when it isn't. So that's become standard operating procedure for me.

It's difficult to avoid Claude's strong bias for being agreeable. It needs more HAL 9000.


I'm always asking Claude to propose a variety of suggestions for the problem at hand and their trade-offs, then evaluating them for the top three proposals and why. Then I'll pick one of them and further vet the idea


>It's difficult to avoid Claude's strong bias for being agreeable. It needs more HAL 9000.

Absolutely, I find this a challenge as well. Every thought that crosses my mind is a great idea according to it. That's the opposite attitude to what I want from an engineer's copilot! Particularly from one who also advices junior devs.


> when you ask "is that a good suggestion?" it's pretty good at judging when it isn't

Basically a poor man's COT.


Yes it’s usually worth it to try to write a really good first prompt


More than once I've found myself going down this 'little maze of twisty passages, all alike'. At some point I stop, collect up the chain of prompts in the conversation, and curate them into a net new prompt that should be a bit better. Usually I make better progress - at least for a while.


This becomes second nature after a while. I've developed an intuition about when a model loses the plot and when to start a new thread. I have a base prompt I keep for the current project I'm working on, and then I ask the model to summarize what we've done in the thread and combine them to start anew.

I can't wait until this is a solved problem because it does slow me down.


Yes when new models come out it feels like breaking up.


Why is it so hard to share/find prompts or distill my own damn prompts? There must be good solutions for this —


What do you find difficult about distilling your own prompts?

After any back and forth session I have reasonably good results asking something like "Given this workflow, how could I have prompted this better from the start to get the same results?"


Analysis of past chats in bulk.


Don’t outsource the only thing left for our brains to do themselves :/


For my advanced use case involving Python and knowledge of finance, Sonnet fared poorly. Contrary to what I am reading here, my favorite approach has been to use o1 in agent mode. It’s an absolute delight to work with. It is like I’m working with a capable peer, someone at my level.

Sadly there are some hard limits on o1 with Cursor and I cannot use it anymore. I do pay for their $20/month subscription.


> o1 in agent mode

How? It specifically tells me this is unsupported: "Agent composer is currently only supported using Anthropic models or GPT-4o, please reselect the model and try again."


I think you’re right - I must have used it in regular mode, then got GPT-4o to fill in the gaps. It can fully automate a lot of menial work, such as refactors and writing tests. Though I’ll add, I had a roughly 50% success with GPT-4o bug fixing in agent mode, which is pretty great in my experience. When it did work, it felt glorious - 100% hands-free operation!


It seems like you could use aider in architecture mode. Basically, it will suggest the solution to your problem fist, and prompt you to start editing, you can say no to refine the solution and only start editing when you are satisfied with it.


Hah, I was trying it the other day in a Go project and it did exactly the same thing. I couldn’t believe my eyes, it basically rewrote all the functions back out in the test file but modified slightly so the thing that was failing wouldn’t even run.


I've had it do similar nonsense.

I just don't understand all the people who honestly believe AGI just requires more GPUs and data when these models are so inherently stupid.


Can't you select Chatgpt as the model in cursor?


Yes, but for some reason it seems to perform worse there.

Perhaps whatever algorithms Cursor uses to prepare the context it feeds the model are a good fit for Claude but not so much for the others (?). It's a random guess, but whatever the reason, there's a weird worsening of performance vs pure chat.


Yes but every model besides claude-3.5-sonnet sucks in Cursor, for whatever reason. They might as well not even offer the other models. The other models, even "smarter" models, perform vastly poorer or don't support agent capability or both.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: