When I first tried letting Cursor loose on a relatively small code base (1500 lines, 2 files), I had it fix a bug (or more than one) with a clear test case and a rough description of the problem, and it was a disaster.
The first commit towards the fix was plausible, though still not fully correct, but in the end not only was it unable to fix the bug, each commit also became more and more baroque. I cut it off when it wrote almost 100 lines of code to compare version numbers (functionality that already existed in the source; the sketch below shows how little code that needs). The problem with discussing the plan up front is that, while debugging, you yourself don't have a full idea of the plan.
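A minimal Python sketch of a numeric version comparison, purely illustrative and not the actual code from the code base in question:

    # Compare dotted version strings like "1.10.2" numerically.
    # Illustrative only; real-world schemes (pre-releases, build tags)
    # need more care, e.g. packaging.version.Version in Python.
    def parse_version(v):
        return tuple(int(part) for part in v.split("."))

    assert parse_version("1.10.2") > parse_version("1.9.9")
    assert parse_version("2.0") < parse_version("2.0.1")

Tuple comparison does the work here: Python compares element by element, so no hand-rolled loop is needed.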
I don't call it a total failure because I asked the AI to improve some error messages to help it debug, and I will keep that code. It's pretty good at writing new code, very good at reviewing it, but for me it was completely incapable of performing maintenance.
These tools and LLMs differ in quality; for me, Claude Code with Claude 4 was the first tool that worked well enough. I tried Cursor before, though that was 6+ months ago, and I wasn't very impressed.
Same for me. Cursor was a mess, and I don't know how or why it works for other people. Claude Code, on the other hand, was a success from day one, and I've been using it happily for months now.
I used Cursor for about 5 months before switching to Claude Code. I was only productive with Cursor when I used it in a very specific way, which was basically me doing by hand what Claude Code does internally: I maintained planning documents and todo lists, used test-driven development and linting tools, etc. My .cursorrules file looks like what I imagine the Claude system prompt to be (a sketch of that kind of file follows below).
Claude Code took the burden of maintaining that off my shoulders.
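For flavor, the kind of rules I mean (a condensed, invented sketch, not my literal .cursorrules file):

    # .cursorrules (illustrative sketch, not the real file)
    - Before writing code, put a short plan in PLAN.md and keep a todo list current.
    - Work test-first: add or update a failing test, then make it pass.
    - Run the linter and the full test suite before calling a task done.
    - Make small, focused edits; never rewrite code that already works.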
Also, Cursor was/is utterly useless with any and all non-Anthropic models, which are the default.
This was a problem I regularly had using Copilot w/ GPT-4o or Sonnet 3.5/3.7... sometimes I would end up down a rabbit hole and blow multiple days of work, but more typically I'd be out an hour or two and toss everything to start again.
I don't have this w/ Claude Code working over multiple code bases of 10-30k LOC. Part of the reason is that the guidance I give in the memory files helps keep this at bay (see the sketch below), as does linting (i.e. on class/file length), but I also chunk things up into features that I PR-review and have it refactor to keep things super tidy.
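The memory-file guidance runs along these lines (an invented excerpt in the CLAUDE.md style, not the actual file):

    # CLAUDE.md (illustrative excerpt, not the real project file)
    - Keep files under ~300 lines and classes under ~100; the linter enforces both.
    - One feature per branch; stop for PR review before starting the next one.
    - After every change, run the linter and tests; fix all warnings before moving on.
    - When a file approaches its length limit, propose a refactor instead of growing it.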
Yeah, GitHub Copilot just didn't work for me at all. The completions are OK and I actually still use it for that, but the agent part is completely useless. Claude Code is in another league.
Fwiw, I dipped my toes into AI-assisted coding a few weeks ago and started with Cursor. Was very unimpressed (spent more time prompting and fighting the tool than making forward progress) until I tried Claude Code. Happily dropped Cursor immediately (cancelled my sub) and am now having a great time using CC productively (just the basic $20/mo plan). Still needs hand-holding, but it's a net productivity boost.