I've been using cursor since it launched, sticking almost exclusively to claude-...

kace91 · 2025-02-01T00:04:53 1738368293

My experience with cursor and sonnet is that it is relatively good at first tries, but completely misses the plot during corrections.

"My attempt at solving the problem contains a test that fails? No problem, let me mock the function I'm testing, so that, rather than actually run, it returns the expected value!"

It keeps doing that kind of shenanigans, applying modifications that solve the newly appearing problem while screwing the original attempt's goal.

I usually get much better results from regular chatgpt copying and pasting, the trouble being that it is a major pain to handle the context window manually by pasting relevant info and reminding what I think is being forgotten.

delichon · 2025-02-01T01:27:39 1738373259

Claude makes a lot of crappy change suggestions, but when you ask "is that a good suggestion?" it's pretty good at judging when it isn't. So that's become standard operating procedure for me.

It's difficult to avoid Claude's strong bias for being agreeable. It needs more HAL 9000.

4b11b4 · 2025-02-01T02:21:58 1738376518

I'm always asking Claude to propose a variety of suggestions for the problem at hand and their trade-offs, then evaluating them for the top three proposals and why. Then I'll pick one of them and further vet the idea

kace91 · 2025-02-01T13:07:26 1738415246

>It's difficult to avoid Claude's strong bias for being agreeable. It needs more HAL 9000.

Absolutely, I find this a challenge as well. Every thought that crosses my mind is a great idea according to it. That's the opposite attitude to what I want from an engineer's copilot! Particularly from one who also advices junior devs.

esperent · 2025-02-01T03:16:01 1738379761

> when you ask "is that a good suggestion?" it's pretty good at judging when it isn't

Basically a poor man's COT.

jwpapi · 2025-02-01T00:10:27 1738368627

Yes it’s usually worth it to try to write a really good first prompt

earleybird · 2025-02-01T01:28:30 1738373310

More than once I've found myself going down this 'little maze of twisty passages, all alike'. At some point I stop, collect up the chain of prompts in the conversation, and curate them into a net new prompt that should be a bit better. Usually I make better progress - at least for a while.

SamPatt · 2025-02-01T04:52:16 1738385536

This becomes second nature after a while. I've developed an intuition about when a model loses the plot and when to start a new thread. I have a base prompt I keep for the current project I'm working on, and then I ask the model to summarize what we've done in the thread and combine them to start anew.

I can't wait until this is a solved problem because it does slow me down.

jwpapi · 2025-02-13T23:36:13 1739489773

Yes when new models come out it feels like breaking up.

dr_dshiv · 2025-02-01T01:36:13 1738373773

Why is it so hard to share/find prompts or distill my own damn prompts? There must be good solutions for this —

garfij · 2025-02-01T03:48:20 1738381700

What do you find difficult about distilling your own prompts?

After any back and forth session I have reasonably good results asking something like "Given this workflow, how could I have prompted this better from the start to get the same results?"

dr_dshiv · 2025-02-01T13:25:59 1738416359

Analysis of past chats in bulk.

whall6 · 2025-02-01T02:28:16 1738376896

Don’t outsource the only thing left for our brains to do themselves :/

sheepscreek · 2025-02-01T05:57:43 1738389463

For my advanced use case involving Python and knowledge of finance, Sonnet fared poorly. Contrary to what I am reading here, my favorite approach has been to use o1 in agent mode. It’s an absolute delight to work with. It is like I’m working with a capable peer, someone at my level.

Sadly there are some hard limits on o1 with Cursor and I cannot use it anymore. I do pay for their $20/month subscription.

electroly · 2025-02-01T06:56:26 1738392986

> o1 in agent mode

How? It specifically tells me this is unsupported: "Agent composer is currently only supported using Anthropic models or GPT-4o, please reselect the model and try again."

sheepscreek · 2025-02-01T12:05:16 1738411516

I think you’re right - I must have used it in regular mode, then got GPT-4o to fill in the gaps. It can fully automate a lot of menial work, such as refactors and writing tests. Though I’ll add, I had a roughly 50% success with GPT-4o bug fixing in agent mode, which is pretty great in my experience. When it did work, it felt glorious - 100% hands-free operation!

axkdev · 2025-02-01T11:39:54 1738409994

It seems like you could use aider in architecture mode. Basically, it will suggest the solution to your problem fist, and prompt you to start editing, you can say no to refine the solution and only start editing when you are satisfied with it.

mathieuh · 2025-02-01T03:24:53 1738380293

Hah, I was trying it the other day in a Go project and it did exactly the same thing. I couldn’t believe my eyes, it basically rewrote all the functions back out in the test file but modified slightly so the thing that was failing wouldn’t even run.

nprateem · 2025-02-01T07:13:55 1738394035

I've had it do similar nonsense.

I just don't understand all the people who honestly believe AGI just requires more GPUs and data when these models are so inherently stupid.

hahajk · 2025-02-01T01:02:15 1738371735

Can't you select Chatgpt as the model in cursor?

kace91 · 2025-02-01T01:10:22 1738372222

Yes, but for some reason it seems to perform worse there.

Perhaps whatever algorithms Cursor uses to prepare the context it feeds the model are a good fit for Claude but not so much for the others (?). It's a random guess, but whatever the reason, there's a weird worsening of performance vs pure chat.

electroly · 2025-02-01T02:40:04 1738377604

Yes but every model besides claude-3.5-sonnet sucks in Cursor, for whatever reason. They might as well not even offer the other models. The other models, even "smarter" models, perform vastly poorer or don't support agent capability or both.

zackproser · 2025-01-31T23:54:01 1738367641

Not trying to be snarky, but the example prompt you provided is about 1/15th the length and detail of prompts I usually send when working with Cursor.

I tend to exhaustively detail what I want, including package names and versions because I've been to that movie before...

inerte · 2025-02-01T00:22:24 1738369344

What works nice also is the text to speech. I find it easier and faster to give more context by talking rather than typing, and the extra content helps the AI to do its job.

And even though the speech recognition fails a lot on some of the technical terms or weirdly named packages, software, etc, it still does a good job overall (if I don’t feel like correcting the wrong stuff).

It’s great and has become somewhat of a party trick at work. Some people don’t even use AI to code that often, and when I show them “hey have you tried this?” And just tell the computer what I want? Most folks are blown away.

cadence- · 2025-02-01T01:37:38 1738373858

Does the Cursor have text-to-speech functionality?

fud101 · 2025-02-01T02:39:06 1738377546

you mean speech to text right?

chefandy · 2025-02-01T03:41:33 1738381293

Not for me. I first ask Advanced Voice to read me some code and have Siri listen and email it to an API I wrote which uses Claude to estimate the best cloud provider to run that code based on its requirements and then a n8n script deploys it and send me the results via twilio.

inerte · 2025-02-01T22:47:37 1738450057

Sorry! Yes, speech to text.

crooked-v · 2025-02-01T02:24:37 1738376677

If have to write a prompt that long, it'll be faster to just write the code.

aprilthird2021 · 2025-02-01T02:48:51 1738378131

Shocking to see this because this was essentially the reason most of the previous no code solutions never took off...

esperent · 2025-02-01T03:22:04 1738380124

That sounds exhausting. Wouldn't it be faster to include you package.json in the context?

I sometimes do this (using Cline), plus create a .cline file at project root which I refine over time and which describes both the high level project overview, details of the stack I'm using, and technical details I want each prompt to follow.

Then each actual prompt can be quite short: read files x, y, and z, and make the following changes... where I keep the changes concise and logically connected - basically what I might do for a single pull request.

mvkel · 2025-02-01T00:34:21 1738370061

My point was that a prompt that simple could be held and executed very well by sonnet, but all other models (especially reasoning models) crash and burn.

It's a 15 line tsx file so context shouldn't be an issue.

Makes me wonder if reasoning models are really proper models for coding in existing codebases

liamwire · 2025-02-01T01:55:09 1738374909

Your last point matches what I’ve seen some people (simonw?) say they’re doing currently: using aider to work with two models—one reasoning model as an architect, and one standard LLM as the actual coder. Surprisingly, the results seem pretty good vs. putting everything on one model.

mvkel · 2025-02-01T02:24:54 1738376694

This is probably the right way to think about it. O1-pro is an absolute monster when it comes to architecture. It is staggering the breadth and depth that it sees. Ask it to actually implement though, and it trips over its shoelaces almost immediately.

goosejuice · 2025-02-01T02:48:13 1738378093

Can you give an example of this monstrous capability you speak of? What have you used it for professionally w.r.t. architecture.

mvkel · 2025-02-01T20:30:40 1738441840

The biggest delta over regular o1 that I've seen is asking it to make a PRD of an app that I define as a stream-of-consciousness with bullet points.

It's fantastic at finding needles in the haystack, so the contradictions are nonexistent. In other words, it seems to identify which objects would interrelate and builds around those nodes, where o1 seems to think more in "columns."

To sum it up, where o1 feels like "5 human minute thinking," o1-pro feels like "1 human hour thinking"

hombre_fatal · 2025-02-01T04:21:11 1738383671

You’re basically saying you write 15x the prompt for the same result they get with sonnet.

jwpapi · 2025-02-01T00:11:14 1738368674

Yes this works good for me too rather take your time and do the first prompt right

chrismsimpson · 2025-02-01T01:55:30 1738374930

I’ve coded in many languages over the years but reasonably new to the TS/JS/Next world.

I’ve found if you give your prompts a kind long form “stream of consciousness”, where you outline snippets of code in markdown along with contextual notes and then summarise/outline at the end what you actually wish to achieve, you can get great results.

Think a long form, single page “documentation” type prompts that alternate between written copy/contextual intent/description and code blocks. Annotating code blocks with file names above the blocks I’m sure helps too. Don’t waste your context window on redundant/irrelevant information or code, stating a code sample is abridged or adding commented ellipses seems to do the job.

d357r0y3r · 2025-02-01T03:40:08 1738381208

By the time I've fully documented and explained what I want to be done, and then review the result, usually finding that it's worse than what I would have written myself, I end up questioning my instinct to even reach for this tool.

I like it for general refactoring and day to day small tasks, but anything that's relatively domain-specific, I just can't seem to get anything that's worth using.

noahbp · 2025-02-01T03:48:30 1738381710

Like most AI tools, great for beginners, time-savers for intermediate users, and frequently a waste of time in domains where you're an expert.

I've used Cursor for shipping better frontend slop, and it's great. I skip a lot of trial and error, but not all of it.

epolanski · 2025-02-01T06:54:45 1738392885

,> and frequently a waste of time in domains where you're an expert.

I'm a domain expert and I disagree.

There's many scenarios where using LLMs pays off.

E.g. a long file or very long function are just that, and an LLM is faster at understanding it whole not being limited in how many things you can track in your mind at once (between 4 and 6). It's still gonna be faster at refactoring it and testing it than you will.

d357r0y3r · 2025-02-01T03:57:33 1738382253

I agree that it's amazing as a learning tool. I think the "time to ramp" on a new technology or programming language has probably been cut in half or more.

twilightfringe · 2025-02-01T02:00:25 1738375225

ha! good to confirm! I tend to do this, just kind of as a double-check thing, but never sure if it actually worked or if it was a placebo, lol.

Or end with "from the user's perspective: all the "B" elements should light up in excitement when you click "C""

mvkel · 2025-02-01T02:22:29 1738376549

Going to try this! Thanks for the tip

MaxLeiter · 2025-02-01T00:32:43 1738369963

We've been working on solving a lot of these issues with v0.dev (disclaimer: shadcn and I work on it). We do a lot of pre and post-processing to ensure LLMs output valid shadcn code.

We're also talking to the cursor/windsurf/zed folks on how we can improve Next.js and shadcn in the editors (maybe something like llms.txt?)

mvkel · 2025-02-01T00:36:33 1738370193

Thanks for all the work you do! v0 is magical. I absolutely love the feature where I can add a chunky component that v0 made to my repo with npx

harshitaneja · 2025-02-01T06:19:35 1738390775

So I think I finally understood recently why we have these divergent groups with one thinking Claude 3.5 Sonnet is the best model for coding and another that follow the OpenAI SOTA at that moment. I have been a heavy user of ChatGPT, jumping on to pro without even thinking for more than a second once released. Recently though I took a pause from my usual work on statistical modelling, heuristics work and other things in certain deep domains to focus on building client APIs and frontends and decided to again give Claude a try and it is just so great to work with for this usecase.

My hypothesis is its a difference of what you are doing. OpenAI O models are much better than others at mathematical modelling and such tasks and Claude for more general purpose programming.

mycall · 2025-02-01T06:57:41 1738393061

Have you used multi-agent chat sessions with each fielding their own specialities and seeing if that improves your use cases aka MoE?

harshitaneja · 2025-02-01T16:13:13 1738426393

I have not. Any suggestions on which one(s) to explore to get started.

energy123 · 2025-02-01T00:17:09 1738369029

Context length possibly. Prompt adherence drops off with context, and anything above 20k tokens is pushing it. I get the best results by presenting the smallest amount of context possible, including removing comments and main methods and functions that it doesn't need to see. It's a bit more work (not that much if you have a script that does it for you), but the results are worth it. You could test in the chatgpt app (or lmarena direct chat) where you ask the same question but with minimal hand curated context, and see if it makes the same mistake.

mvkel · 2025-02-01T00:40:24 1738370424

If it's a context issue, it's an issue with how cursor itself sends the context to these reasoning LLMs.

Context alone shouldn't be the reason that sonnet succeeds consistently, but others (some which have even bigger context windows) fail.

energy123 · 2025-02-01T00:43:42 1738370622

Yes, that's what I'm suggesting. Cursor is spamming the models with too much context, which harms reasoning models more than it harms non-reasoning models (hypothesis, but one that aligns with my experience). That's why I recommended testing reasoning models outside of Cursor with a hand curated context.

The advertised context length being longer doesn't necessarily map 1:1 with the actual ability the models have to perform difficult tasks over that full context. See for example the plots of performance on ARC vs context length for o-series models.

jvanderbot · 2025-01-31T23:51:04 1738367464

I've found cursor to be too thin a wrapper. Aider is somehow significantly more functional. Try that.

dhc02 · 2025-01-31T23:55:27 1738367727

Aider, with o1 or R1 as the architect and Claude 3.5 as the implementer, is so much better than anything you can accomplish with a single model. It's pretty amazing. Aider is at least one order of magnitude more effective for me than using the chat interface in Cursor. (I still use Cursor for quick edits and tab completions, to be clear).

dwaltrip · 2025-02-01T00:02:32 1738368152

I haven't tried aider in quite a while, what does it mean to use one model as an architect and another as the implementer?

Terretta · 2025-02-01T00:08:05 1738368485

Aider now has experimental support for using two models to complete each coding task:

- An Architect model is asked to describe how to solve the coding problem.

- An Editor model is given the Architect’s solution and asked to produce specific code editing instructions to apply those changes to existing source files.

Splitting up “code reasoning” and “code editing” in this manner has produced SOTA results on aider’s code editing benchmark. Using o1-preview as the Architect with either DeepSeek or o1-mini as the Editor produced the SOTA score of 85%. Using the Architect/Editor approach also significantly improved the benchmark scores of many models, compared to their previous “solo” baseline scores (striped bars).

https://aider.chat/2024/09/26/architect.html

lukas099 · 2025-02-01T03:03:30 1738379010

Probably gonna show a lot of ignorance here, but isn’t that a big part of the difference between our brains and AI? That instead of one system, we are many systems that are kind of sewn together? I secretly think AGI will just be a bunch of different specialized AIs working together.

Terretta · 2025-02-01T03:20:24 1738380024

You're in good company in that secret thought.

Have a look at this: https://en.wikipedia.org/wiki/Society_of_Mind

dhc02 · 2025-02-03T23:16:34 1738624594

Efficient and effective organizations work this way, too: a CEO to plan in broad strokes, employees to implement that vision in specific ways, and managers to make sure their results match expectations.

ChadNauseam · 2025-02-01T00:01:34 1738368094

I normally use aider by just typing in what I want and it magically does it. How do I use o1 or R1 to play the role of the "architect"?

macNchz · 2025-02-01T00:16:14 1738368974

You can start it with something like:

    aider --architect --model o1 --editor-model sonnet

Then you'll be in "architect" mode, which first prompts o1 to design the solution, then you can accept it and allow sonnet to actually create the diffs.

Most of the time your way works well—I use sonnet alone 90% of the time, but the architect mode is really great at getting it unstuck when it can't seem to implement what I want correctly, or keeps fixing its mistakes by making things worse.

cruffle_duffle · 2025-02-01T01:13:50 1738372430

I really want to see how apps created this way scale to large codebases. I’m very skeptical they don’t turn into spaghetti messes.

Coding is basically just about the most precise way to encapsulate a problem as a solution possible. Taking a loose English description and expanding it into piles of code is always going to be pretty leaky no matter how much these models spit out working code.

In my experience you have to pay a lot of attention to every single line these things write because they’ll often change stuff or more often make wrong assumptions that you didn’t articulate. And in my experience they never ask you questions unless you specifically prompt them to (and keep reminding them to), which means they are doing a hell of a lot of design and implementation that unless carefully looked over will ultimately be wrong.

It really reminds me a bit of when Ruby on Rails came out and the blogosphere was full of gushing “I’ve never been more productive in my life” posts. And then you find out they were basically writing a TODO app and their previous development experience was doing enterprise Java for some massive non-tech company. Of course RoR will be a breath of fresh air for those people.

Don’t get me wrong I use cursor as my daily driver but I am starting to find the limits for what these things can do. And the idea of having two of these LLM’s taking some paragraph long feature description and somehow chatting with each other to create a scalable bit of code that fits into a large or growing codebase… well I find that kind of impossible. Sure the code compiles and conforms to whatever best practices are out there but there will be absolutely no constancy across the app—especially at the UX level. These things simply cannot hold that kind of complexity in their head and even if they could part of a developers job is to translate loose English into code. And there is much, much, much, much more to that than simply writing code.

macNchz · 2025-02-01T02:03:42 1738375422

I see what you’re saying and I think that terming this “architect” mode has an implication that it’s more capable than it really is, but ultimately this two model pairing is mostly about combining disparate abilities to separate the “thinking” from the diff generation. It’s very effective in producing better results for a single prompt, but it’s not especially helpful for “architecting” a large scale app.

That said, in the hands of someone who is competent at assembling a large app, I think these tools can be incredibly powerful. I have a business helping companies figure out how/if to leverage AI and have built a bunch of different production LLM-backed applications using LLMs to write the code over the past year, and my impression is that there is very much something there. Taking it step by step, file by file, like you might if you wrote the code yourself, describing your concept of the abstractions, having a few files describing the overall architecture that you can add to the chat as needed—little details make a big difference in the results.

tribeca18 · 2025-02-01T03:02:24 1738378944

I use Cursor and Composer in agent mode on a daily basis, and this is basically exactly what happened to me.

After about 3 weeks, things were looking great - but lots of spagetti code was put together, and it never told me what I didn't know. The data & state management architecture I had written was simply just not maintainable (tons of prop drilling, etc). Over time, I basically learned common practices/etc and I'm finding that I have to deal with these problems myself. (how it used to be!)

We're getting close - the best thing I've done is create documentation files with lots of descriptions about the architecture/file structure/state management/packages/etc, but it only goes so far.

We're getting closer, but for right now - we're not there and you have to be really careful with looking over all the changes.

nprateem · 2025-02-01T07:24:21 1738394661

The worst thing you can do with aider is let it autocommit to git. As long as you review each set of changes you can stop it going nuts.

I have a codebase maybe 3-500k lines which is in good shape because of this.

I also normally just add the specific files I need to the chat and give it 1-2 sentences for what to do. It normally does the right thing (sonnet obviously).

dhc02 · 2025-02-03T23:18:20 1738624700

Yes! Turn off autocommit, everyone! Review and test, then git commit.

aledalgrande · 2025-02-01T05:36:16 1738388176

Same with Cline

digitcatphd · 2025-02-01T06:45:16 1738392316

The reality is I suspect one will use different models for different things. Think of it like having different modes of transportation.

You might use your scooter, bike, car, jet - depending on the circumstances. A bike was invented 100 years ago? But it may be the best in the right use case. Would still be using DaVinci for some things because we haven't bothered swapping it and it works fine.

For me - the value of R1/o3 is visible logic that provides an analysis that can be critiqued by Sonnet 3.5

ido · 2025-02-01T07:27:38 1738394858

I have an even more topical analogy! Using different languages for different tasks. When I need some one off script do automate some drudgery (take all files with certain pattern in their name, for each do some search and replace in the text inside, zip them, upload zip to URL, etc) I use python. When Im working on a multi-platform game I use c# (and unity). When I need to make something very lean that works in mobile browsers I use JS with some light-weight libraries.

esperent · 2025-02-01T00:25:45 1738369545

Claude uses Shadcn-ui extensively in the web interface, to the point where I think it's been trained to use it over other UI components.

So I think you got lucky and you're asking it to write using a very specific code library that it's good at, because it happens to use it for it's main userbase on the web chat interface.

I wonder if you were using a different component library, or using Svelte instead of React, would you still find Claude the best?

pollinations · 2025-02-01T10:00:42 1738404042

I was recently trying to write a relatively simple htmx service with Claude. I was surprised at how much worse it was when it's not React.

dhc02 · 2025-02-03T23:23:49 1738625029

I'm going to give you a video to watch. It's not mine, and I don't know much about this particular youtuber, but it really transformed how I think about writing and structuring the prompts I use, which solved problems similar to what you're describing here.

https://youtu.be/y_ywOVQyafE?si=IvKjy7QUYgxGPNgD

PS (I have not bought the guy's course and have no idea whether it's any good)

eagleinparadise · 2025-02-01T00:40:39 1738370439

Cursor is also very user-unfriendly in providing alternative models to use in composer (agent). There's a heavy reliance on Anthrophic for cursor.

Try using Gemini thinking with Cursor. It barely works. Cmd-k outputs the thinking into the code. Its unusable in chat because the formatting sucks.

Is there some relationship between Cursor and Anthropic, i wonder. Plenty of other platforms seem very eager to give users model flexibility, but Cursor seems to be lacking.

I could be wrong, just an observation.

ttroyr · 2025-02-04T15:33:15 1738683195

Originally, actually there was a relationship between Cursor & OpenAI. Something like Cursor was supported by the OpenAI startup fund. So Cursor seems to have branched out. I think they are just emphasizing the models they find most effective. I'm surprised they haven't (apparently) incorporated Claude prompt caching yet for Sonnet.

adonese · 2025-02-01T07:09:42 1738393782

My general workflow with ai so far has been this: - I use copilot mostly for writing unit tests. It mostly works well since the unit tests follow a standard template. - I use the chat one for alternating between different approaches and (in)validating certain approaches

My day job is a big monorepo, I have not investigated that yet but I believe the models context sizes fall short there and as such the above use cases only works for me.

hombre_fatal · 2025-02-01T04:24:27 1738383867

I have the same experience. Just today I was integrating a new logging system with my kubernetes cluster.

I tried out the OP model to make changes to my yaml files. It would give short snippets and I’d have to keep trial and erroring its suggestions.

Eventually I pasted the original prompt to Claude and it one-shot the dang thing with perfect config. Made me wonder why I even try new models.

kristopolous · 2025-02-01T00:32:47 1738369967

"not" and other function words; usually work fine today but if I'm having trouble, the best thing to do is probably be inclusive, not exclusive.

foobiekr · 2025-02-01T01:40:56 1738374056

Have you tried any of the specialty services like Augment? I am curious if they are any better or just snake oil.

pknerd · 2025-02-01T04:45:01 1738385101

OT: How many tokens are being consumed? How much are you paying for Claude APIs?

Abishek_Muthian · 2025-02-01T03:58:45 1738382325

Just curious, did you try a code model like Codestral instead of a MoE?

bugglebeetle · 2025-02-01T04:02:32 1738382552

o3 mini’s date cut-off is 2023, so it’s unfortunately not gonna be useful for anything that requires knowledge of recent framework updates, which includes probably all big frontend stuff.

OkGoDoIt · 2025-02-01T00:46:40 1738370800

I also have been less impressed by o1 in cursor compared to sonnet 3.5. Usually what I will do for a very complicated change is ask o1 to architect it, specifically asking it to give me a detailed plan for how it would be implemented, but not to actually implement anything. I then change the model to Sonnet 3.5 to have it actually do the implementation.

And on the side of not being able to get models to understand something specific, there’s a place in a current project where I use a special Unicode apostrophe during some string parsing because a third-party API needs it. But any code modifications by the AI to that file always replace it with a standard ascii apostrophe. I even added a comment on that line to the effect of “never replaced this apostrophe, it’s important to leave it exactly as it is!” And also put that in my cursor rules, and sometimes directly in the prompt as well, but it always replaces it even for completely unrelated changes. I’ve had to manually fix it like 10 times in the last day, it’s infuriating.