Aider, with o1 or R1 as the architect and Claude 3.5 as the implementer, is so much better than anything you can accomplish with a single model. It's pretty amazing. Aider is at least one order of magnitude more effective for me than using the chat interface in Cursor. (I still use Cursor for quick edits and tab completions, to be clear).
Aider now has experimental support for using two models to complete each coding task:
- An Architect model is asked to describe how to solve the coding problem.
- An Editor model is given the Architect’s solution and asked to produce specific code editing instructions to apply those changes to existing source files.
Splitting up “code reasoning” and “code editing” in this manner has produced SOTA results on aider’s code editing benchmark. Using o1-preview as the Architect with either DeepSeek or o1-mini as the Editor produced the SOTA score of 85%. Using the Architect/Editor approach also significantly improved the benchmark scores of many models, compared to their previous “solo” baseline scores (striped bars).
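For reference, this two-model setup can be launched directly from the command line. A minimal sketch, using aider's `--architect` and `--editor-model` flags; the model names shown are illustrative, and which ones you can actually use depends on the API keys you have configured:

```shell
# Run aider in architect mode: the --model handles reasoning,
# the --editor-model turns its plan into concrete file edits.
aider --architect \
      --model o1-preview \
      --editor-model deepseek/deepseek-chat \
      src/app.py src/utils.py
```

Without `--editor-model`, aider picks a default editor model to pair with the main model.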
Probably gonna show a lot of ignorance here, but isn’t that a big part of the difference between our brains and AI? That instead of one system, we are many systems that are kind of sewn together? I secretly think AGI will just be a bunch of different specialized AIs working together.
Efficient and effective organizations work this way, too: a CEO to plan in broad strokes, employees to implement that vision in specific ways, and managers to make sure their results match expectations.
Then you'll be in "architect" mode, which first prompts o1 to design the solution, then you can accept it and allow sonnet to actually create the diffs.
Most of the time your way works well—I use sonnet alone 90% of the time, but the architect mode is really great at getting it unstuck when it can't seem to implement what I want correctly, or keeps fixing its mistakes by making things worse.
I really want to see how apps created this way scale to large codebases. I strongly suspect they turn into spaghetti messes.
Coding is just about the most precise way possible to encapsulate a problem as a solution. Taking a loose English description and expanding it into piles of code is always going to be leaky, no matter how often these models spit out working code.
In my experience you have to pay close attention to every single line these things write, because they'll often change things or, more often, make wrong assumptions that you didn't articulate. And in my experience they never ask you questions unless you specifically prompt them to (and keep reminding them to), which means they are doing a great deal of design and implementation that, unless carefully reviewed, will ultimately be wrong.
It really reminds me a bit of when Ruby on Rails came out and the blogosphere was full of gushing “I’ve never been more productive in my life” posts. And then you find out they were basically writing a TODO app and their previous development experience was doing enterprise Java for some massive non-tech company. Of course RoR will be a breath of fresh air for those people.
Don’t get me wrong, I use Cursor as my daily driver, but I am starting to find the limits of what these things can do. And the idea of having two of these LLMs taking some paragraph-long feature description and somehow chatting with each other to create a scalable bit of code that fits into a large or growing codebase… well, I find that kind of impossible. Sure, the code compiles and conforms to whatever best practices are out there, but there will be absolutely no consistency across the app, especially at the UX level. These things simply cannot hold that kind of complexity in their head, and even if they could, part of a developer’s job is to translate loose English into code. And there is much, much, much, much more to that than simply writing code.
I see what you’re saying and I think that terming this “architect” mode has an implication that it’s more capable than it really is, but ultimately this two model pairing is mostly about combining disparate abilities to separate the “thinking” from the diff generation. It’s very effective in producing better results for a single prompt, but it’s not especially helpful for “architecting” a large scale app.
That said, in the hands of someone who is competent at assembling a large app, I think these tools can be incredibly powerful. I have a business helping companies figure out how/if to leverage AI and have built a bunch of different production LLM-backed applications using LLMs to write the code over the past year, and my impression is that there is very much something there. Taking it step by step, file by file, like you might if you wrote the code yourself, describing your concept of the abstractions, having a few files describing the overall architecture that you can add to the chat as needed—little details make a big difference in the results.
I use Cursor and Composer in agent mode on a daily basis, and this is basically exactly what happened to me.
After about 3 weeks, things were looking great - but lots of spaghetti code was put together, and it never told me what I didn't know. The data & state management architecture I had written was simply not maintainable (tons of prop drilling, etc). Over time, I basically learned common practices/etc and I'm finding that I have to deal with these problems myself. (how it used to be!)
We're getting close - the best thing I've done is create documentation files with lots of descriptions about the architecture/file structure/state management/packages/etc, but it only goes so far.
We're getting closer, but right now we're not there, and you have to be really careful about reviewing all the changes.
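One way to keep such documentation files in the model's context is to load them read-only on every session. A sketch using aider's `--read` option; the file names here are just a convention, not anything the tool requires:

```shell
# Pin architecture/conventions docs into the chat as read-only
# context, then add the files actually being edited.
aider --read ARCHITECTURE.md \
      --read CONVENTIONS.md \
      src/state/store.ts
```

Read-only files are sent as context but the model won't try to edit them.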
The worst thing you can do with aider is let it autocommit to git. As long as you review each set of changes you can stop it going nuts.
I have a codebase of maybe 300-500k lines which is in good shape because of this.
I also normally just add the specific files I need to the chat and give it a sentence or two describing what to do. It normally does the right thing (Sonnet, obviously).
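The workflow described above can be sketched as follows, using aider's `--no-auto-commits` flag and the in-chat `/add` command; the file paths and prompt are hypothetical examples:

```shell
# Disable automatic git commits so every diff gets a manual review
aider --no-auto-commits

# Then, inside the chat, add only the files relevant to the task
# and give a short instruction:
#   /add src/billing/invoice.py tests/test_invoice.py
#   > change the due-date calculation to skip weekends
```

With auto-commits off, you review and commit the changes yourself, which makes it easy to back out of a bad edit.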