I'm interested in what stopped you from finishing diffs and diff based editing. I built an AI software engineering assistant at my last company and we got decent results with Aider's method (and prompts, and hidden conversation starter etc). I did have to have a fallback to raw output, and a way to ask it to try again. But for the most part it worked well and unlocked editing large files (and quickly).
Excellent question! We just didn't have the resources at the time on our small team to invest in getting it to be good enough to be default on. We had to move on to other more core platform features.
Though I'm really eager to get back to it. When using Windsurf last week, I was impressed by their diffs on Sonnet. Seems like they work well. I would love to view their system prompt!
I hope that when we have time to resume work on this (maybe in Feb) that we'll be able to get it done. But then again, maybe just patience (and more fast-following) is the right strategy, given how fast things are moving...
An interesting alternative to diffs appears to be straightforward find and replace.
Claude Artifacts uses that: they have a tool where the LLM can say "replace this exact text with this" to update an Artifact without having to output the whole thing again.
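Mechanically, that kind of tool can be tiny. Here's a minimal sketch in Python; the tool name and schema are my own guesses, not Anthropic's actual Artifacts implementation:

```python
def apply_exact_replace(document: str, old_text: str, new_text: str) -> str:
    """Swap one exact snippet for another instead of regenerating the document."""
    if old_text not in document:
        raise ValueError("old_text not found; ask the model to try again")
    return document.replace(old_text, new_text, 1)

# Hypothetical tool definition the LLM would be handed:
UPDATE_TOOL = {
    "name": "update_artifact",
    "description": "Replace an exact snippet of the current artifact with new text.",
    "input_schema": {
        "type": "object",
        "properties": {
            "old_text": {"type": "string"},
            "new_text": {"type": "string"},
        },
        "required": ["old_text", "new_text"],
    },
}
```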
I think this is going to be the answer eventually.
Once one of the AI companies figures out a decent (probably treesitter-based) language to express code selections and code changes in, and then trains a good model on it, they're going to blow everyone else out of the water.
This would help with "context management" tremendously, as it would let the LLM ask for things like "all functions that are callers of this function", without having to load in entire files. Some simpler refactorings could also be performed by just writing smart queries.
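As a toy illustration of what such a query could return, here is the "all callers of this function" idea using Python's ast module as a stand-in for tree-sitter; the function name and overall shape are made up:

```python
import ast

def callers_of(source: str, target: str) -> list[str]:
    """Names of functions whose bodies contain a call to `target`."""
    callers = []
    for fn in ast.walk(ast.parse(source)):
        if isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls = (n for n in ast.walk(fn) if isinstance(n, ast.Call))
            if any(isinstance(c.func, ast.Name) and c.func.id == target for c in calls):
                callers.append(fn.name)
    return callers

code = """
def fetch(url): ...
def sync(): fetch("a")
def report(): print("no calls here")
def refresh(): fetch("b")
"""
print(callers_of(code, "fetch"))  # ['sync', 'refresh']
```

The point is that the model only ever sees the handful of functions the query returns, not the whole file.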
Oh that is super interesting! I wonder if they track how often it succeeds in matching and replacing, I'd love to see those numbers in aggregate.
Total anecdote, but I worked on this for a bit for a code editor aimed at research-level code (system paper to come soon, fingers crossed!) and found that basic find-and-replace was pretty brittle. I also had to be confident the source text appeared only once (not always the case for my use case), and there was a tradeoff between how fuzzy the match was and how likely it was to land on the genuinely correct source.
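To make that tradeoff concrete, a difflib-based locator along these lines captures the shape of it; the 0.9 cutoff is an arbitrary illustration, not a number from any real system:

```python
import difflib

def find_fuzzy_block(lines: list[str], block: list[str], cutoff: float = 0.9):
    """Return (start_index, score) of the best-matching window, or None."""
    best = None
    for start in range(len(lines) - len(block) + 1):
        window = "\n".join(lines[start:start + len(block)])
        score = difflib.SequenceMatcher(None, window, "\n".join(block)).ratio()
        if best is None or score > best[1]:
            best = (start, score)
    # Below the cutoff, refuse to guess and ask the model to retry instead.
    return best if best and best[1] >= cutoff else None
```

Loosen the cutoff and you anchor more of the model's slightly-mangled "original" blocks, but you also raise the odds of editing the wrong spot.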
But yeah, diffs are super hard because the format requires long-range context and accurate arithmetic (getting line numbers and hunk counts exactly right).
Ultimately, the version of this that worked the best for me was a total hack:
Prefix every line of the code with L#### (the line number). Ask for diffs as the original text plus the complete replacement text, keeping the line-number prefix on both. Then, to apply, fuzzy match on both line number and context.
I suspect this worked as well as it did because it transmutes the math and computation problems into pattern-matching and copying problems, which LLMs are (still) much better at these days.
I suspect any other "hook" would work just as well, e.g. a comment containing a nonce, which could also serve as block boundaries to make changes more likely to be complete.
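A compressed sketch of that hack (the exact prefix format, search radius, and matching details here are illustrative; in particular this version matches the nearby context exactly rather than fuzzily, for brevity):

```python
def number_lines(text: str) -> str:
    """Prefix each line with L#### so the model can copy the anchors back verbatim."""
    return "\n".join(f"L{i:04d} {line}" for i, line in enumerate(text.splitlines(), 1))

def apply_numbered_edit(lines: list[str], original: list[str], replacement: list[str]) -> list[str]:
    """`original` and `replacement` are numbered lines like 'L0042 some code'."""
    hint = int(original[0][1:5]) - 1               # copied line number is only a hint
    wanted = [l[6:] for l in original]             # strip the 'L0042 ' prefixes
    for offset in sorted(range(-5, 6), key=abs):   # search nearby in case numbers drifted
        start = hint + offset
        if 0 <= start <= len(lines) - len(wanted) and lines[start:start + len(wanted)] == wanted:
            return lines[:start] + [l[6:] for l in replacement] + lines[start + len(wanted):]
    raise ValueError("could not anchor the edit; ask the model to retry")
```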
This is actually a very powerful pattern that everybody building with LLMs should pay attention to, especially when combined with structured outputs (AKA JSON mode).
If you want an LLM to refer to a specific piece of text, give each one an ID and then work with those IDs.
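A bare-bones version of that pattern; the field names and prompt wording are illustrative, not from any particular vendor's API:

```python
import json

# Give every chunk a stable ID, show the IDs to the model, and constrain the
# reply (via structured outputs / JSON mode) to refer to chunks only by ID.
chunks = {
    "C0": "def add(a, b): return a - b",   # bug the model should fix
    "C1": "def mul(a, b): return a * b",
}

prompt = (
    "Here are code chunks, each with an ID:\n"
    + "\n".join(f"[{cid}] {text}" for cid, text in chunks.items())
    + '\nReply with JSON: a list of {"id": ..., "replacement": ...} objects.'
)

# Pretend this came back from a JSON-constrained model call:
model_output = '[{"id": "C0", "replacement": "def add(a, b): return a + b"}]'

for edit in json.loads(model_output):
    chunks[edit["id"]] = edit["replacement"]   # applying edits is now a dict lookup
```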
Aider actually prompts the LLM to use search/replace blocks rather than actual diffs, and then has a bunch of regex, fuzzy-search, and indent-fixing code to handle inconsistent responses.
Aider's author has a bunch of benchmarks and found this to work best with modern models.
What we found was that error handling on the client side was also very important. There's a bunch of that in Aider too for inspiration. Fuzzy search, indent fixing, that kind of stuff.
And also just to clarify: Aider landed on search/replace blocks for GPT-4o and Claude rather than actual diffs. We followed suit, and then we showed those edits in a diff UI client-side.
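For anyone curious, the blocks look roughly like `<<<<<<< SEARCH` / `=======` / `>>>>>>> REPLACE` markers (check Aider's docs and prompts for the real thing). Here's a toy version of that format plus the kind of tolerant client-side application being described; none of this is Aider's actual code:

```python
import re

BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)

def apply_blocks(text: str, llm_response: str) -> str:
    for search, replace in BLOCK_RE.findall(llm_response):
        if search in text:
            text = text.replace(search, replace, 1)
            continue
        # Toy fallback: indentation-insensitive match for single-line searches,
        # re-applying the file's own leading whitespace to the replacement.
        for line in text.splitlines():
            if line.strip() and line.lstrip() == search.strip():
                indent = line[: len(line) - len(line.lstrip())]
                text = text.replace(line, indent + replace.strip(), 1)
                break
        else:
            raise ValueError("block did not match; ask the model to retry")
    return text
```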