This question is coming from a place of total ignorance: One appeal of the gener...

etbebl · on March 22, 2024

> x = 4
> y = 7
>
> are independent statements and the code will be no different if I replace those two statements with
>
> y = 7
> x = 4

Not always, e.g. in a multithreaded situation where x and y are shared atomics. Then, unless we authorize C++ to take more liberties in reordering (for example with relaxed memory ordering), another thread will never see y as 7 while x is not yet 4 in the first example, whereas in the second it can. This kind of subtlety can't be determined from syntax alone.
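To make the atomics case concrete, here is a minimal sketch, assuming `x` and `y` are `std::atomic<int>` accessed with the default sequentially consistent ordering. `count_violations` and its `x_first` flag are hypothetical names made up for illustration; the function counts how often a reader thread observes y as 7 while x is not yet 4.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: races one writer against one reader `rounds` times and
// counts how often the reader sees y == 7 while x is still not 4.
// x_first == true  -> writer runs "x = 4; y = 7"
// x_first == false -> writer runs "y = 7; x = 4"
int count_violations(int rounds, bool x_first) {
    int violations = 0;  // written only by the reader, read only after join()
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};

        std::thread writer([&] {
            // store() defaults to std::memory_order_seq_cst
            if (x_first) {
                x.store(4);
                y.store(7);
            } else {
                y.store(7);
                x.store(4);
            }
        });
        std::thread reader([&] {
            int seen_y = y.load();  // read y first, then x
            int seen_x = x.load();
            if (seen_y == 7 && seen_x != 4) ++violations;
        });

        writer.join();
        reader.join();
    }
    return violations;
}
```

With `x_first` set to `true` the count is always zero under sequential consistency. With `false` it can be nonzero, though a short run may not catch the window. The two orderings are textually just a swap of two lines, yet they are observably different programs.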

thaumasiotes · on March 22, 2024

OK, I tended to agree that the AST was inadequate for this task. But what are we doing with it? That's most of what I want from "structural code diff".

MathMonkeyMan · on March 22, 2024

In a sense, plain old diff is a structural diff. The grammar is a sequence of lines of characters.

All tree-sitter gives you is a _different_ grammar, so that a structural diff can operate on different trees given the same text as diff.

A parse tree still doesn't know anything about the meaning of a program, which is what you need to know in order to determine that those assignments to x and y are unordered.
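A rough sketch of the "different grammar" point: the same longest-common-subsequence comparison, run once over lines and once over tokens. The tokenizer here is a crude stand-in for a real parser such as tree-sitter, and all function names are made up for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

using Units = std::vector<std::string>;

// Grammar 1: a text is a sequence of lines (what plain diff assumes).
Units split_lines(const std::string& text) {
    Units out;
    std::istringstream in(text);
    for (std::string line; std::getline(in, line);) out.push_back(line);
    return out;
}

// Grammar 2: a text is a sequence of identifiers and punctuation characters,
// with whitespace discarded. Far simpler than a real parse tree.
Units split_tokens(const std::string& text) {
    Units out;
    std::string word;
    for (unsigned char c : text) {
        if (std::isalnum(c) || c == '_') {
            word += static_cast<char>(c);
            continue;
        }
        if (!word.empty()) {
            out.push_back(word);
            word.clear();
        }
        if (!std::isspace(c)) out.push_back(std::string(1, static_cast<char>(c)));
    }
    if (!word.empty()) out.push_back(word);
    return out;
}

// Number of units outside a longest common subsequence, i.e. how many
// removed plus added units a diff over this grammar would report.
std::size_t changed_units(const Units& a, const Units& b) {
    std::vector<std::vector<std::size_t>> lcs(
        a.size() + 1, std::vector<std::size_t>(b.size() + 1, 0));
    for (std::size_t i = 1; i <= a.size(); ++i)
        for (std::size_t j = 1; j <= b.size(); ++j)
            lcs[i][j] = (a[i - 1] == b[j - 1])
                            ? lcs[i - 1][j - 1] + 1
                            : std::max(lcs[i - 1][j], lcs[i][j - 1]);
    return a.size() + b.size() - 2 * lcs[a.size()][b.size()];
}
```

Given `lang = guess(path).map(from_language);` on one line versus the same expression with `.map(...)` moved to its own indented line, `changed_units` over `split_lines` reports three changed lines, while over `split_tokens` it reports none. Neither grammar says anything about whether two statements can be reordered.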

libre-man · on March 22, 2024

What you want for determining this is not an AST but a Program Dependence Graph (PDG), which does encode this information. Creating one is nowhere near as simple as creating an AST, and for many languages it either requires assumptions that will be broken, or results in something very similar to an AST (every node has a dependency on the previous node).
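As a minimal sketch of the data-dependence part only, assuming straight-line code with no aliasing, no threads, and no hidden side effects (exactly the kind of assumptions that tend to get broken), each statement can be summarized by the variables it reads and writes. `Stmt` and `depends_on` are hypothetical names, not taken from any real PDG library.

```cpp
#include <set>
#include <string>

// Hypothetical statement summary: the variables a statement reads and writes.
struct Stmt {
    std::set<std::string> reads;
    std::set<std::string> writes;
};

// True if the two variable sets share at least one name.
bool overlaps(const std::set<std::string>& a, const std::set<std::string>& b) {
    for (const auto& name : a)
        if (b.count(name)) return true;
    return false;
}

// True if `later` has a data dependence on `earlier`, so the two statements
// cannot be swapped without possibly changing the result.
bool depends_on(const Stmt& earlier, const Stmt& later) {
    return overlaps(earlier.writes, later.reads)    // read after write
        || overlaps(earlier.reads, later.writes)    // write after read
        || overlaps(earlier.writes, later.writes);  // write after write
}
```

Under these assumptions `x = 4` and `y = 7` have no edge between them, so a tool could treat the swap as a non-change, whereas `x = 4` followed by `y = x + 3` does have one. If the analysis cannot rule out aliasing or side effects, it has to add an edge between every consecutive pair, which is the "very similar to an AST" outcome.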

thaumasiotes · on March 22, 2024

OK. What good is the AST? Why do I care about "structural diffs" that don't do this?

The page has several examples:

1. Understand what actually changed.

This appears to show that `guess(path, guess_src).map(tsp::from_language)` has been changed to `language_override.or_else(|| guess(path, guess_src)).map(tsp::from_language)`. The call to `map` is part of a single line of code in the old file, but has been split onto a line of its own in the new file to accommodate the greater complexity of the expression.

The bragging associated with the example is "Unlike a line-oriented text diff, difftastic understands that the inner expression hasn't changed here", but I don't really care about that. I need to pay close attention to which bits of the line have been manipulated into which positions anyway. I'm more impressed by ignoring the splitting of one line into several, which does seem to be a real benefit of basing the diff on an AST.

2. Ignore formatting changes.

This example shows that when I switch the source from which `mockable` is imported from "../common/mockable.js" to "./internal.js", the diff will actively obscure that information by highlighting `mockable` and pretending that `"./internal.js"` is uninteresting code that was there the whole time (because it was already the source of some other imports). This badly confuses a boring visual change ("let's use the syntax for importing several things, instead of one thing") with a very significant semantic change ("let's import this module from a completely different file"). I'm not comfortable with this; there must be a better way to present this information than by suggesting that I shouldn't be worried about it.

(A textual diff, in this case, has the same problem. But when the pitch is that your new tool is better than a textual diff because it understands the code, failing to highlight an important change to the code is worse than it used to be!)

3. Visualize wrapping changes.

This shows that when I change the type of some field from `String` to `Option<String>`, the diff will not highlight the text "String", because that part hasn't changed. This is a change from a textual diff, but it doesn't appear to add much value.

There's a second example to do with code that belongs both before and after other code, in this case an opening/closing tag pair in XML, but in that case the structural diff appears to be identical to a textual diff.

4. Real line numbers.

"Do you know how to read @@ -5,6 +5,7 @@ syntax? Difftastic shows the actual line numbers from your files, both before and after."

I agree that that's a real benefit, but again it doesn't seem to have anything to do with the difference between textual and structural diffs.

------

I think the conceptual appeal of a "structural diff" is that it fails to highlight changes to the code that don't change the behavior of the software. Difftastic clearly believes something different; in the second example, they are failing to highlight a change to the code that does change the behavior of the software. And in the other examples, they are failing to highlight things that haven't changed from some perspectives, but could be argued to have changed from other perspectives -- and that in either case don't derive much benefit from not being highlighted. If changing `String` to `Option<SpecialType>` produced a diff that highlighted `SpecialType` in a separate color from the surrounding `Option<>` wrapping, indicating that the one line of code contained two relevant changes, that might be interesting, but otherwise I don't see the point of not highlighting the inner `String` along with the new wrapping.

So... what is the appeal of structural diffs?

libre-man · on March 27, 2024

Honestly I agree that structural diffs don't solve a problem for me either. I care about formatting too much to only want to rely on them.

I was just pointing out that if you want no diff at all for the example I replied to, you have to use a more advanced representation of the code; an AST won't be able to do it.