A personal pet peeve of mine when reading diffs is when a file has some functions, you insert one, and instead of looking like this:
 int someOldFunction()
 {
     // Function body
 }
+
+int newFunction()
+{
+    // New function body
+}
It looks like this:
 int someOldFunction()
 {
     // Function body
+}
+
+int newFunction()
+{
+    // New function body
 }
It's a small thing, but given that these diffs are equivalent, the one that balances the curly braces within added blocks should be favored. But diff utilities seem to get this pretty consistently wrong.
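To make "these diffs are equivalent" concrete, here is a rough Python sketch of the idea. It is not what any diff utility actually does: the function names are made up for illustration, and the brace check is deliberately naive (it ignores strings and comments). An inserted block can slide up or down whenever the line it would give up equals the line it would take in, so the sketch lists every equivalent position and prefers one whose braces balance.

```python
def slide_positions(lines, start, end):
    """Return every (start, end) that describes the same insertion.

    `lines` is the new file and lines[start:end] is the inserted block.
    """
    positions = [(start, end)]
    s, e = start, end
    # Slide up: the line just above the block equals the block's last line.
    while s > 0 and lines[s - 1] == lines[e - 1]:
        s, e = s - 1, e - 1
        positions.append((s, e))
    s, e = start, end
    # Slide down: the block's first line equals the line just below it.
    while e < len(lines) and lines[s] == lines[e]:
        s, e = s + 1, e + 1
        positions.append((s, e))
    return positions


def is_balanced(block):
    """Naive check: braces never close more than they open, and end at zero."""
    depth = 0
    for line in block:
        for ch in line:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth < 0:
                    return False
    return depth == 0


def best_position(lines, start, end):
    """Prefer an equivalent position whose braces balance, if one exists."""
    for s, e in slide_positions(lines, start, end):
        if is_balanced(lines[s:e]):
            return s, e
    return start, end


NEW_FILE = [
    "int someOldFunction()",
    "{",
    "    // Function body",
    "}",
    "",
    "int newFunction()",
    "{",
    "    // New function body",
    "}",
]

# The "ugly" diff marks lines 3..7 as added; the balanced one is 4..8.
print(best_position(NEW_FILE, 3, 8))
```

With the example file, the block that starts at the old function's closing brace slides down by one line and ends up covering exactly the new function.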
It is a small thing, but it throws me off every time I see it, and then it takes a few seconds of looking around before it dawns on me what happened.
The "user" in me would love a language-aware diffing (and merging) system, but the developer in me is already groaning about how much work that would end up taking for arguably not that much benefit.
I really like SemanticMerge. It makes total sense, although the diff experience is unlike other diff tools in terms of immediacy. But I think it will be another great tool to add to the arsenal when you need to pull out the big guns on crazy diffs.
Absolutely. I had it installed for some time and mostly thought "this is really neat but I could live without it." And then there was a monster merge with significant conflicts all over the place. I don't think it would have been otherwise possible to pull it off with as little damage as there was if not for this tool.
I really like SemanticMerge, but in its current shape it is little more than a fancy proof of concept.
Its biggest limitation on real projects is that it works on a single-file level, while all the interesting stuff happens on patch level. You may browse their forums to get an idea of what else is missing.
That said, I wish all the best to Codice and I really hope that they continue to invest in this tool.
I think it might be easier to get to a syntax-aware diff if one approached it by reusing the language-specific syntax highlighting specs used in various editors. I've almost sat down to start that myself a few times.
I don't think it has to be truly language aware. A diff tool that looks for matched braces, quotes, and indents to figure out where the blocks are would do better most of the time.
It doesn't have to be fully language aware, but it has to understand most, if not all, of the syntax. As soon as you start trying to match braces, you need to handle strings, comments, and probably a whole bunch of stuff I'm not thinking of.
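A small Python sketch of how quickly that gets fiddly, assuming C-like code. The helper name brace_delta is made up for illustration, and it only copes with // comments and quoted literals on a single line; block comments, multi-line strings and preprocessor tricks are exactly the "whole bunch of stuff" it does not handle.

```python
def brace_delta(line):
    """Net change in brace depth for one line of C-like code.

    Skips // comments and single- or double-quoted literals.
    Block comments and multi-line strings are NOT handled.
    """
    delta = 0
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif line.startswith("//", i):
            break  # rest of the line is a comment
        elif ch == "{":
            delta += 1
        elif ch == "}":
            delta -= 1
        i += 1
    return delta


print(brace_delta("if (x) {"))          # counts the real brace
print(brace_delta('puts("}"); // }'))   # ignores braces in strings and comments
```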
Here's an idea: wouldn't it be trivial for <insert your favourite language compiler here> to expose the AST of your code, and solve this problem the easy way?
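As one data point, Python already exposes its parser through the standard ast module, so a function-level comparison is only a few lines there. This is just a sketch of the idea, not a real diff tool: the two helper names are invented, it only looks at top-level functions, and it says nothing about how hard the same thing is for other languages.

```python
import ast


def top_level_functions(source):
    """Map each top-level function name to a dump of its syntax tree."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)  # the dump omits line numbers by default
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def function_level_diff(old_source, new_source):
    """Return (added, removed, changed) function names."""
    old = top_level_functions(old_source)
    new = top_level_functions(new_source)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return added, removed, changed


OLD = "def some_old_function():\n    return 1\n"
NEW = OLD + "\n\ndef new_function():\n    return 2\n"

print(function_level_diff(OLD, NEW))
```

Because the comparison is on syntax trees, moving a function or reformatting it does not show up as a change, which is both the appeal and the limitation.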
Oh, just saw this. It involves invoking the proper git tools to get the diff, and then converting the diff format from unified to ed format. The latter is actually easier than it might seem, as unified diffs start all of their special information in column 0; IIRC I wrote an awk script to do this. It's on my work machine though, so I don't have it handy.
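To illustrate the column 0 point, here is a rough Python sketch that splits a unified diff into hunks by looking only at the first character of each line. It is not the awk script mentioned above and it stops short of emitting ed commands; parse_hunks and the dictionary keys are names made up for this example.

```python
import re

# "@@ -old_start,old_len +new_start,new_len @@"; a missing length means 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunks(diff_text):
    """Split a unified diff into hunks of (marker, text) pairs."""
    hunks = []
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        m = HUNK_HEADER.match(line)
        if m:
            old_start, old_len, new_start, new_len = m.groups()
            old_left = int(old_len) if old_len is not None else 1
            new_left = int(new_len) if new_len is not None else 1
            hunks.append({
                "old_start": int(old_start), "old_len": old_left,
                "new_start": int(new_start), "new_len": new_left,
                "lines": [],
            })
        elif (old_left > 0 or new_left > 0) and line[:1] in ("+", "-", " "):
            marker = line[0]
            hunks[-1]["lines"].append((marker, line[1:]))
            # Context lines count against both sides of the hunk.
            if marker in ("-", " "):
                old_left -= 1
            if marker in ("+", " "):
                new_left -= 1
        # Anything else (file headers, "\ No newline ...") is skipped.
    return hunks


EXAMPLE = """--- a/f.c
+++ b/f.c
@@ -1,3 +1,4 @@
 int f()
 {
+    // body
 }
--- a/g.c
+++ b/g.c
@@ -2 +2 @@
-old
+new
"""

for hunk in parse_hunks(EXAMPLE):
    print(hunk["old_start"], hunk["new_start"], hunk["lines"])
```

Counting down the lengths from the hunk header is what keeps the "--- a/g.c" file header from being mistaken for a removed line.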
I tried GP's example, and in this particular case both --patience and the default (Myers) work the same, both doing the thing you want them to. Which perplexes me, because I know I've seen the bad case too, but can't seem to find a minimal example of it (I tried a couple variations on the example; they all did the 'right' thing).
That's because they all use greedy O(ND) algorithms or equivalent.
But conceptually, no matter what the algorithm, the greediness is usually a requirement for maintaining the theoretical time bound of LCS-based algorithms.
Patience trades the time bound for "better" results (patience is worst case O(ND^2)).
Histogram is a neatly engineered and extended version of patience with an O(ND) time bound (and in fact, is faster than both Myers and patience while providing good patience-like output).
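For anyone wondering what patience does differently, the commonly described core is: take the lines that occur exactly once in both files, and keep the longest set of them that appears in the same order on both sides as anchors. Here is a Python sketch of just that step. It is an illustration under those assumptions, not git's implementation; patience_anchors is a made-up name, and a real diff would then recurse between the anchors.

```python
from bisect import bisect_left
from collections import Counter


def patience_anchors(a, b):
    """Return (index_in_a, index_in_b) pairs for lines unique in both
    files, keeping only the longest subsequence whose order agrees."""
    count_a, count_b = Counter(a), Counter(b)
    pos_in_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    pairs = [(i, pos_in_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in pos_in_b]

    # Longest increasing subsequence of the b-indices (patience sorting).
    tails = []        # tails[k]: index into pairs ending the best run of length k+1
    tail_values = []  # b-index of pairs[tails[k]], kept sorted for bisect
    prev = [None] * len(pairs)
    for n, (_, j) in enumerate(pairs):
        k = bisect_left(tail_values, j)
        prev[n] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(n)
            tail_values.append(j)
        else:
            tails[k] = n
            tail_values[k] = j

    # Walk the back-pointers to recover the subsequence.
    result = []
    n = tails[-1] if tails else None
    while n is not None:
        result.append(pairs[n])
        n = prev[n]
    return result[::-1]


OLD = ["int f()", "{", "}", "", "int g()", "{", "}"]
NEW = ["int f()", "{", "}", "", "int h()", "{", "}", "", "int g()", "{", "}"]

print(patience_anchors(OLD, NEW))
```

Note that the repeated "{" and "}" lines never become anchors, only the function signatures do, which is why patience-style output tends to keep whole functions together.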
I find Araxis Merge especially useful in these times. This application has a feature to "set the synchronization link" at any place you want in the code. It is not automatic (or language aware), but once you realize the difference, you can 'fix' the diff in real time. That helped me a lot for big diff files (unfortunately in my current company we don't use Araxis :( )
git diff --patience uses a different algorithm that works to optimise the number of diffed lines (it's less efficient, though), which will probably solve your issue. The standard algorithm essentially matches common features on a first-come, first-served basis.
[I'm the author of diff-so-fancy, Steve helped with shipping it as a standalone script]
NPM?!? :)
A lot of people below are asking why a bash script (that depends on a perl script) is being recommended to install via NPM? The short reason is that NPM is the most straightforward way to get a script installed as a global binary in a cross-platform manner. This approach has worked quite well with `git-open`[0]. Asking all users to deal with the PATH is not my ideal.
In addition, I wanted a reasonable upgrade path, in case there are necessary bugfixes. It's not a great experience if users identify bugs but the fix means they manually find it/download/PATH-ify each time. :/
> That said, I'll add some Manual Install instructions to the readme so it's clear how to do this on your own
Yeah, that helps a lot :)
I saw this, was like "cool, I want to use this", and then noted that it uses npm. I avoid installing ruby and node apps -- I have nothing against either, just that I currently don't use either language or have a dependency on a major tool written in those; but they pull in a lot of deps which take up space (at least, my experiences have been that many of these tools install way too much -- probably because I don't use either and all the "default libs" aren't on my system). On my previous machine I had lots of issues with this, so as a rule I avoid these things unless absolutely necessary. I know others who are of a similar opinion.
Fortunately I realized that it was just a shell script, and installed it directly :)
I see a lot of projects written in other languages on npm, especially bash. Substack even published C code to npm. I think it makes the script more accessible, especially for nodejs users. I don't mind publishing it to other package managers.
I realize your question is rhetorical, but there are tons of people. Anyone new to programming, in a CS course that uses git, for example, would be familiar with basic git but many would be unfamiliar with the path (or on Windows).
I concur, but they need not stay unfamiliar with it. The concept is easy: when you type the name of a program and hit enter, the shell looks through a list of directories for the first one that contains a file with that name. That list is $PATH. Any programmer will have to deal with search paths and stuff at some point in their life, and probably very early on, when they'll want to run their own scripts.
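The whole concept fits in a few lines. Here is a Python sketch of the lookup; find_on_path is a made-up name, it ignores Windows details such as PATHEXT, and in real code the standard library's shutil.which already does this job.

```python
import os


def find_on_path(program, path=None):
    """Return the first executable named `program` on the search path."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # An empty entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_on_path("git"))
```

Earlier directories win, which is why putting your own bin directory first on $PATH lets you shadow a system-wide tool.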
I agree that most programmers will run into $PATH at some point, but why force an order on them? Maybe they just want to get started using things like fancy diffs provided through package managers like npm.
Wouldn't it be easier for the end user if you used pip/PyPI? Essentially all Linux distros include Python, but there are very few that ship with Node.js installed by default.
Yeah, I try to avoid "sudo pip install" for CLI utilities if I can (and discourage its use to others). I put ~/.local/bin on my PATH (a nonstandard, XDG-like convention) and use "pip install --user" instead.
I've seen too many Python environments hosed by folks who aren't Python experts to keep suggesting that "sudo pip install <CLI tool>" is a thing most users should be doing.
I use ~/.bin/ for what sounds like the same purpose. I'm not sure I'd call that a convention - it's just what made sense to me - but it does ease issues requiring that userspace executables be on my $PATH.
So you need to be root to write to /usr/local/bin/. How does NPM magically solve this? (pip has the `--user` flag to install for just the current user, as tom points out.)
It doesn't solve it, and I don't think I claimed it did - I simply implied that I don't think installing it into the system (or user) Python is a good idea either.
Ah, I see, I misunderstood your comment. But I think pip gives you better options than NPM, which installs into either /usr/local/(...) or the current directory. The latter sounds like a mess waiting to happen.
Python >= 2.7.9 or >= 3.4 ships with pip installed by default. Those versions are more than a year old now. What are you running, Slackware or something?
I've many Debian servers without nodejs _and_ without python.
It requires conscious work as any other dependency, but it's possible (and convenient, if you don't depend on them).
So more than likely I won't install npm just to test a bash wrapper around a perl script that does something git itself can do without external dependencies.
But obviously, different people have different concepts of the K.I.S.S. principle.
Being a perl script... why didn't the author use CPAN? It's available in all vanilla installs of Debian, CentOS, Ubuntu, RedHat, etc...
There's no easy install method for small, simple scripts that avoids multiple manual steps, has an automated upgrade path, is cross-platform, and is consistent and familiar to a large subset of developers.
Of the options out there for the above, npm - while a hassle if you don't already have it installed - is probably the closest balance of maintainer and end-developer convenience.
I know he gave credit in the README, but why does this ~30 line shell script need its own repo? Seems more like a cheap grab for Github Stars rather than to provide actual value.
Sheesh. The repo allows the stuff like installing it via `npm install -g diff-so-fancy`. It's not like a public repo costs anything, and Github stars don't get you anything either.
I think it makes it easier to add contributors. Git doesn't let you grant a user partial access to a repo, so Paul would have to add a few people who only contribute to one script. I added some really talented contributors :)
The repo under my name doesn't mean it belongs to me. In fact it belongs to the public. Everyone can contribute to it so everyone is the "owner". That's how I see open source projects.
I don't mind giving it back to Paul. I did it only because I needed it. I believe a lot of people would want it more accessible too.
So, the whole npm thing seems weird to me, and then it occurred to me that it could be for malicious purposes. Would it be possible to upload a separate package.json to npm that had e.g. a post-install script? I don't know much about how npm works from the package publication side of things, but I assumed it was similar to PyPI, where the code in the git repo doesn't have to be at all related to the code in the package.
That's a helpful perspective, but hopefully you already have workarounds on the color issue. This is simply making diff pieces easier to copy / pasta -- which I for one have needed to do. Change my mind on a refactor midstream, need to restore part of the file, etc.
You might investigate the tools your editor of choice provides for working with diffs. You might be surprised at how easy it can be!
(In Emacs, if you're using git, magit is an amazing package. You can select a commit from the logs, dive into the diff for a file caused by that commit, highlight a region of the diff and revert the change in your working copy. It's wonderful.)
What I'd like to see is the a/ and b/ in front of the filenames disappear. Getting rid of that would FINALLY allow me to double-click on the filename (which is configured to select the part between the spaces and copy it to the clipboard) and paste it instantly for the next command... or to be able to do git diff > foo.patch and on another system do patch < foo.patch without having to remember the correct -p value.
I handle this in a way that is more agnostic to the type of revision control, and fully flexible in coloring (using the most powerful scheme available).
For example, I shouldn't have to put up with basic colors if the terminal can do better.
Here is how it works; starting with:
#!/bin/bash
# Pick the revision control system from the working directory, then colorize.
if [ -r ".svn" ] ; then
    svn diff "$@" | my_colorize_diff
else
    git diff "$@" | my_colorize_diff
fi
...where the "my_colorize_diff" script at the end of the pipe is as follows:
#!/usr/bin/env perl
# by Kevin Grant (kmg@mac.com)
use strict;
use warnings;

# Pick the richest color scheme the terminal is known to support.
my $term_program = (exists $ENV{'TERM_PROGRAM'} && defined $ENV{'TERM_PROGRAM'}) ? $ENV{'TERM_PROGRAM'} : '';
my $term = (exists $ENV{'TERM'} && defined $ENV{'TERM'}) ? $ENV{'TERM'} : 'vt100';
my $is_xterm = ($term =~ /xterm/);
my $is_24bit = ($term_program =~ /MacTerm/);

# ESC # 3 and ESC # 4 draw the top and bottom halves of a double-height line;
# ESC # 5 switches back to normal single-width lines.
print "\033#3BEGIN DIFF\n";
print "\033#4BEGIN DIFF\n\033#5";
while (<>) {
    if (/^\+/ && !/^\+\+/) {
        # Added line: set the background, erase the line so the color fills it,
        # then set a dark bold foreground.
        if ($is_24bit) {
            print "\033[48:2:150:200:150m", "\033[2K", "\033[38:2::88:m", "\033[1m";
        } elsif ($is_xterm) {
            print "\033[48;5;149m", "\033[2K", "\033[38;5;235m", "\033[1m";
        } else {
            print "\033[42m", "\033[2K", "\033[30m", "\033[1m";
        }
    } elsif (/^\-/ && !/^\-\-/) {
        # Removed line: same idea with a red background.
        if ($is_24bit) {
            print "\033[48:2:244:150:150m", "\033[2K", "\033[38:2:144:0::m";
        } elsif ($is_xterm) {
            print "\033[48;5;52m", "\033[2K", "\033[38;5;124m";
        } else {
            print "\033[41m", "\033[2K", "\033[37m";
        }
    } else {
        # Everything else (context and headers) is shown in italics.
        print "\033[3m";
    }
    chomp;
    print;
    print "\033[0m\n";  # reset attributes at the end of every line
}
print "\033#3END DIFF\n";
print "\033#4END DIFF\n\033#5";
For what it's worth, there's a lot of 24-bit-capable terminals that aren't MacTerm. Even xterm supports the 24-bit-color sequences, although it picks the closest entry in its 256-colour palette rather than using the 24-bit colour directly.
Also, you seem to be assuming "xterm" supports 256 colours and everything else doesn't. The best way to figure out how many colours the terminal supports is $(tput colors). tput also looks up other useful sequences; you can "tput bold" to turn on bold mode, "tput setaf 12" to set the foreground to colour 12 (bright blue), "tput sgr0" to zero all active formatting, etc.
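The same terminfo lookup is available from a script without spawning tput. Here is a Python sketch using the standard curses module; terminal_colors and its fallback of 8 are my own choices for illustration, and the fallback also kicks in when there is no usable terminal or no curses module at all.

```python
def terminal_colors(default=8):
    """Ask terminfo how many colours the terminal claims to support."""
    try:
        import curses
        curses.setupterm()               # uses $TERM, like tput does
        count = curses.tigetnum("colors")
    except Exception:
        # No curses, no $TERM, or output is not attached to a terminal.
        return default
    # tigetnum reports negative values for missing capabilities.
    return count if count > 0 else default


print(terminal_colors())
```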
Good point. Although, unless there are shells that have "tput" built-in, that means more subprocesses to obtain basic information (which would slow down the result a bit). In my case, the environment is sufficient to figure out what to do.
I didn't quite like this, but it does reference diff-highlight, which is part of git-contrib (so it may already be installed on your system, but just not in your $PATH!):
the netbeans.team.diff tool is similar (showing the specific words that changed), allows interactive editing, and does a good job even with large insertions and deletions
when one or 2 words change in a long or dense line, it's nice to have the specific changes highlighted
imagine a for loop in which a variable (used on every line) was renamed, and buried in the loop an assignment changed slightly (eg, + became -). with a standard diff, it's really hard (for me) to pick up the minor change. with word by word diffs, it's pretty easy
(i use netbeans diff, not this tool, but they appear similar)
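A rough idea of what word-by-word diffing does, sketched in Python with the standard difflib module. This is not how netbeans, diff-highlight or diff-so-fancy implement it; word_diff is a made-up helper, it splits on whitespace only (so runs of spaces are collapsed), and the [-...-] / {+...+} markers are borrowed from the plain output style of git diff --word-diff.

```python
import difflib


def word_diff(old_line, new_line):
    """Mark changed words as [-removed-] and {+added+}."""
    old_words, new_words = old_line.split(), new_line.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        if i1 < i2:  # words present only in the old line
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j1 < j2:  # words present only in the new line
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("total = total + count[i];", "total = total - count[i];"))
```

In the for-loop scenario above, the renamed variable and the flipped operator each show up as their own marked word instead of as a whole changed line.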