Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

*> the list of commits isn't "sorted" like the precondition for binary search.

Good point.

One may think that in some sense, it is sorted: The history is sorted into "good" commits (before the bug was introduced), which are followed by "bad" commits (after the bug was introduced).

However, the bisection algorithm also works if "good" and "bad" commits are mixed. Even then it will find one commit where the bug was introduced (although there may be other places where it had been fixed and later reintroduced).

This works because bisection doesn't require a sorted list, it just needs a continuous function. And any sequence is just a special case of that.



> However, the bisection algorithm also works if "good" and "bad" commits are mixed. Even then it will find one commit where the bug was introduced (although there may be other places where it had been fixed and later reintroduced).

I don't understand this comment. Need for sorting the list is essentially the same in both cases, because "the bug was reintroduced" is essentially equivalent to "a lower number showed up later". A binary search makes the assumption of sorting because it's premised on the idea, "Whatever number I find, every number that comes after it will be larger". Same with bug-finding in a git history, it assumes "If I find a point in the history where the problematic behavior exists, it exists at all points after that for the same reason."

If you had an "unsorted" history where a bug was fixed and then reintroduced in another way, your binary search may fail to find the immediate cause of your problems in the same way that an "unsorted" binary search would fail to find what it's looking for.


> because "the bug was reintroduced" is essentially equivalent to "a lower number showed up later"

Yes, and this is perfectly okay for "git bisect".

> Need for sorting the list is essentially the same in both cases.

No, bisection does not require sorting, it just needs a continuous range whose endpoints have different signs. Then, it will always find one point where the signs on both sides differ. (In real analysis, this means finding a zero. In discrete maths, it means find the introduction of a sign change.)

In other words, when starting with a range from an old "good" commit to the current "bad" commit, it will always find a "good" commit directly followed by a "bad" commit.

This is why "bisection" is a good term for this Git algorithm, while "binary search" would be misleading. (Even though bisection and binary search are almost identical algorithms. They mostly differ in their input assumptions and output guarantees.)

Bisection makes no assumtion on any sort order, but guarantees to find one point of bug introduction in logarithmic time. It does not give any guarantee to find the lastest point of bug introduction -- unless, of course, your history does have just one such point.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: