WebKit sorts JS arrays using Selection Sort (webkit.org)
53 points by lars on July 13, 2012 | 40 comments


It appears to me from the line:

    631 if (thisObj->classInfo() == &JSArray::s_info && !asArray(thisObj)->inSparseMode()) {
that only non-arrays, or arrays in some kind of "sparse" mode, are sorted inefficiently.

I'm not sure what sparse mode is, but let's try my Raymond Chen-inspired psychic powers: if you assign a[0]=1 and a[1000000]=2, you don't want indices 1 to 999999 to be stored, so an array like that will end up in a sparse mode which functions more like a hash table keyed by integers.
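
To make that guess concrete, here's what it looks like from the JS side (my own example, not from the WebKit source):

    var a = [];
    a[0] = 1;
    a[1000000] = 2;
    console.log(a.length);               // 1000001
    console.log(Object.keys(a).length);  // 2 -- only two indices actually exist
    console.log(500000 in a);            // false -- the in-between indices were never stored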

The same applies to non-arrays: since they aren't true arrays, they won't be stored in the blessed, contiguous way that the sort routine is expecting, so a slower access method has to be used.

Now, the fast sorting code is presumably written assuming the array is stored contiguously. Since a sparse-mode array isn't like that, we aren't on the fast path, so it doesn't matter so much if we are slow, and correctness and special cases are more important, hence the simple selection sort algorithm.

Note also that line 633, 635 and 637 further specialise the fast path into three cases: a default sort with no user-defined comparator, a sort based on a built-in numerical comparator, and a more general sort using a user-defined comparator. (EDIT: The functions called on 633, 635 and 637 are defined in http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/r... from line 1409 onwards, and use quicksort or mergesort, except in the general case which uses an AVL tree, which feels odd but I'm sure there was a reason.)

TL;DR: This only applies to certain arrays; the performance is generally fine, or else someone would have noticed by now; if you are sorting array-like things or sparse arrays regularly, profile to discover whether this is a problem for you.


Yeah... to demonstrate how painfully incorrect this submission is, I think this comment from the real sorting code (which switches between mergesort and quicksort for different workloads) does quite well:

    // For numeric comparison, which is fast, qsort is faster than mergesort. We
    // also don't require mergesort's stability, since there's no user visible
    // side-effect from swapping the order of equal primitive values.


If this is not the "real" sorting code, why is it there? If it's not real, it should be gone. If it's real, it's wrong. The proper response to a code review that finds broken code is to fix it, not excuse it.


Look, this submission states "WebKit sorts JS arrays using Selection Sort". In fact, WebKit will sort things that are not JS arrays using selection sort (the first check); and, as of 7 months ago, will sort JS arrays that are "in sparse mode" as if they were not JS arrays ("non-standard" being the terminology).

I am not arguing selection sort is a good sort for the reason chosen, but I will maintain that the real (yes: "real", that is the correct term to use here) sorting algorithm used by WebKit for "JS arrays" (remember: I was complaining about "how incorrect this submission is") is actually a choice between mergesort, quicksort, and an AVL tree.

Now, if the submission title had been "WebKit sorts sparse JS arrays using selection sort" or even "WebKit sorts non-standard JS arrays using selection sort" (to evoke the wording of the commit message 7 months ago), that would be different: that would not be a linkbait title. I could even maybe get behind "WebKit uses four different sorting algorithms, and one of them is selection sort?!".

However, when one sees the title "WebKit sorts JS arrays using Selection Sort", I think many, if not most, reasonable developers go "oh, wow, maybe I should be avoiding the WebKit sort algorithm, given that my arrays are reasonably sized... I wonder if they'll get that fixed"; but when you click through to the actual code and read it, you realize this would be wrong.

So, from my perspective, you are now arguing a strawman, and are doing so fairly abrasively :(. To be clear: I certainly am not defending the existence of selection sort in the codebase, even for non-standard arrays. That does not make this submission correct, or even reasonable: it is looking at the wrong code and making it out to be something it is not (even quite generally, "important").

Regardless, I got curious, and decided to look more into that sorting code. It is apparently sufficiently old that it predates WebKit being called "WebKit", so I went back through kdelibs and found the original commit that added it, from January 2001. At this time, most of the array implementation was still "does nothing, needs to be implemented" or "does the wrong thing, needs to be fixed".

http://quickgit.kde.org/index.php?p=kdelibs.git&a=commit...

It was in March of 2003, during a commit-spree of merging from Safari back to kjs, that the quick sort implementation and code to use it for JS arrays was added. This means that this submission's title was correct for two years, incorrect for almost 9 years, and then maybe-sort-of-almost-correct for 7 months.

http://quickgit.kde.org/index.php?p=kdelibs.git&a=commit...

http://quickgit.kde.org/index.php?p=kdelibs.git&a=commit...


I guess the response is symmetric: I'm not defending the title of the post. I'm saying that this is an example of code review discovering a real problem in code, and that it needs to be fixed. You apparently agree, so yay. :)


Empirically, when I tested sort algorithms, Safari makes fewer calls to the comparison function than either Firefox or Chrome, and all of the sorts are definitely O(n log n) in the average case. http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/r... seems to be the real algorithm. Since Safari's sort is stable, I'm guessing that merge sort is being used.

Chrome uses quicksort + insertion sort, as mentioned in another comment. Firefox uses merge sort + insertion sort (https://mxr.mozilla.org/mozilla-central/source/js/src/jsarra...).

As a side note, Firefox's sort implementation slows down drastically when you use a custom comparison function because it has to call from C++ back out to JS (https://bugzilla.mozilla.org/show_bug.cgi?id=715181). Chrome's sort is presumably implemented in JS for this reason.
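
A rough way to see that overhead for yourself (a quick sketch, not a rigorous benchmark; the numbers vary a lot by engine and version):

    // Quick-and-dirty timing of the default sort vs. a custom JS comparator.
    var n = 200000, base = [];
    for (var i = 0; i < n; i++) base.push(Math.random());

    var a = base.slice();
    var t0 = Date.now();
    a.sort();                                   // default (lexicographic) comparison, all in native code
    console.log("default sort:   " + (Date.now() - t0) + " ms");

    var b = base.slice();
    var t1 = Date.now();
    b.sort(function (x, y) { return x - y; });  // every comparison calls back into JS
    console.log("custom compare: " + (Date.now() - t1) + " ms");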


The AVL tree is used for sorting when there is a custom comparison function because we cannot guarantee that the user-provided comparison function conforms to the requirements of the underlying standard library sort functions. Websites have a habit of providing nonsense comparison functions (e.g., using function() { return 0.5 - Math.random(); } to "shuffle" via sorting) and we need to ensure that these sorts will terminate.
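
For reference, that anti-pattern looks like this (a sketch of the misuse, not code from any particular site):

    // A comparator like this violates the consistency requirements of sort():
    // it can report a < b and b < a on different calls for the same pair.
    var shuffled = [1, 2, 3, 4, 5].sort(function () {
      return 0.5 - Math.random();
    });
    // A sort that assumes a well-behaved comparator may loop forever or read
    // out of bounds; the AVL-tree path just has to terminate and return
    // something, even if the resulting "shuffle" is biased.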

EDIT: Sparse mode is when JSC::JSArray switches from storing the array data as a vector to storing it in a map. You can see from the code at http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/r... that a vector will be used for sufficiently small arrays (< 10,000 elements) and for larger arrays that are at least 12.5% full.


Performance is generally fine only because this is a ridiculously obscure feature that no one sane would ever trip over. Frankly I had no idea "sparse arrays" existed, nor would I ever have used them if I had (that's what a hash table is for). No doubt they're there because some browser did it once, some important site relied on it, and now we're all stuck with performance constraints (don't explode for huge unassigned array sizes) that don't make any sense.

No, O(N^2) sorting algorithms are just dumb in library/runtime[1] code. No one expects that, it will eventually bite anyone who uses the code. And this code is a booby trap, no one even knows if they're using it!

[1] In application code, where you might know a priori that they'll never sort more than "one screen worth" of data, I can see it being a reasonable hack. But never as part of a standard feature called "sort".


Except that every sort algorithm in every library on earth falls back to selection sort (or similar) for small problems (e.g. N < 10), because that's more efficient than drilling a merge sort all the way down.
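
Roughly, the pattern looks like this (an illustrative sketch only; the cutoff of 10 is made up, and most libraries use insertion sort for the small case):

    // Sketch: merge sort that hands small subarrays to a simple quadratic
    // sort instead of recursing all the way down to single elements.
    function insertionSort(a, lo, hi) {          // sorts a[lo..hi) in place
      for (var i = lo + 1; i < hi; i++) {
        var v = a[i], j = i - 1;
        while (j >= lo && a[j] > v) { a[j + 1] = a[j]; j--; }
        a[j + 1] = v;
      }
    }

    function hybridMergeSort(a, lo, hi) {
      if (lo === undefined) { lo = 0; hi = a.length; }
      if (hi - lo <= 10) { insertionSort(a, lo, hi); return a; }
      var mid = lo + ((hi - lo) >> 1);
      hybridMergeSort(a, lo, mid);
      hybridMergeSort(a, mid, hi);
      var merged = [], i = lo, j = mid;          // merge the two sorted halves
      while (i < mid && j < hi) merged.push(a[i] <= a[j] ? a[i++] : a[j++]);
      while (i < mid) merged.push(a[i++]);
      while (j < hi) merged.push(a[j++]);
      for (var k = 0; k < merged.length; k++) a[lo + k] = merged[k];
      return a;
    }

    hybridMergeSort([5, 3, 9, 1, 7, 2, 8]);      // [1, 2, 3, 5, 7, 8, 9]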


How is that relevant? Is your contention that the code in question will never see performance problems because there's a constant factor that outweighs the runtime? That's not my read at all. It looks to me like sorting 40k items that happen to be packed in a sparse array is going to take on the order of a billion operations. No?


FWIW, this code's over a decade old. Presumably if it had any negative performance implications, someone would have noticed by now:

  author	kocienda <kocienda@>	
  Fri, 24 Aug 2001 10:24:45 -0400 (14:24 +0000)

  Imported sources from kde-2.2 distribution
http://git.chromium.org/gitweb/?p=external/Webkit.git;a=blob...


I ran a simple test with Chrome 20:

  var a=new Array(4e9);
  for (var i=1; i<a.length; i=Math.ceil(i*1.0001))
    a[i]=new Number(i).toString();
  a.sort();
The speed suggests that this sparse string array is not sorted with a quadratic algorithm (today).


This is also fast in Safari because the array isn't sparse enough for JavaScriptCore to send it down the fallback path.

EDIT: In order to go down the fallback path the array needs to be both sufficiently large and sufficiently sparse (< 12.5% occupied). You can see the difference in algorithms by modifying your test case like so:

Dense, fast path:

    var a = [];
    for (var i = 1; i < 4e6; i++)
      a[i * 5] = i.toString();
    a.sort();
Sparse, fallback path:

    var a = [];
    for (var i = 1; i < 4e6; i++)
      a[i * 10] = i.toString();
    a.sort();


By my count, it's very sparse - only 0.003% occupied. I suppose the initial size declaration could make it dense, but then it would use a considerable amount of (virtual?) memory. Unfortunately, I don't have access to Safari to test that.


Hmm… it looks like I misspoke. The "sparse mode" check that Array.prototype.sort performs in JavaScriptCore appears to be different from its concept of sparse array storage, which is used under the conditions I described. The "sparse mode" check in the sorting code appears to be related to whether the array instance has getters/setters or properties defined via Object.defineProperty. The performance difference my code demonstrates is due to the extra overhead of iterating the sparse array storage compared to a vector, and not due to the different sorting algorithms that are employed.
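
Concretely, something like this (a hypothetical example based on the check described above, not a test case I've traced through the code) is presumably what actually sends an array down the selection-sort path:

    var a = [3, 1, 2];
    var slot0 = 3;
    // An accessor on an index (or a property defined via Object.defineProperty),
    // rather than mere sparseness, is what trips the sort's "sparse mode" check.
    Object.defineProperty(a, 0, {
      get: function () { return slot0; },
      set: function (v) { slot0 = v; },
      configurable: true
    });
    a.sort();   // presumably [1, 2, 3] either way; only the code path should differ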


Note that while Chromium uses WebKit as the rendering engine, it does not use JavaScriptCore as the JavaScript engine. Rather, Chromium uses V8 (which is not a WebKit project).

http://trac.webkit.org/wiki/JavaScriptCore


This is a big assumption.

See this animated example that danso posted: http://www.sorting-algorithms.com/

Yes, a difference of maybe 5ms won't be noticed... unless you're doing a lot of sorting. Selection Sort is basically the slowest sort of all.
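
For reference, this is the whole algorithm; it does roughly n²/2 comparisons no matter what the input looks like (a sketch for illustration, not the WebKit code):

    // Selection sort: find the minimum of the unsorted tail, swap it into place.
    // Comparisons: (n-1) + (n-2) + ... + 1 ≈ n²/2, even for already-sorted input.
    function selectionSort(a) {
      for (var i = 0; i < a.length - 1; i++) {
        var min = i;
        for (var j = i + 1; j < a.length; j++) {
          if (a[j] < a[min]) min = j;
        }
        var t = a[i]; a[i] = a[min]; a[min] = t;
      }
      return a;
    }

    selectionSort([4, 2, 5, 1, 3]);   // [1, 2, 3, 4, 5]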


One thing I learned when optimizing code is to never trust my intuition.

That piece of code you think must be slow and must be impacting the application speed? It might not be consequential at all. Spending 4 hours optimizing it might get you exactly 0% improvement.

And that one line you think is benign and would take 5 minutes to optimize might end up being the key to speeding up your entire algorithm tenfold.

It's important to understand speed and complexity when architecting software, but it's just as important to know how it really performs in the real world, on the CPU, with real memory constraints and real data.

I'm not saying this is true at all for this particular algorithm, but it's been uncannily true for me in the past and I still fool myself to this day trying to optimise prematurely based on my primitive human computer.


V8 sort algorithm (Quicksort + Insertion Sort):

http://code.google.com/p/v8/source/browse/trunk/src/array.js...


It's known as Introsort


No, Introsort is (Quicksort + Heapsort), designed to prevent the algorithm from degenerating to O(n^2) on particular input sets.

AFAIK, V8's quicksort (and the one in Java versions before Java 7) is based on Jon Bentley's work.
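
To make the distinction concrete, the defining trick of Introsort is a recursion-depth limit, roughly like this (a rough sketch for numeric arrays, not V8's or anyone else's actual code):

    // Quicksort normally, but if recursion gets deeper than ~2*log2(n) --
    // the sign of a degenerate pivot sequence -- finish that subarray with
    // heapsort, which is O(n log n) in the worst case.
    function heapsortRange(a, lo, hi) {           // heapsort a[lo..hi) in place
      var n = hi - lo;
      function siftDown(start, end) {
        for (var root = start; 2 * root + 1 < end; ) {
          var child = 2 * root + 1;
          if (child + 1 < end && a[lo + child] < a[lo + child + 1]) child++;
          if (a[lo + root] >= a[lo + child]) return;
          var t = a[lo + root]; a[lo + root] = a[lo + child]; a[lo + child] = t;
          root = child;
        }
      }
      for (var s = (n >> 1) - 1; s >= 0; s--) siftDown(s, n);
      for (var e = n - 1; e > 0; e--) {
        var t = a[lo]; a[lo] = a[lo + e]; a[lo + e] = t;
        siftDown(0, e);
      }
    }

    function introsort(a) {
      var depthLimit = 2 * Math.floor(Math.log(a.length || 1) / Math.LN2);
      (function sort(lo, hi, depth) {
        if (hi - lo <= 1) return;
        if (depth > depthLimit) { heapsortRange(a, lo, hi); return; }
        var pivot = a[lo + ((hi - lo) >> 1)], i = lo, j = hi - 1;
        while (i <= j) {
          while (a[i] < pivot) i++;
          while (a[j] > pivot) j--;
          if (i <= j) { var t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
        }
        sort(lo, j + 1, depth + 1);
        sort(i, hi, depth + 1);
      })(0, a.length, 0);
      return a;
    }

    introsort([5, 1, 4, 2, 3]);   // [1, 2, 3, 4, 5]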


FWIW, here's an animated demo comparing the efficiency of the various sorting algorithms: http://www.sorting-algorithms.com/


Does any browser actually rely on WebKit's JS Core? I know Safari has Nitro and Chrome has v8, so I'm not sure what would use this code except maybe konqueror.


nitro == jscore


If that's the case, then why does UIWebView under-perform Mobile Safari? (I know Mobile Safari uses nitro, but I have no idea what UIWebView uses under the covers). Also, Apple's known for being a bit slow to contribute some things back to WebKit. I wouldn't be surprised to see them not contributing nitro back.


I think they're both using Nitro (depending on who you ask: http://ariya.ofilabs.com/2012/06/nitro-javascriptcore-and-ji...), but UIWebViews don't have JIT compilation turned on (no executable memory pages allowed in iOS apps), so they're slower.


JavaScriptCore (like other js engines) has an interpreter and a JIT. The JIT requires memory that's both writable and executable. iOS doesn't allow memory that's both writable and executable for security reasons, so normal apps can't use the JIT parts of jsc. Safari gets an exception from the "no writable and executable memory" rule.

(That's the technical reason; if you're wearing a tinfoil hat, you could argue that Apple wants to make sure native apps work a lot better than thin UIWebView shims, for lock-in and whatnot.)


Apple engineers contributed "Nitro" to the WebKit SVN repository before it even gained that name.


You're telling me Apple doesn't have their own Safari patches to compete in the speed war? Are they all contributed back to jscore?


The version of JavaScriptCore (and the rest of WebKit) available from svn.webkit.org is exactly what is shipped in Safari on OS X.


This is part of the reason why I ported Vladimir Yaroslavskiy's dual-pivot quicksort to JavaScript (https://github.com/square/crossfilter/blob/master/src/quicks...) for Crossfilter (http://square.github.com/crossfilter). It is much, much faster than the native sort. I would guess that WebKit uses this slower algorithm because it's an easy way to implement stable sort. If I recall, Chrome was one of the earlier browsers to implement a non-stable sorting algorithm for JavaScript and it broke various pages that assumed stable sort (such as tables that let you sort by multiple columns).
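
A quick example of the stability assumption (hypothetical data, just to show what broke):

    // Rows arrive already ordered by name; now sort by age. A stable sort keeps
    // the name order within equal ages; an unstable one is free to scramble it.
    var rows = [
      { name: "Ann",  age: 30 },
      { name: "Bob",  age: 25 },
      { name: "Cruz", age: 30 }
    ];
    rows.sort(function (a, b) { return a.age - b.age; });
    // Stable result:   Bob(25), Ann(30), Cruz(30)
    // Unstable engine: Bob(25), then Ann/Cruz in either order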

Fun side-note: you can use a matrix diagram to visualize how your browser sorts, letting you identify your browser's sort algorithm by eye. For example, Chrome uses median-of-3 quicksort, and Firefox uses mergesort. See for yourself!

http://bost.ocks.org/mike/shuffle/compare.html

http://www.flickr.com/photos/mbostock/sets/72157628971703067...


Can you point me at some code comparing the performance of your JavaScript quicksort implementation to the browser's native sort? I threw together a quick comparison myself and couldn't see any cases where the JavaScript implementation was significantly faster than JavaScriptCore's Array.prototype.sort, and there were a few cases where Array.prototype.sort was significantly faster than the JS implementation.

EDIT: One interesting case is that of sparse arrays (e.g., the arrays which trigger JavaScriptCore to fall down the slow path that prompted this discussion). It seems that your JavaScript quicksort implementation doesn't handle these correctly: it gives different results from Array.prototype.sort and is dramatically slower as well. I'm guessing the performance impact comes from the fact that from JavaScript's point of view the holes in the array are identical undefined values. I'm not sure why correctness would be affected though.
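
To illustrate the hole semantics involved (a small example; I'm not claiming this fully explains the difference):

    var a = [];
    a[0] = "b";
    a[2] = "a";
    console.log(a[1]);     // undefined -- reading the hole looks like a real undefined
    console.log(1 in a);   // false     -- but the property doesn't actually exist
    a.sort();              // the native sort can tell the difference (and, in practice,
                           // pushes holes and undefineds to the end); a JS loop over
                           // 0..length-1 just sees three values, one of them undefined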


I don't know much JS but shouldn't there be a quicksort library function for that?



> // "Min" sort. Not the fastest, but definitely less code than heapsort // or quicksort, and much less swapping than bubblesort/insertionsort.

Oh, OK. Glad you could save yourself some time/LOC.


While you're correct to criticize to a degree, there is also something to be said for reducing complexity and preventing bugs by avoiding unnecessarily long algorithms.


Interesting. Perhaps a quicksort/heapsort/mergesort should be written in JS instead and used by webkit for large collections. It'd be a lot less code than in C++, that's for sure.


That would make performance worse, since webkit already uses what looks like standard library quicksort or mergesort, which I'd count at 0 lines of code. (Although they also use an AVL tree for sorting some things, I'm not sure why, or why the AVL tree code lives in the "wtf" source directory. It's all in http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/r... from line 1409 onwards)


While I doubt a javascript-space sorting implementation runs any faster than a C++ implementation could, there are some implementations around, http://millermedeiros.github.com/amd-utils/array.html#sort for example.


JavaScript sorting algorithms can definitely be faster if you are passing a comparator function to Array.prototype.sort, depending on the JS engine. See https://groups.google.com/forum/#!msg/mozilla.dev.tech.js-en..., where Boris Zbarsky shows that translating quick sort into JavaScript produces an algorithm that runs 7X faster than Array.prototype.sort(function(a, b) { return a-b; }) in SpiderMonkey (which uses an insertion + merge sort in C++ for Array.prototype.sort) because of the expense of making calls from C++ to JS. Chrome's sort algorithm is written in JavaScript so it doesn't suffer from this.



