Glibc falls back to quicksort from merge sort (its default algorithm) when there's to little free memory available to accommodate merge sort's O(n) requirement. Quicksort only uses O(log n) additional memory, making it a good option in that sense. O(1) would of course be better, but even for huge inputs that logarithm makes the memory consumption a non-issue in most cases. Heapsort does give you O(1) memory, but also comes with poorer cache locality and overall performance than quicksort, making it a not so compelling option.
Implementations are available in the repository linked to from the article. Not quite "ready for production", but hopefully a useful starting point. https://github.com/matslina/bfoptimization
Since bff4lnr is an interpreter, this really becomes apples and oranges. But for fun, dbfi.b on bff4lnr (at -O3) vs all at -O0 and -O3: http://imgur.com/uEEgVr0
Conclusion: if you want speed then compile.
Mergesort would be a poor choice for qsort() due to the linear space complexity. With N bytes of RAM, you'd only be able to sort (a bit less than) N/2 bytes of data. An in-place algorithm is preferable.
Glibc is the only libc I'm aware of that implements mergesort. It still falls back to quicksort for large inputs though.