The man page explains several of the differences (http://www.canonware.com/download/jemalloc/jemalloc-latest/d...). These include a different API (flags you can pass to explicitly control alignment, zero-initialization, and so on), reduced use of sbrk in favor of mmap, multiple arenas for allocations of different sizes, per-CPU arenas to eliminate lock contention, etc.
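For a concrete taste of that API, here's a minimal sketch using jemalloc's *allocx entry points (mallocx, dallocx, and the MALLOCX_ALIGN/MALLOCX_ZERO flags from that man page); it assumes jemalloc is installed and you link with -ljemalloc:

    /* Sketch: request aligned, zeroed memory in a single call. */
    #include <jemalloc/jemalloc.h>

    int main(void) {
        /* 4096 bytes, 64-byte aligned and zero-initialized, instead of
           posix_memalign() followed by memset(). */
        void *p = mallocx(4096, MALLOCX_ALIGN(64) | MALLOCX_ZERO);
        if (p != NULL)
            dallocx(p, 0);  /* flags-aware free */
        return 0;
    }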
Mozilla has been using jemalloc since the release of Firefox 3 in 2008, primarily because having multiple arenas for allocations of different sizes means that you end up with less fragmentation over time. Fragmentation happens when you deallocate ten 100-byte objects that are scattered around in memory, then need to allocate one 1000-byte object. You can't reuse those ten 100-byte empty spots, because the 1000-byte allocation has to be contiguous. This means that your total memory usage has to grow. jemalloc puts small allocations into buckets where they'll be closer together, making it more likely that new allocations will be able to fill the holes left by deallocations, and that freeing small objects will eventually empty a memory page, allowing you to return the page to the OS.
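To make the bucketing concrete, here's a toy sketch of size-class rounding (the classes below are made up for illustration, not jemalloc's real tables). Because a freed 100-byte slot and a new 100-byte request round to the same class, new allocations can drop straight into old holes:

    #include <stddef.h>

    static const size_t size_classes[] = { 8, 16, 32, 48, 64, 96, 128, 192, 256 };

    /* Round a request up to its size class; all blocks in a class are
       interchangeable, so freed slots can actually be reused. */
    static size_t round_to_class(size_t n) {
        for (size_t i = 0; i < sizeof(size_classes) / sizeof(size_classes[0]); i++)
            if (n <= size_classes[i])
                return size_classes[i];
        return 0;  /* larger requests go to a separate large-object path */
    }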
I remember that we were all pretty chuffed at the improvements in Firefox's memory usage; it was quite significant and would surely reduce the general perception that Firefox was a memory hog. Just using jemalloc reduced memory usage by 22% (http://blog.pavlov.net/2008/03/11/firefox-3-memory-usage/ has a really amazing graph comparing FF2 and FF3 to IE7).
Rust uses jemalloc for the same reasons; there's just no technical reason not to use it.
No - it's not the job of the compiler to choose your memory allocator. There is talk, however, of replacing the current glibc allocator with jemalloc[1], which is the more appropriate place for that decision to be made.
To me, it seems like the most sensible thing would be to make the choice of memory allocator a global OS configuration parameter—like enabling DEP on Windows. It'd still only be the program (or language runtime) deciding for itself whether to obey the allocator suggestion—but having one system-level place for this kind of policy information to be stored, where glibc et al could look for it, would make a lot more sense.
No thanks; that'd be worse. Then you'd have to test every release of your program with every allocator in combination with every other configuration change that could be made.
Besides, different applications work best with different allocators so it's definitely a choice the author of a program needs to make.
Probably the fact that it eagerly coalesces newly freed blocks, where high-performance allocators like tcmalloc and jemalloc use slabs of fixed-size blocks which are never coalesced. Coalescing pulls adjacent blocks into cache, and it probably implies doubly linked lists where slab allocators can get away with cache-friendlier single links or bitmaps, etc. (because you probably have to remove a block from the middle of a free list if you've just coalesced it with a newly freed block). It turns out cache footprint is key for speed and scalability in allocators, so that's a big problem.
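To illustrate (a toy sketch, not tcmalloc's or jemalloc's actual code): a slab free list can push and pop while touching only the block itself, which is exactly what keeps the cache footprint small:

    #include <stddef.h>

    struct slab {
        void  *free_head;   /* singly linked list threaded through free blocks */
        size_t block_size;  /* every block in this slab is the same size */
    };

    static void slab_free(struct slab *s, void *block) {
        /* No coalescing: just push. Only this block's first word is
           written, so no neighboring headers are pulled into cache. */
        *(void **)block = s->free_head;
        s->free_head = block;
    }

    static void *slab_alloc(struct slab *s) {
        void *block = s->free_head;
        if (block != NULL)
            s->free_head = *(void **)block;  /* O(1) pop, one cache line */
        return block;
    }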
I'm embarrassed to link this twice in a row now, but at one point I played around with the perf of a couple of allocators, including glibc's ptmalloc. I wound up answering exactly this question, at least for some simple use cases:
http://www.andrew.cmu.edu/user/apodolsk/418/finalreport.html
This stuff is out of the scope of my knowledge, but I do know that allocators such as OpenBSD's are slower due to security hardening. jemalloc is not known for its security; instead it seems to be focused on performance. Security features such as ASLR, not leaving old memory around, zeroing out memory on release, etc., are not things that improve an allocator's performance.
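A rough sketch of where that cost comes from: a hypothetical hardened_free() (my name, not any real allocator's API) that scrubs a block before recycling it pays a full pass over the memory that a performance-focused allocator simply skips:

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical: scrub-on-free in the style of hardened allocators. */
    static void hardened_free(void *p, size_t size) {
        if (p == NULL)
            return;
        explicit_bzero(p, size);  /* don't leave old data for the next owner */
        free(p);
    }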
Good point. I wonder if Rust's allocator needs fewer security features because of heavy compile-time heap safety checking. For example, I don't believe use-after-free or double-free are possible in Rust, thanks to Rust's ownership and lifetime system (barring use of unsafe code blocks).
I'm not sure the point about ASLR is entirely correct; ASLR does randomize the addresses returned by mmap (slightly), and jemalloc primarily uses mmap to allocate its arenas.
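This is easy to check: the program below asks for an anonymous mapping the way jemalloc does for its arenas, and with ASLR enabled the printed address changes from run to run (Linux/BSD):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        /* Anonymous 1 MiB mapping, like an allocator grabbing arena space. */
        void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        printf("mmap returned %p\n", p);
        if (p != MAP_FAILED)
            munmap(p, 1 << 20);
        return 0;
    }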
No good reason why it hasn't changed, except that the underlying single-threaded allocator (Doug Lea's malloc) is really good. The multithreaded version, which is a "per-thread" (not really) wrapper around the Lea allocator, has serious problems that have been known for a long time. I wrote a paper outlining and demonstrating the problems in 2000. The inertia is difficult to explain.
The glibc allocator is based on Doug Lea's malloc, which was developed before multithreading was common. Optimizations for multiple threads and arenas were retrofitted on top of the existing code, whereas jemalloc was designed from scratch more recently for multi-threaded workloads and was able to take advantage of recent research.