
A GCed piece of nontrivial code can also be faster than a malloc/free solution.

The reason is that allocation can be much faster: a GCed runtime can allocate memory with as little as a pointer increment (and that can be done atomically, so no thread locking is needed).

malloc/free always involve full function calls, and they may descend into dealing with fragmentation (i.e. finding free space). Likewise, free() isn't free, and cross-thread allocation/deallocation can complicate things further.



> increment of a pointer (and that is atomic, so no thread locking needed).

Nitpick (and mostly for others reading): incrementing a pointer held in a local variable may be atomic, but storing the updated pointer back to a shared location may not be. If a bump allocator wants to support concurrent allocations, it should make sure to use the right atomic operations (typically this is just a compare-and-swap of the old pointer with the locally incremented one).


Is this universally true? Redis, for example, vendors jemalloc, so it's entirely possible for malloc's fast path to be mostly inlined. IIUC, malloc isn't a syscall, unlike the underlying sbrk and mmap calls that malloc implementations use to get memory from the kernel?


Sure, you can change any of the individual properties.

But if you cannot move memory (adjusting pointers, like most GCs do), then you will have to deal with fragmentation, which slows down allocation (or causes other drawbacks).

A moving GC can also make the program faster: memory compaction improves locality, making more efficient use of the CPU cache and TLB.


Fragmentation issues depend greatly on the language. E.g. I'm (perpetually, it seems) working on a Ruby compiler, and I slotted in a naive GC as a starting point and instrumented it.

It turned out the vast majority of allocations are tiny object instances of a handful of different sizes. As a result, minimizing fragmentation is as easy as allocating pools of a fixed size for small objects and rounding larger objects up to a multiple of the block size. Truly pathological patterns are still possible, and a compacting/generational GC may still be worth it later, both to deal with those and to reduce the cost of a collection, but for a lot of uses you can get away with simple options like that.


Malloc is not a syscall, you're right. But it's quite a complicated function, so it wouldn't make sense for the compiler (or linker) to inline it wherever it's called.


jemalloc is deliberately structured so that the fast path is fully inlined, and the unlikely paths that would blow the inlining complexity budget are pushed out of line.


Interesting. I didn't know that, but it sounds plausible.




