
A GCed piece of nontrivial code can also be faster than a malloc/free solution.

The reason is that allocation can be much faster: a GCed runtime can allocate memory with as little as a pointer increment (and that can be done atomically, so no thread locking is needed).

malloc/free always involve full function calls, and they may descend into dealing with fragmentation (i.e. finding free space). Likewise, free() isn't free, and cross-thread allocation/deallocation can complicate things further.



> increment of a pointer (and that is atomic, so no thread locking needed).

Nitpick (and mostly for others reading): incrementing a pointer held in a local variable may be atomic, but storing the updated pointer back to a shared location may not be. If a bump allocator wants to support concurrent allocations, it should make sure to use the right atomic operations (typically this is just a compare-and-swap of the old pointer with the locally incremented one).


Is this universally true? Redis, for example, vendors jemalloc, so it's entirely possible for malloc's fast path to be mostly inlined. IIUC, malloc isn't a syscall, unlike the underlying sbrk and mmap calls that malloc implementations use to get memory from the kernel?


Sure, you can change any of the individual properties.

But if you cannot move memory (adjusting pointers, like most GCs do), then you will have to deal with fragmentation, which slows down allocation (or causes other drawbacks).

A moving GC can also make the program faster: memory compaction improves locality, making more efficient use of the CPU cache and TLB.


Fragmentation issues depend greatly on the language. E.g. I'm (perpetually, it seems) working on a Ruby compiler, and I slotted in a naive GC as a starting point and instrumented it.

It turned out the vast majority of allocations are tiny object instances of a handful of different sizes. As a result, minimizing fragmentation is as easy as allocating pools of a fixed size for small objects and rounding larger objects up to a multiple of the block size. Truly pathological patterns are still possible, and a compacting/generational GC may still be worth it later, both to deal with those and to reduce the cost of a collection, but for a lot of uses you can get away with simple options like that.


Malloc is not a syscall, you're right. But it's quite a complicated function, so it wouldn't make sense for the compiler (or linker) to inline it wherever it's called.


jemalloc is deliberately structured so that the fast path is fully inlined, and the unlikely paths that would blow the inlining complexity budget are pushed out of line.


Interesting. I didn't know that, but it sounds plausible.




