Libpas beats other mallocs in WebKit. I also had benchmarks involving non-WebKit workloads, and the results were all over the place: sometimes slower than other mallocs by a lot, sometimes faster by a lot. Memory was the same story, sometimes more efficient and sometimes less. This didn't surprise me; I've seen this before when writing memory management code.
I think it makes sense for large software projects that use malloc a lot and care about perf to eventually either write their own malloc or take an existing one and tune it heavily.
We should have a libmetamalloc that tracks a history of program invocations using the actual workload on the actual machine. It would cycle through different malloc implementations on each execve() and, after gathering enough statistical data, select the optimal implementation. The next step would be a basic ML model that looks at a few variables like time of day, args, etc. to determine when to switch allocators.
If an OS used such a thing by default, it would figure out that it should use libpas on the programs that were faster in your tests. Since most programs have zero effort put into optimizing allocators (or much of anything else), it would likely be a win even given the complexity. Many things are branch predictors if you squint!
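For what it's worth, the "cycle through allocators, then pick the best" part is basically an explore-then-exploit loop. Here's a rough sketch, not an actual libmetamalloc (which doesn't exist). AllocatorSelector, min_samples and the allocator names are all hypothetical, and the run times in the usage are made up. A real version would presumably time each execve() and inject the chosen allocator, e.g. via LD_PRELOAD for dynamically linked programs. It also leaves out the ML-model step entirely.

```python
from collections import defaultdict
from statistics import mean


class AllocatorSelector:
    """Hypothetical per-program allocator picker (illustrative sketch only)."""

    def __init__(self, allocators, min_samples=5):
        self.allocators = list(allocators)
        self.min_samples = min_samples
        # program name -> allocator name -> observed run times in seconds
        self.history = defaultdict(lambda: defaultdict(list))

    def choose(self, program):
        runs = self.history[program]
        # Explore: pick the allocator with the fewest samples so far.
        # Ties go to the earliest in the list, so this cycles round-robin.
        least = min(self.allocators, key=lambda a: len(runs[a]))
        if len(runs[least]) < self.min_samples:
            return least
        # Exploit: every allocator has enough samples; pick the lowest mean run time.
        return min(self.allocators, key=lambda a: mean(runs[a]))

    def record(self, program, allocator, seconds):
        self.history[program][allocator].append(seconds)


if __name__ == "__main__":
    sel = AllocatorSelector(["glibc", "jemalloc", "libpas"], min_samples=2)
    fake_times = {"glibc": 1.0, "jemalloc": 0.9, "libpas": 0.8}  # made-up numbers
    for _ in range(6):  # six simulated invocations
        alloc = sel.choose("webkit")
        sel.record("webkit", alloc, fake_times[alloc])
    print(sel.choose("webkit"))  # libpas
```

Averaging wall-clock time is the simplest possible policy. Given how noisy the results were across workloads, you'd probably want something smarter (a proper bandit algorithm, and memory as a second metric) before trusting it.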
Note: Not even I can tell if I'm joking or serious with this comment.