The project also sees very little meaningful maintenance activity, which leads me to believe it either failed to get traction for whatever reason or the authors aren't really focused on it (probably the grants ran out and they moved on to other things).
Yes it was very complex. Maybe it has improved since I tried it.
Compare that to running a traditional profiler, which only requires setting an environment variable (for gperftools at least, which is my preference). I don't even need to recompile my program.
> I've seen huge improvements with it for a single threaded embedded database I'm building.
How would this make any difference for a single threaded program?
> How would this make any difference for a single threaded program?
Coz can slow down individual sections, not only threads. This will show you how your program behaves with specific lines sped up regardless of threads. A modern computer is also a distributed system, so it has the same counter-intuitive performance behaviours, even when just looking at the memory hierarchy.
That still doesn't entirely make sense. If your program is single threaded then whenever your code is running, the program is waiting for your code and speeding it up will speed up the whole program by the same amount.
If you are never doing concurrent work, then there will never be any code where a speedup doesn't affect total runtime because it is waiting for some other work. As I understand it, that's the gotcha that Coz fixes, but it doesn't apply in single-threaded code.
But I have never got it to work so maybe I'm missing something.
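To make the argument above concrete, here is a toy model in C++. It is only a sketch under idealized assumptions: a serial program's runtime is the plain sum of its sections, and a concurrent one waits for its slowest worker. The function names and timings are made up for illustration, and the model deliberately ignores hardware effects such as caches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Toy model: a serial program's runtime is the sum of its sections.
double serial_runtime(const std::vector<double>& sections) {
    return std::accumulate(sections.begin(), sections.end(), 0.0);
}

// Toy model: two workers run concurrently and the program waits for both,
// so the runtime is that of the slower worker.
double concurrent_runtime(double worker_a, double worker_b) {
    return std::max(worker_a, worker_b);
}

// Runtime saved in the serial model when section `index` takes `fraction`
// less time (0.5 means the section runs in half the time).
double serial_saving(std::vector<double> sections, std::size_t index,
                     double fraction) {
    assert(index < sections.size());
    assert(fraction >= 0.0 && fraction <= 1.0);
    const double before = serial_runtime(sections);
    sections[index] *= (1.0 - fraction);
    return before - serial_runtime(sections);
}
```

In this model, halving a 6-second section of a serial program always saves 3 seconds, while halving the 4-second worker next to a 6-second worker saves nothing because the program still waits for the slower one.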
Maybe it's just easier to interpret. Sampling profilers can be a bit confusing, and debug info doesn't always work as neatly as you'd like.
You are ignoring that even single-threaded code has a lot of sources of concurrency and side effects: memory/caches, branch prediction, prefetching...
Watch the talk and look at the examples. Most of them are single-threaded: a bad hash function causing bucket collisions and linear inserts; SQLite using an indirection table, killing speculative execution and code prefetching.
Those two wouldn't really show up in a sampling profiler, because they still take up a tiny amount of time.
Sampling profilers show you where time is spent; causal profilers show you what performance side effects every line of code has.
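For anyone who hasn't seen the hash-collision case, here is a small C++ sketch of how it can arise. This is not the code from the talk. The key pattern, the bucket count, and both hash functions are assumptions picked to make the effect obvious: keys that are all multiples of the bucket count, hashed with an identity hash, land in a single bucket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical "bad" hash: uses the key unchanged.
std::uint64_t identity_hash(std::uint64_t key) { return key; }

// Multiplicative hash: multiply by an odd 64-bit constant (2^64 divided by
// the golden ratio) and keep the upper 32 bits of the product.
std::uint64_t mixing_hash(std::uint64_t key) {
    return (key * 0x9E3779B97F4A7C15ull) >> 32;
}

// Keys 0, stride, 2*stride, ... (hypothetical workload).
std::vector<std::uint64_t> strided_keys(std::size_t count,
                                        std::uint64_t stride) {
    std::vector<std::uint64_t> keys;
    keys.reserve(count);
    for (std::size_t i = 0; i < count; ++i) keys.push_back(i * stride);
    return keys;
}

// Length of the longest chain in a chained hash table with `nbuckets`
// buckets. An insert that checks for duplicates walks this chain.
std::size_t longest_chain(const std::vector<std::uint64_t>& keys,
                          std::size_t nbuckets,
                          std::uint64_t (*hash)(std::uint64_t)) {
    assert(nbuckets > 0);
    std::vector<std::size_t> load(nbuckets, 0);
    for (std::uint64_t key : keys) {
        ++load[hash(key) % nbuckets];
    }
    return *std::max_element(load.begin(), load.end());
}
```

With `strided_keys(1024, 64)` and 64 buckets, `identity_hash` puts all 1024 keys in one chain, so every insert degrades to a linear scan, while `mixing_hash` spreads them across the buckets.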
You include the library, put some "progress" markers into your benchmarks, and run them.
It took me minutes in both Rust and Zig to set up.
I've seen huge improvements with it for a single threaded embedded database I'm building.
The only downside is that it doesn't track some things like memcpy well, so you also need a flamegraph.
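For reference, the "progress marker" step looks roughly like this with the C/C++ header (I used the Rust and Zig bindings, so treat this as a sketch rather than my actual setup). `COZ_PROGRESS` is the macro from Coz's `coz.h`. The workload, `process_request` and `run_benchmark` are hypothetical, and the fallback define is only there so the snippet still compiles on a machine without the header.

```cpp
#include <cassert>
#include <cstdint>

// Use the real header when it is installed; otherwise turn the marker into
// a no-op (fallback is an assumption for portability, not part of Coz).
#if defined(__has_include)
#  if __has_include(<coz.h>)
#    include <coz.h>
#  endif
#endif
#ifndef COZ_PROGRESS
#  define COZ_PROGRESS ((void)0)
#endif

// Hypothetical unit of work standing in for the real benchmark body.
std::uint64_t process_request(std::uint64_t id) { return id * 2 + 1; }

// Runs `requests` units of work and marks one progress point per unit, so
// the profiler can measure throughput at this line.
std::uint64_t run_benchmark(std::uint64_t requests) {
    assert(requests > 0);
    std::uint64_t checksum = 0;
    for (std::uint64_t id = 0; id < requests; ++id) {
        checksum += process_request(id);
        COZ_PROGRESS;  // one unit of useful work finished
    }
    return checksum;
}
```

As far as I remember from the Coz README, you then build with debug info (`-g`) and launch the binary with `coz run --- ./your_benchmark`; check the README for the exact steps on your platform.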