Coz: Causal Profiling

efferifick · on April 20, 2024

I've been a big fan of Emery's research. Coz is one tool that I've always wanted to use, but I haven't had the chance to do so.

Check out his other research. Some of it is highly accessible via YouTube videos. I recommend watching / reading:

  * Stabilizer
  * Mesh
  * Scalene

jcalabro · on April 20, 2024

There's a great talk that explains how/why this works for those looking for more info.

Also, Go UMass!

https://www.youtube.com/watch?v=r-TLSBdHe1A

junon · on April 20, 2024

I've tried a number of times to integrate Coz into projects, and the developer experience is pretty bad. Of all the times I've tried it, I think it linked correctly maybe once, and even then it didn't really give me any data.

I really want to love this research because having read into it I think it's really valuable, but not ever getting it working is so discouraging.

brunoqc · on April 20, 2024

I remember seeing the video 8 years ago but I never heard about it again.

I was wondering if the project was stuck or just a PoC.

I wonder if it evolved and how useful it has been since.

nestorD · on April 20, 2024

I have seen Rust people use it, it has a library:

https://crates.io/crates/coz

IshKebab · on April 20, 2024

I looked into this years ago but it was very complex to set up, and I believe it only really has benefit if you have a lot of threads waiting for a long time for other threads, which is pretty rare in my experience.

Neat idea though.

j-pb · on April 20, 2024

Complex?

You include the library, put some "progress" markers into your benchmarks, and run them.

It took me minutes in both Rust and Zig to set up.

I've seen huge improvements with it for a single threaded embedded database I'm building.

The only downside is that it doesn't track some things like memcpy well so you also need a flamegraph.

vlovich123 · on April 21, 2024

I've never had success getting it to run either:

    [profiler.cpp:75] Starting profiler thread
    [libcoz.cpp:96] init_coz in progress, do not recurse
    [profiler.h:123] Thread state not found
    Aborted!
      0: /usr/bin/../lib64/coz-profiler/libcoz.so(_ZN8profiler8on_errorEiP9siginfo_tPv+0x6c) [0x7a084dbb274c]
      1: /usr/lib/libc.so.6(+0x3c770) [0x7a084d8ac770]
      2: /usr/lib/libc.so.6(+0x8d32c) [0x7a084d8fd32c]
      3: /usr/lib/libc.so.6(gsignal+0x18) [0x7a084d8ac6c8]
      4: /usr/lib/libc.so.6(abort+0xd7) [0x7a084d8944b8]
      5: /usr/bin/../lib64/coz-profiler/libcoz.so(pthread_create+0x18c) [0x7a084dbaf8ec]
      6: /usr/bin/../lib64/coz-profiler/libcoz.so(_ZN8profiler7startupERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEP4lineib+0x208) [0x7a084dbb2fd8]
      7: /usr/bin/../lib64/coz-profiler/libcoz.so(_Z8init_cozv+0xf95) [0x7a084dbaeb25]
      8: /usr/bin/../lib64/coz-profiler/libcoz.so(+0x1a733) [0x7a084dbaf733]
      9: /usr/lib/libc.so.6(+0x25cd0) [0x7a084d895cd0]
      10: /usr/lib/libc.so.6(__libc_start_main+0x8a) [0x7a084d895d8a]
      11: target/release/mybin(+0x7b155) [0x556484976155]

I think the concept is interesting, but I haven't been able to actually make it work so it remains a theoretically interesting approach.

j-pb · on April 21, 2024

What system are you running on? Currently only Linux is supported. I've been running it on Ubuntu LTS.

vlovich123 · on April 21, 2024

Obviously from the paths it's Linux. Running on Arch.

Looks to be an open issue FWIW: https://github.com/plasma-umass/coz/issues/180

The project also sees very little meaningful maintenance activity, which leads me to believe it failed to get traction for whatever reason / the authors aren't really focused on it (probably grants ran out & they moved on to other things).

j-pb · on April 21, 2024

Could've been any posix-like tbh.

Yeah, the code def needs some love, but it's pretty decent given that it's academic.

IshKebab · on April 20, 2024

Yes it was very complex. Maybe it has improved since I tried it.

Compare to running with a traditional profiler which only requires setting an environment variable (for gperftools at least which is my preference). I don't even need to recompile my program.

> I've seen huge improvements with it for a single threaded embedded database I'm building.

How would this make any difference for a single threaded program?

j-pb · on April 20, 2024

> How would this make any difference for a single threaded program?

Coz can slow down individual sections, not only threads. This will show you how your program behaves with specific lines sped up regardless of threads. A modern computer is also a distributed system, so it has the same counter-intuitive performance behaviours, even when just looking at the memory hierarchy.
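A toy model of the "slow things down to see what a speedup would do" trick may help here. This is only a sketch based on the virtual-speedup idea presented in the talk, not Coz's actual implementation: the two-section program, the function names, and the numbers are all made up for illustration. Instead of making section A faster by a fraction `s`, the model delays whatever runs alongside A by `s` times A's duration, then subtracts the inserted delay from the measured runtime.

```python
def actual_speedup_runtime(a, b, s):
    """Runtime of two parallel sections if A really became faster by fraction s."""
    return max(a * (1 - s), b)


def virtual_speedup_runtime(a, b, s):
    """Same prediction, obtained by delaying B instead of speeding up A."""
    delay = a * s                  # pause inserted into everything except A
    measured = max(a, b + delay)   # what a stopwatch would observe
    return measured - delay        # discount the pause that was added


# Hypothetical durations: A takes 8 time units, B takes 5, both run in parallel.
for s in (0.0, 0.25, 0.5):
    print(s, actual_speedup_runtime(8.0, 5.0, s), virtual_speedup_runtime(8.0, 5.0, s))
```

In this model both functions agree for every `s`, which is why the experiment can be run without actually optimizing anything first.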

IshKebab · on April 21, 2024

That still doesn't entirely make sense. If your program is single threaded then whenever your code is running, the program is waiting for your code and speeding it up will speed up the whole program by the same amount.

If you are never doing concurrent work then there will never be any code where a speed up doesn't affect total runtime because it is waiting for some other work. As I understand it that's the gotcha that Coz fixes, but it doesn't apply in single threaded code.

But I have never got it to work so maybe I'm missing something.

Maybe it's just easier to interpret. Sampling profilers can be a bit confusing, and debug info doesn't always work as neatly as you'd like.
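The argument above can be written down as a deliberately simplified model. It assumes a program is either purely serial (runtime is the sum of its sections) or a single fork-join (runtime is the longest section), and it ignores caches, branch prediction, and everything else raised elsewhere in the thread. The section names and timings are hypothetical.

```python
def serial_runtime(sections):
    """Single-threaded: sections run one after another."""
    return sum(sections.values())


def fork_join_runtime(sections):
    """Concurrent: sections run in parallel, then join on the slowest one."""
    return max(sections.values())


def gain(runtime, sections, name, fraction):
    """Time saved overall when one section gets faster by `fraction`."""
    faster = dict(sections)
    faster[name] = sections[name] * (1 - fraction)
    return runtime(sections) - runtime(faster)


sections = {"parse": 6.0, "index": 2.0}  # made-up timings

# Serial: halving "index" saves exactly the time removed from it.
print(gain(serial_runtime, sections, "index", 0.5))
# Fork-join: halving "index" saves nothing, because "parse" is the critical path.
print(gain(fork_join_runtime, sections, "index", 0.5))
```

Under these assumptions a sampling profiler and a causal profiler would agree in the serial case and disagree only in the fork-join case.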

j-pb · on April 21, 2024

You are ignoring that even single threaded code has a lot of sources of concurrency and side-effects; memory/caches, branch prediction, prefetching...

Watch the talk and look at the examples. Most of them are single-threaded; a bad hash-function causing bucket collisions and linear inserts; SQLite using an indirection table, killing speculative execution and code prefetching.

Those two wouldn't really show up in a sampling profiler, because they still take up a tiny amount of time.

Sampling profilers show you where time is spent; causal profilers show you what performance side effects every line of code has.
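For anyone who hasn't seen the "bad hash function" case, here is a minimal sketch of the effect. It is not the code from the talk: the keys, the bucket count, and both hash functions are made up for illustration (the better one is a plain 32-bit FNV-1a). The hash function itself is cheap either way; the cost shows up as long bucket chains that every insert or lookup has to walk.

```python
def bad_hash(key):
    """Cheap but terrible: every key of the same length collides."""
    return len(key)


def fnv1a_hash(key):
    """32-bit FNV-1a, used here only as a reasonable point of comparison."""
    h = 2166136261
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def max_chain_length(keys, hash_fn, num_buckets=64):
    """Length of the longest bucket chain after inserting all keys."""
    buckets = [0] * num_buckets
    for key in keys:
        buckets[hash_fn(key) % num_buckets] += 1
    return max(buckets)


keys = [f"user{i:04d}" for i in range(1000)]  # hypothetical keys, all 8 characters long

print("bad hash, longest chain:", max_chain_length(keys, bad_hash))
print("FNV-1a,   longest chain:", max_chain_length(keys, fnv1a_hash))
```

With the bad hash every key lands in the same bucket, so inserts that check for duplicates degrade to a linear scan.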

IshKebab · on April 21, 2024

I have literally discovered and fixed the "bad hash function" case at work using gperftools. I'll watch the talk again anyway.

Ygg2 · on April 21, 2024

Even if it's single-threaded, a program can be concurrent and waiting on the same resources.