> Thread-local is way too magical for me. I wouldn't want to debug a system that...

cesarb · 2024-11-21T14:07:36 1732198056

> > Thread-local is way too magical for me.

> There's a perfectly cromulent register just begging to be used; [...] what magic are you afraid of here?

Most of the magic is not in using the thread-local variable, but in allocating it. When you declare a "static __thread char *p", how do you know that, for instance, it is located at the 123rd word of the per-thread area? What if that declaration is in a dynamic library which was loaded late (via dlopen) into the process? What about threads which were started before that dynamic library was loaded, and therefore did not have enough space in their per-thread area for that thread-local variable, when they call into code which references it? What happens if the thread-local variable has an initializer?
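The questions above are about what the toolchain and loader do behind the scenes; the contract the programmer sees is simpler. Below is a minimal sketch, assuming GCC or Clang on an ELF platform with POSIX threads (`__thread` is a GNU extension; C11 spells it `_Thread_local`). `counter`, `bump` and `tls_initializer_demo` are hypothetical names. Each thread gets its own copy, and every copy starts from the initializer:

```c
#include <assert.h>
#include <pthread.h>

/* One copy per thread, each starting at 42. On ELF targets the initializer
 * is stored once in the module's TLS template (.tdata) and copied into each
 * thread's TLS block when that block is set up. */
static __thread int counter = 42;

static void *bump(void *out) {
    counter += 1;          /* modifies only the calling thread's copy */
    *(int *)out = counter; /* report what this thread ended up with */
    return NULL;
}

/* Starts two threads that each bump their own copy, then returns the
 * calling thread's copy, which neither worker touched. */
int tls_initializer_demo(int *seen_a, int *seen_b) {
    pthread_t a, b;
    int rc = pthread_create(&a, NULL, bump, seen_a);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, bump, seen_b);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

Built with `cc -pthread`, calling `tls_initializer_demo(&a, &b)` should leave both `a` and `b` at 43 and return 42. Where the loader finds room for each copy, including for late-`dlopen`ed libraries, is exactly the part this sketch does not show.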

The documentation at https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html links to an 81-page document describing four TLS access models, and that's just for Unix-style ELF; Windows platforms have their own complexities (which IIRC include a per-process maximum of 64 or 1088 TLS slots, with slots above the first 64 handled in a slightly different way).
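For a sense of what those four ELF access models look like from the source side: GCC and Clang accept a per-variable `tls_model` attribute (and a per-file `-ftls-model=` flag). A sketch with hypothetical variable names; the models trade generality for speed, and `local-exec` is only usable in the main executable, since a shared library's offset from the thread pointer isn't known until load time:

```c
/* Requested access models, from most general (slowest) to most
 * restricted (fastest). Variable names are illustrative only. */
static __thread int gd_var __attribute__((tls_model("global-dynamic"))) = 1;
static __thread int ld_var __attribute__((tls_model("local-dynamic"))) = 2;
static __thread int ie_var __attribute__((tls_model("initial-exec"))) = 3;
static __thread int le_var __attribute__((tls_model("local-exec"))) = 4;

/* global-dynamic: works from any module; address found via __tls_get_addr.
 * local-dynamic:  like global-dynamic, but one lookup serves all of a
 *                 module's own TLS variables.
 * initial-exec:   offset from the thread pointer loaded from the GOT;
 *                 normally assumes the module was loaded at startup.
 * local-exec:     offset from the thread pointer fixed at link time;
 *                 main executable only. */
int tls_model_sum(void) {
    return gd_var + ld_var + ie_var + le_var;
}
```

GCC documents that the `-ftls-model=` choice is subject to optimization (it may use a more efficient model for symbols not visible outside the translation unit, or when `-fpic` isn't given), so the generated code may not match the requested model exactly.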

AshamedCaptain · 2024-11-21T22:16:47 1732227407

When you declare a `static char *p;`, how do you even know at which memory address it is going to end up? How do you know what will happen if another compilation unit declares another variable of the same name? Another static library? Another dynamic library? What about initialization, and what about other constructors that may read memory before main() runs? What about injected threads that are started before that? Madness, I tell you, absolute and utter madness.

maccard · 2024-11-21T15:35:55 1732203355

The initialisation model in C++ is totally and utterly broken and indecipherable. That doesn’t stop me from doing vector<int> foo = {1, 2, 3};

intelVISA · 2024-11-21T16:08:36 1732205316

Avoiding thread-locals because dynamic libraries are bad is justified, but it still doesn't feel like the right tradeoff.

geocar · 2024-11-22T14:19:36 1732285176

> When you declare a "static __thread char *p", how do you know that for instance this is located at the 123rd word of the per-thread area?

Believe it or not, it's exactly the same way the toolchain knows that "static char *p" is located at the 123rd word of the data section: the linker does it!
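A small sketch of that analogy, assuming GCC/Clang and POSIX threads (`plain_p`, `tls_p` and the helper names are hypothetical). Both variables get a fixed slot from the linker; the only difference is that the TLS slot is relative to a per-thread base, so the same variable has a different address in each live thread, while the plain static has one address everywhere:

```c
#include <assert.h>
#include <pthread.h>

static char *plain_p;        /* one copy for the whole process (.bss) */
static __thread char *tls_p; /* one copy per thread (.tbss) */

struct addrs { void *plain; void *tls; };

/* Records where the calling thread finds each variable. */
static void *record(void *out) {
    struct addrs *a = out;
    a->plain = &plain_p;
    a->tls = &tls_p;
    return NULL;
}

/* Records both addresses in the calling thread and in a worker thread
 * that runs while the caller is still alive. */
void tls_addresses_demo(struct addrs *caller, struct addrs *worker) {
    pthread_t t;
    record(caller);
    int rc = pthread_create(&t, NULL, record, worker);
    assert(rc == 0);
    pthread_join(t, NULL);
}
```

After the call, `caller->plain == worker->plain` while `caller->tls != worker->tls`.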

> What if that declaration is on a dynamic library, which was loaded late (dlopen) into the process?

Same thing, except it's the dynamic linker doing it.

> What about threads which were started before that dynamic library was loaded, and therefore did not have enough space in their per-thread area for that thread-local variable, when they call into code which references it? What happens if the thread-local variable has an initializer?

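Seriously? I mean, you probably do have enough space, because address space is huge; and if not, the dynamic loader can allocate the newly loaded module's TLS block for each existing thread (glibc does this lazily, on first access). An initializer is just a template (.tdata) that gets copied into each new block.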

> links to a 81-page document describing four TLS access models, and that's just for Unix-style ELF

Just because someone can write 81 pages about something doesn't mean it takes 81 pages to understand it.

mrkeen · 2024-11-22T12:05:50 1732277150

Different people have different appetites for magic (and different definitions of what magic is).

For my first magic trick, I'd like to make something not equal to itself:

  foo("bar") == foo("bar")
  > false // Magic!

This is easy enough. You can make foo(..) do different things by giving it an implicit dependency on the state of the world. I'll notate implicit dependencies with [].

  foo[world1]("bar") == foo[world2]("bar")
  > false // Perfectly reasonable behaviour

Where does this pop up "in the real world"? Devs always hit the "what time is it" problem [1].

  assertEquals ( "12:23", foo[time1]("bar") )
  > Fail!

If you like magic, there are at least two fixes in Java land. 1: You can use some kind of mocking-framework magic to override what your actual clock does (in a way which is not visible in the non-test source code). 2: You can inject a fake clock service into your bean-wiring framework so that you can control the clock during testing. Others seem to like these two, but they make me barf.

The way I fix it is to just make the implicit explicit instead, by turning the implicit dependencies into parameters.

  assertEquals ( "12:23", foo[](time1, "bar") )
  > Pass!

Not only is it (subjectively) the simplest, but the technique also works in any language.
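The same explicit-parameter fix, sketched in C since the thread is about C-level TLS. `format_now` and `format_at` are hypothetical stand-ins for `foo`, and the sketch assumes a POSIX-style `time_t` counting seconds since 1970; only `format_at` is deterministic enough to assert on:

```c
#include <time.h>

/* Explicit dependency: the time is a parameter, so the same input always
 * produces the same output. Writes "HH:MM" (UTC) into out. */
void format_at(time_t t, char out[6]) {
    struct tm *tm = gmtime(&t); /* standard C; not thread-safe, returns NULL on failure */
    if (tm == NULL) {
        out[0] = '\0';
        return;
    }
    strftime(out, 6, "%H:%M", tm); /* 5 characters plus the terminating NUL */
}

/* Implicit dependency: the result depends on the hidden state of the
 * world, so two identical calls a minute apart disagree. */
void format_now(char out[6]) {
    format_at(time(NULL), out);
}
```

A test can then call `format_at(12 * 3600 + 23 * 60, buf)` and get "12:23" every time, with no clock mocking involved.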

I feel like thread-local takes this magic to the next level:

  assertEquals ( three(), one() + two() )

is really:

  assertEquals ( three[thread m](), one[thread n]() + two[thread o]() )
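To make that hidden [thread] input visible, here is a sketch with a hypothetical per-thread sequence number (GCC/Clang `__thread` and POSIX threads assumed). Nothing in `next_id`'s signature mentions threads, yet which thread calls it changes the answer:

```c
#include <assert.h>
#include <pthread.h>

static __thread int calls; /* hidden per-thread state, starts at 0 */

/* Hypothetical: returns 1, 2, 3, ... but counted separately per thread. */
int next_id(void) { return ++calls; }

static void *call_once(void *out) {
    *(int *)out = next_id();
    return NULL;
}

/* Calls next_id() twice on the calling thread and once on a new thread. */
void next_id_demo(int *first, int *second, int *on_other_thread) {
    pthread_t t;
    *first = next_id();
    *second = next_id();
    int rc = pthread_create(&t, NULL, call_once, on_other_thread);
    assert(rc == 0);
    pthread_join(t, NULL);
}
```

All three results come from the same call, yet `*second == *first + 1` while `*on_other_thread == 1`.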

[1] https://stackoverflow.com/questions/2425721/unit-testing-dat...

mrkeen · 2024-11-23T12:06:20 1732363580

> Magic just means "I don't understand this"

Does this mean I don't understand that objects can have private mutable variables, or that I don't understand random() or getCurrentTime()?

> The thread "o" is going to be the same in each expression

I don't accept the premise that I'll always be on the same thread, and if I did, I wouldn't need thread-local in the first place.

geocar · 2024-11-22T13:59:26 1732283966

> Different people have different appetites for magic (and different definitions of what magic is).

Magic just means "I don't understand this".

And whilst I don't think it's that complicated, I also don't think you really need to understand how TLS works to use it any more than you need to understand auto mechanics to drive a car. It's faster than walking.

> I feel like thread-local takes this magic to the next level:

> assertEquals ( three(), one() + two() )

> is really:

> assertEquals ( three[thread m](), one[thread n]() + two[thread o]() )

So I think it's more like:

  assertEquals ( thread[o][three](), thread[o][one]() + thread[o][two]() )

versus:

  assertEquals ( data[three](), data[one]() + data[two]() )

That is to say:

- TLS is indexed off of a base address, just like data.

- The thread "o" is going to be the same in each expression, so if thread[o] can live in a special-purpose register like %fs, it won't take up a general-purpose register. If data can live in a special-purpose register...
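The "indexed off a base address" picture can be checked portably. A sketch (GCC/Clang, POSIX threads; names hypothetical) that doesn't read %fs directly, since that would be x86-64 specific: the absolute address of a TLS variable changes from thread to thread, but the distance between two TLS variables from the same module does not, because the linker fixed their layout within the per-thread block:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static __thread long tls_a; /* two TLS variables from the same module */
static __thread long tls_b;

struct layout { uintptr_t a, b; };

/* Records where tls_a and tls_b live for the calling thread. */
static void *snapshot(void *out) {
    struct layout *l = out;
    l->a = (uintptr_t)&tls_a;
    l->b = (uintptr_t)&tls_b;
    return NULL;
}

/* Snapshots the layout in the calling thread and in a concurrent worker. */
void tls_layout_demo(struct layout *here, struct layout *there) {
    pthread_t t;
    snapshot(here);
    int rc = pthread_create(&t, NULL, snapshot, there);
    assert(rc == 0);
    pthread_join(t, NULL);
}
```

The base moves (`here->a != there->a`) but the offset doesn't (`here->b - here->a == there->b - there->a`), which is what lets a single base register do the indexing.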

Perhaps with a better picture you too will consider TLS to be basically the same as non-TLS, except that it's sometimes faster because of, well, special circuitry.

If anything, accessing via "data" is the weird one: even though it looks like you could just set data=0, you'd find almost everyone does it indirectly via %rip, so a variable "foo" is going to be -5000(%rip) in one place and -4000(%rip) in another, but would always be %fs:-26 if it were TLS.
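To see the two addressing styles for yourself, compile something like the following with `gcc -O2 -S` on x86-64 Linux. The comments show roughly what GCC typically emits when this code is linked into an executable; treat them as an illustration, since exact output varies with compiler version and flags. In particular, built with `-fPIC` into a shared library, `tls_counter` would typically go through a dynamic model involving `__tls_get_addr`, and `data_counter` through the GOT:

```c
int data_counter = 1;         /* ordinary global, placed in .data */
__thread int tls_counter = 2; /* thread-local, placed in .tdata */

/* Typically: movl data_counter(%rip), %eax
 * The displacement is relative to the next instruction, so it differs at
 * every place the variable is accessed. */
int read_data(void) { return data_counter; }

/* Typically: movl %fs:tls_counter@tpoff, %eax
 * @tpoff resolves at link time to a fixed (negative, on x86-64) offset
 * from the thread pointer, the same at every access site. */
int read_tls(void) { return tls_counter; }
```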