Cuckoo filters outperform Bloom filters and support dynamic insertion and deletion (unlike Bloom filters, which only support insertion). The trade-off is that insertion can fail when the table is too full, at which point the filter needs to expand or store the failed entries some other way to avoid a false negative.
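As a sketch of the mechanics (a simplified toy in Java, not any reference implementation; all names are mine): each item is reduced to a small fingerprint stored in one of two candidate buckets, the alternate bucket is derived from the current bucket and the fingerprint alone (partial-key cuckoo hashing), and a full insert triggers an eviction chain that can fail when the table is too full.

```java
import java.util.Random;

// Minimal, illustrative cuckoo filter: supports insert, lookup, and delete.
// numBuckets must be a power of two so the XOR-based alternate-bucket
// computation is an involution (altIndex(altIndex(i, fp), fp) == i).
class CuckooFilter {
    private static final int BUCKET_SIZE = 4;
    private static final int MAX_KICKS = 500;
    private final byte[][] buckets;
    private final int mask;
    private final Random rng = new Random(42);

    CuckooFilter(int numBuckets) {
        this.mask = numBuckets - 1;
        this.buckets = new byte[numBuckets][BUCKET_SIZE]; // 0 marks an empty slot
    }

    private static int mix(int h) { // cheap integer hash
        h ^= h >>> 16; h *= 0x45d9f3b; h ^= h >>> 16;
        return h;
    }

    private byte fingerprint(Object item) {
        int fp = mix(item.hashCode()) & 0xff;
        return (byte) (fp == 0 ? 1 : fp); // reserve 0 for "empty"
    }

    private int index1(Object item) {
        return (mix(item.hashCode()) >>> 8) & mask;
    }

    // Alternate bucket depends only on the current bucket and the fingerprint,
    // so an evicted entry can be relocated without re-reading the original item.
    private int altIndex(int index, byte fp) {
        return (index ^ mix(fp & 0xff)) & mask;
    }

    private boolean tryPut(int index, byte fp) {
        for (int s = 0; s < BUCKET_SIZE; s++) {
            if (buckets[index][s] == 0) { buckets[index][s] = fp; return true; }
        }
        return false;
    }

    boolean insert(Object item) {
        byte fp = fingerprint(item);
        int i1 = index1(item), i2 = altIndex(i1, fp);
        if (tryPut(i1, fp) || tryPut(i2, fp)) return true;
        // Both buckets full: evict a random victim and relocate it, up to
        // MAX_KICKS times. If the loop fails, the table is too full and the
        // caller must grow the filter or store the entry elsewhere.
        int index = rng.nextBoolean() ? i1 : i2;
        for (int k = 0; k < MAX_KICKS; k++) {
            int slot = rng.nextInt(BUCKET_SIZE);
            byte victim = buckets[index][slot];
            buckets[index][slot] = fp;
            fp = victim;
            index = altIndex(index, fp);
            if (tryPut(index, fp)) return true;
        }
        return false; // the "insertion can fail" case
    }

    boolean contains(Object item) {
        byte fp = fingerprint(item);
        int i1 = index1(item), i2 = altIndex(i1, fp);
        for (int s = 0; s < BUCKET_SIZE; s++)
            if (buckets[i1][s] == fp || buckets[i2][s] == fp) return true;
        return false;
    }

    boolean delete(Object item) {
        byte fp = fingerprint(item);
        int i1 = index1(item), i2 = altIndex(i1, fp);
        for (int s = 0; s < BUCKET_SIZE; s++) {
            if (buckets[i1][s] == fp) { buckets[i1][s] = 0; return true; }
            if (buckets[i2][s] == fp) { buckets[i2][s] = 0; return true; }
        }
        return false;
    }
}
```

Deletion only works because every stored fingerprint is guaranteed to live in one of its two legal buckets; that same property is why a failed insert must be handled out of band rather than silently dropped (a dropped entry would be a false negative).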
The problem in this conversation is that you are equivocating between "fixing memory safety bugs" and "preventing memory safety bugs statically." When this blog post refers to "memory safety skeptics," it refers to people who think the second is not a good way to expend engineering resources, not your imagined flagrantly irresponsible engineer who is satisfied to deliver a known nonfunctional product.
Deal with it. Async is my greatest disappointment in the otherwise mostly stellar language. And I will continue to argue strongly against it.
After Rust has raised the level of quality and expectations to such a great level, async feels like three steps back, with all those "you are holding it wrong" arguments, footguns, and piles of hacks. And this sentiment is shared by many others. It's really disappointing to see how many resources are being sunk into the flawed async model by both the language and the ecosystem developers.
>go build it then
I did build it, and it's in the process of being adopted into a proprietary database (theoretically a prime use case for async Rust). Sadly, because I don't have ways to change the language and the compiler, it has obvious limitations (and generally it can be called unsound, especially around thread locals). It works for our project only because we have a tightly controlled code base. In the future I plan to create a custom "green-thread" fork of `std` to ease the limitations a bit. Because of the limitations (and the proprietary nature of the project) it is unlikely to be published as an open-source project.
Amusingly, during online discussions I've seen other, unrelated people who have done similar things.
> it has obvious limitations (and generally it can be called unsound, especially around thread locals)
Is this really better than what we have now? I don't think async is perfect, but I can see what tradeoffs they are currently making and how they plan to address most if not all of them. "General" unsoundness seems like a rather large downside.
> In future I plan to create a custom "green-thread" fork of `std` to ease limitations a bit
Can you go more in-depth into these limitations and which would be alleviated by having first class support for your approach in the compiler/std?
Depends on the metric you use. Memory-wise it's a bit less efficient (our tasks are usually quite big, so the relative overhead is small in our case); runtime-wise it should be on par or slightly ahead. From the source code perspective, in my opinion, it's much better. We don't have the async/await noise everywhere, and after development of the `std` fork we will get async in most dependencies as well for "free" (we would still need to inspect the code to see that they do not use blocking `libc` calls, for example). I've always found it amusing that people use "sync" `log`-based logging in their async projects; we will not have this problem. The approach also allows migration of tasks across cores even if you keep `Rc` across yield points. And of course we do not need to duplicate traits with their async counterparts, and Drop implementations with async operations work properly out of the box.
>Can you go more in-depth into these limitations and which would be alleviated by having first class support for your approach in the compiler/std?
The most obvious example is thread locals. Right now we have to ensure that code does not wait on completion while holding a thread-local reference (we allow migration of tasks across workers/cores by default). We ban the use of thread locals in our code and assume that dependencies are unable to yield into our executor. With a forked `std` we can replace the `thread_local!` macro with a task-local implementation, which would resolve this issue.
Another source of potential unsoundness is the reuse of the parent task's stack for sub-task stacks in our implementation of `select!`/`join!` (we have separate variants which allocate full stacks for sub-tasks, used for "fat" sub-tasks). Right now we have to provide stack sizes for sub-tasks manually and check that the values are correct using external tools (we use raw syscalls for interacting with io-uring and forbid external shared-library calls inside sub-tasks). This could be resolved with the aforementioned special async ABI and tracking of the maximum stack usage bound.
Finally, our implementation may not work out of the box on Windows (I've read that it has protections against messing with the stack pointer, which we rely on), but that's not a problem for us since we target only modern Linux.
If you use a custom libc and dynamic linker, you can very easily customize thread locals to work the way you want without forking the standard library.
Initial comments as I write this are all negative, and also responding to something the blog post didn't claim. The only time it says faster than C is when talking about a language targeting the GPU; it is not controversial that GPUs perform better than CPUs for many workloads.
On the other hand, the results of this true fact are already seen in practice in real systems. Rust's iterator adapters, for example, allow high-level "functional style" with closures to be compiled to code that is exactly the same as hand-rolled C loops without unamortized bounds checks - despite guaranteeing the absence of out-of-bounds accesses even in the face of programmer error - because the constraints of these interfaces and the design of the language enable tons of optimizations by LLVM. This has been true since Rust 1.0 more than a decade ago; it's not a promise of the future.
The occurrence of data races depends on the specific non-deterministic sequence of execution of concurrent codepaths. Just because you have 100% code coverage does not mean you've covered every potential execution sequence, and it's almost never practical to actually execute every possibility to ensure the absence of data races. Depending on the probability that your data race will occur, it could indeed be something you have to make the stars align for TSAN to catch.
Not to talk my own book, but there is a well-known alternative to C++ that can actually guarantee the absence of data races.
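To make the "coverage doesn't imply race detection" point concrete, here's a toy Java sketch (class and method names are mine): the same test exercises every line of both counters, yet a passing run tells you nothing about the racy one, because losing an update depends entirely on scheduling.

```java
import java.util.concurrent.atomic.AtomicInteger;

// 100% line coverage of this code proves nothing about the data race on
// `unsafeCount`: a given run may or may not lose increments depending on
// how the two threads interleave, while `safeCount` is correct by construction.
class CoverageVsRaces {
    static int unsafeCount;                          // plain int: racy read-modify-write
    static final AtomicInteger safeCount = new AtomicInteger();

    static void run(int perThread) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                unsafeCount++;                       // data race: increments can be lost
                safeCount.incrementAndGet();         // atomic RMW: never loses increments
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
    }
}
```

After a run, `safeCount` is always exactly 2 × perThread; `unsafeCount` can be anything up to that, and often equals it on a lightly loaded machine, which is exactly why a test suite can pass for years while the race remains.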
It "could" for some algorithms, yes, but for a lot of algorithms, that kind of star alignment simply isn't necessary to find all the data races, was my point. And yes, TLA+ etc. can be helpful, but then you have the problem of matching them up with the code.
I feel like in a subtle way you're mixing up data races with race conditions, especially given the example you cite about incrementing an atomic variable.
TSAN does not check for race conditions in general, and doesn't claim to do so at all as the documentation doesn't include the term race condition anywhere. TSAN is strictly for checking data races and deadlocks.
Consequently this claim is false:
>The issue is that even if it statically proved the absence of data races in the C++ sense, that still wouldn't imply that your algorithm is race-free.
Race-free code means the absence of data races; it does not mean the absence of the more general race condition. If you search Google Scholar for "race-free programming," you'll find that no one uses the term "race-free" to refer to the complete absence of race conditions, but rather to the absence of data races.
There's "data race" in "C++ ISO standard" sense, and then there's "data race" in the general CS literature (as well as all the other terms). Two threads writing a value to the same memory location (even atomically) is usually a data race in the CS/algorithm sense (due to the lack of synchronization), but not the C++ sense. I'm not really interested in pedantic terminology here, just trying get a higher level point across about what you can & can't assume with a clean TSAN (and how not to clean your TSAN errors). Feel free to mentally rewrite my comment with whatever preferred terminology you feel would get my points across.
This isn't pedantry. If you're going to talk about how specific tools work, then you need to use the actual terminology that the tools themselves use, or else you will confuse yourself and anyone you talk to about them. If we were discussing general concepts about thread safety, then sure, we can be loose with our words. But if we're talking about a specific tool used for a specific programming language, then we should make sure we are using the correct terminology, if only to signal that we have the proper domain knowledge to speak about the subject.
>Feel free to mentally rewrite my comment with whatever preferred terminology you feel would get my points across.
If I rewrite your comment to use data race, then your comment is plainly incorrect since the supporting example you give is not a data race but a race condition.
If I rewrite your comment to use race condition, then your comment is also incorrect since TSAN doesn't detect race conditions in general and doesn't claim to, it detects data races.
So what exactly am I supposed to do in order to make sense of your post?
The idea that you'd talk about the pros and cons of a tool like TSAN without knowing the difference between a race condition and a data race is kind of absurd. That you'd claim my clarification of these terms for the sake of better understanding your point is a form of pedantry is sheer hubris.
Hold on, before attacking me. Say we have this Java program, and assume the semantics of the common JRE/JVM everyone uses. Do you believe it has a data race or not? Because the variable is accessed atomically whether you mark it as volatile or not:
  class Main {
      private static String s = "uninitialized";
      public static void main(String[] args) {
          Thread t = new Thread() {
              public void run() { s = args[0]; }
          };
          t.start();
          System.out.println(s);
      }
  }
And I sure as heck have not heard anyone claim such data races are impossible in Java. (Have you?)
>When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race.
Yes, your program contains a data race, by the definition used in the JLS. The set of outcomes you may observe from a data race is specified. I'm not sure if this choice was intentional or not, but there is a guarantee that you will either print the argument or "uninitialized" and no other behavior, because String relies on final field semantics. This would not be true in C/C++, where the equivalent code is undefined behavior and you could see any result.
In Java you can have a data race and use it productively for certain niche cases, like String.hashCode - I've also contributed some to the Guava concurrency library. This is not true in C/C++, where data races (by their definition) are undefined behavior. If you want to do the tricks you can do in racy Java without UB, you have to declare your variables atomic and use relaxed memory order and possibly fences.
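For reference, a sketch of the "racy single-check" idiom in the style of java.lang.String.hashCode (this is my own toy version, not the JDK source): the cache field is plain, so concurrent callers race on it, but because each racer recomputes the same deterministic value, the worst case is redundant work, never a wrong answer.

```java
// Racy-but-benign lazy caching. Safe in Java because reads/writes of int are
// atomic and the JMM guarantees a racy read observes *some* previously written
// value, never garbage. The equivalent unsynchronized C/C++ code would be UB.
class CachedHash {
    private final byte[] data;
    private int hash; // plain (non-volatile) field: racy on purpose

    CachedHash(byte[] data) { this.data = data.clone(); }

    int slowHash() { // deterministic, so every racing thread computes the same result
        int h = 1;
        for (byte b : data) h = 31 * h + b;
        return h == 0 ? 1 : h; // avoid 0 so 0 can mean "not yet computed"
    }

    int cachedHash() {
        int h = hash;        // single racy read into a local
        if (h == 0) {
            h = slowHash();
            hash = h;        // racy write: several threads may do this, all with the same value
        }
        return h;            // always the same deterministic value
    }
}
```

Note the read into a local: reading `hash` twice could observe two different values under the race, which is exactly the kind of subtlety that makes this a "certain niche cases only" technique.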
What you wrote is indeed a data race; it'll race on s. But you mention the semantics of the JRE, and I wonder if you actually know what those are, because that's crucial here.
You see, Java has a specific memory ordering model (many languages just give you a big shrug, including C before it adopted the C++11 model, but Java spells out what happens), and that model is very sophisticated, so it has an answer to what happens here.
Because we raced on s, we lose Sequential Consistency. This means that in general (this example is so trivial it won't matter) humans struggle to understand what's going on in their program, which makes debugging and other software engineering impractical. But, unlike in C++, loss of Sequential Consistency isn't fatal in Java; instead we're promised that when s is observed in the main thread it will either be that initial "uninitialized" string or it will have the args[0] value, i.e. the name of the program, because these are the only two values it could have, and Java does not specify which of them is observed in this case.
You could think of this as "atomic access" and that's likely the actual implementation in this case, but the Java specification only promises what I wrote.
In C++ this is game over, the language standard specifically says it is Undefined Behaviour to have any data race and so the behaviour of your program is outside the standard - anything at all might happen.
[Edited: I neglected originally to observe that s is set to "uninitialized", and instead I assumed it begins as null]
> But, unlike C++ loss of Sequential Consistency isn't fatal in Java
I have no idea what you mean here. Loss of sequential consistency is in no way fatal in C++. There are several access modes that are specifically designed to avoid sequential consistency.
Regarding the rest of your comment:
You're making exactly my point though. These are guaranteed atomic accesses -- and like you said, we are guaranteed to see either the old or new value, and nothing else -- and yet they are still data races. Anyone who agrees this is a data race despite the atomicity must necessarily understand that atomics don't imply lack of data races -- not in general CS terminology.
The only way it's correct to say they are mutually exclusive is when you define "data race" as they did in the C++ standard, to imply a non-atomic access. Which you're welcome to do, but it's an incredibly pedantic thing to do because, for probably >95% of the users of C++ (and probably even of TSAN itself), when they read "data race", they assume it to mean the concept they understand from CS. They don't know that the ISO standard defines it in its own peculiar way. My point here was to convey something to normal people rather than C++ committee language lawyers, hence the use of the general term.
> Loss of sequential consistency is in no way fatal in C++. There are several access modes that are specifically designed to avoid sequential consistency.
Sure, if you work really hard you can write a C++ program which doesn't meet the 6.9.2.2 intro.races definition of a data race but does nevertheless lose sequential consistency; it has, in some sense, a well-defined meaning in C++, but humans can't usefully reason about it. You'll almost certainly trip up and write UB when attempting this, but assuming you're inhumanly careful, or the program is otherwise so simple that you don't, it's an exception.
My guess is that your program will be miscompiled by all extant C++ compilers and you'll be unhappy with the results, and further that if you can get committee focus on this at all they will prioritize making your program Undefined in C++ rather than "fixing" compilers.
Just don't do this. The reason for the exclusions in 6.9.2.2 is that what we want people to do is write correct SC code but using primitives which themselves can't guarantee that, so the person writing the code must do so correctly. The reason is not that somehow C++ programmers are magicians and the loss of SC won't negatively impact the correctness of code they attempt to write, quite the contrary.
>The only way it's correct to say they are mutually exclusive is when you define "data race" as they did in the C++ standard, to imply a non-atomic access.
A data race does not imply a non-atomic operation, it implies an unsynchronized operation. Different languages have different requirements for what constitutes a synchronized operation: for example, in Python all reads and writes are synchronized, in Java synchronization is generally accomplished through the use of a monitor or a volatile operation, and in C++ synchronization is generally accomplished through the use of std::atomic operations.
The fact that in C++ atomic operations result in synchronization, while in Java atomic operations are not sufficient for synchronization is not some kind of language lawyering or pedantry, it's because Java makes stronger guarantees about the consequences of a data race versus C++ where a data race can result in any arbitrary behavior whatsoever. As such it should not come as a surprise that the cost of those stronger guarantees is that Java has stronger requirements for data race free programs.
But of course this is mostly a deflection, since the discussion was about the use of TSAN, which is a data race detector for C and C++, not for Java. Hence to the extent that TSAN detects data races, it detects them with respect to C and C++'s memory model, not Java's memory model or Python's memory model, or any other memory model.
The objection I originally laid out was to your example of a race condition, an example which can happen even in the absence of parallelism (i.e. on a single-core CPU) and even in the absence of multithreaded code altogether (your example can happen in a single-threaded application with the use of coroutines). TSAN makes no claim with regard to detecting race conditions in general; it only seeks to detect data races, and data races as they pertain to the C and C++ memory models.
I am not "deflecting" anything, I am going to the heart of the matter.
Let me lay this out very explicitly. This comment will likely be my final one on the matter as this back-and-forth is getting quite tiresome, and I'm not enjoying it, especially with the swipes directed at me.
For the sake of this discussion, assume the command-line arguments behave the same in both languages. (i.e. ignore the C++ argv[0] vs. Java args[0] distinction and whatnot.)
Somehow, you simultaneously believe (a) that the Java program contains a data race, while believing (b) that the C++ program does not.
This is a self-contradictory position. The programs are well-defined, direct translations of each other. They are equivalent in everything but syntax. If one contains a data race, so must the other. If one does not, then neither can the other.
This implies that TSAN does not detect "data races" as it is usually understood in the CS field -- it does not detect anything in the C++ program. What TSAN detects is only a particular, distinct situation that the C++ standard also happens to call a "data race". So if you're talking to a C++ language lawyer, you can say TSAN detects all data races within its window/buffer limits. But if you're talking to someone who doesn't sleep with the C++ standard under their pillow, they're not going to realize C++ is using a language-specific definition, and they're going to assume it flags programs like the equivalent of the Java program above, which has a data race but whose equivalent TSAN would absolutely not flag.
Yes, C++ and Java have different conditions for what a data race is.
That your position hinges on thinking all languages share the same memory model suggests a much deeper failure to understand some of the basic principles of writing correct parallel software, and while numerous people have tried to correct you on this, you still seem adamant on doubling down on your position, so I don't think there is much point in continuing this.
> That your position hinges on thinking all languages share the same memory model suggests a much deeper failure to understand some of the basic principles of writing correct parallel software, and while numerous people have tried to correct you on this, you still seem adamant on doubling down on your position, so I don't think there is much point in continuing this.
I never suggested "all languages share the same memory model". You're severely mischaracterizing what I've said and putting words in my mouth.
What I said was (a) data races are general properties of programs that people can and do discuss in language-agnostic contexts, and (b) it makes no sense to say two well-defined, equivalent programs differ in whether they have data races. Reducing these statements down to "all languages share the same memory model" as if they're somehow equivalent makes for an incredibly unfaithful caricature of everything I've said. Yes, I can see there's no point in continuing.
> data races are generals properties of programs that people can and do discuss in language-agnostic contexts
"Data race" is a specific property defined by a memory model, which is normally part of a language spec; it's not usually understood as an abstract property defined in terms of outcome, at least not usefully. If you talk about data races as abstract and language-spec-agnostic properties, then yes, you're assuming a memory model that's shared across all programs and their languages.
> "Data race" is a specific property defined by a memory model, which is normally part of a language spec; it's not usually understood as an abstract property defined in terms of outcome, at least not usefully.
Really? To me [1] sure doesn't look useless:
> We use the standard definition of a data race: two memory accesses to the same address can be scheduled on different threads to happen concurrently, and at least one of the accesses is a write [16].
You're welcome to look at the [16] it cites, and observe that it is from 1991, entirely in pseudocode, with no mention of a "memory model". It so happens to use the phrase "access anomaly", but evidently that is used synonymously here, per [1].
> If you talk about data races as abstract and language-spec-agnostic properties, then yes, you're assuming a memory model that's shared across all programs and their languages.
No, nobody is assuming such a thing. Different memory models can still exhibit similar properties when analyzing memory accesses. Just like how different network models can exhibit similar properties (like queue size bounds, latency, etc.) when discussing network communication. Just because two things are different doesn't mean they can't exhibit common features you can talk about in a generic fashion.
Java defines what is or isn't a "data race" in one way, as part of its spec. C++ defines that same term "data race" in another way, as part of its spec. Your linked papers use a definition of "data race" which they define themselves based on a claimed "standard definition," and which is different from both the Java and C++ definitions of the same term. The point here is that the definition of "data race" is not universal or objective. When you want to evaluate whether or not some bit of code exhibits a "data race," then without qualifying what you mean when you say "data race," that property is gonna be evaluated in the context of the language, not some higher-level abstract assumption. You can talk about whatever set of common properties of a "data race" that are invariant to language and that you want to talk about, that's fine, but you need to make that expectation explicit if you want to have a productive conversation with anyone else, because "data race" by itself is context-dependent.
> This implies that TSAN does not detect "data races" as it is usually understood in the CS field -- it does not detect anything in the C++ program. What TSAN detects is only a particular, distinct situation that the C++ standard also happens to call a "data race"
TSAN applies its own definition of "data race," one that is invariant to the languages it evaluates. In particular, the C++ definition of "data race" is more lenient than the TSAN definition: C++ doesn't consider two atomic accesses of the same memory (at least one being a write) under memory_order_relaxed to be a data race, but TSAN does.
TSAN _could_ detect the C++ program as having a data race, for sure. And if it could evaluate Java programs, it _could_ also detect the Java program as having a data race, too.
> C++ doesn't consider two atomic accesses of the same memory (at least one being a write) under memory_order_relaxed to be a data race, but TSAN does.
I'm confused what you're saying here. If you take the program I linked, which uses relaxed ordering, and add -fsanitize=thread, TSAN doesn't flag anything. That seems inconsistent with what you're saying?
P.S. note that even using memory_order_seq_cst wouldn't change anything in that particular program.
First of all, TSAN can identify the presence of a data race, but it can't prove the absence of any data races. If -fsanitize=thread doesn't flag anything, that's insufficient evidence to say that there aren't any data races in the code, at least as TSAN defines data race, which is stricter than how C++ defines data race.
You've now received many comments from many different commenters that all kind of say the same thing, which is basically that your understanding of a "data race" is not really accurate. Those comments have included pretty detailed information as to exactly how and when and why your definition isn't totally correct. If I were you I'd take the L and maybe read up a bit.
> First of all, TSAN can identify the presence of a data race, but it can't prove the absence of any data races. If -fsanitize=thread doesn't flag anything, that's insufficient evidence to say that there aren't any data races in the code
I stated as much in my own first comment, with more details on when/why this does/doesn't occur.
> at least as TSAN defines data race, which is stricter than how C++ defines data race. [...] If I were you I'd take the L and maybe read up a bit.
Before assuming I haven't: I have. And the reading does not agree [1] [2] with your idea that TSAN sometimes considers relaxed atomic writes to be data races. Hence my replies. Remember, you wrote:
>> C++ doesn't consider two atomic accesses of the same memory (at least one being a write) under memory_order_relaxed to be a data race, but TSAN does.
I have not seen a single word anywhere -- not in the docs, not in example code, and (I think) not even from anyone here other than you -- suggesting that TSAN considers memory_order_relaxed writes to constitute a data race. It certainly does not flag them in the most trivial test I could think of, which I had already linked here [3].
If this is just my ignorance or misunderstanding, then instead of telling me to go do my own reading, please enlighten me and provide one link or example that demonstrates that TSAN considers atomic writes to the same memory to be a data race? I would be very glad to learn this, as I wasn't aware of this before, and am not seeing anything suggesting such.
> Those comments have included pretty detailed information as to exactly how and when and why your definition isn't totally correct.
I have linked to papers going back decades showing that "data race" has a general definition that doesn't entirely match up with what people have said. I myself have also explained in great detail how the general definition differs from the C++ one. I don't know what else to possibly provide here, but I'm done.
I haven't participated in this thread yet, but I would like to drill down on your TSan example. It seems to me that the window of timing for TSan to catch it is _super_ tight, as the overhead of creating a thread is very large relative to the other operations.
For something like TSan, which allows programs to execute normally with additional instrumentation, this timing matters, and so it's not a great example. An equivalent program being simulated in something like Loom would be much more convincing.
I'm a little confused, as you agree with your parent commenter that TSan not raising a flag is not conclusive. But you also appear to be using TSan not flagging the program as some kind of evidence in the same comment.
Thanks for following along and trying to clarify this, I really appreciate it.
The answer to your questions is that timing is not the issue in my example. You can notice this easily if you strip std::atomic<> from the type. TSAN can and does catch it just fine. The atomicity itself is what tells TSAN to not consider this a data race.
What probably threw you off was that I (sloppily) used "timing" as a way to say "proximity in the history buffer". [1] It's not wall clock time that matters, it's the number of memory accesses that fit in TSAN's history buffer. (This should also explain your confusion w.r.t. "tight" timing.)
Hence, the conclusivity depends entirely on the reason it wasn't flagged. (This is why I explained the failure modes quite precisely in my very first comment: not all of them have randomness involved.) If it wasn't flagged because the history buffer wasn't large enough, then obviously it's not conclusive. But if it wasn't flagged because TSAN noticed it and deliberately exempted it, then obviously it doesn't consider it a data race.
> The answer to your questions is that timing is not the issue in my example. You can notice this easily if you strip std::atomic<> from the type. TSAN can and does catch it just fine.
If you strip the std::atomic from your example, then you obviously lose read/write atomicity on the value, which should be trivial for something like TSAN to detect and flag.
> The atomicity itself is what tells TSAN to not consider this a data race ... the conclusivity depends entirely on the reason it wasn't flagged. (This is why I explained the failure modes quite precisely in my very first comment: not all of them have randomness involved.)
"The conclusivity" can only ever be "inconclusive" or "has a data race", it can never be "does not have a data race", because that's just not how TSAN (or any similar tools) work. See [1]. In your original C++ program there is no happens-before relationship between the thread you spawn and the main thread, so there is a data race by the TSAN definition, even though the read and the write are atomic, and even if a given execution of that code by TSAN doesn't yield an error. It's not about timing, at least not exactly -- it's about the guarantees of the scheduler and its execution of threads, which is non-deterministic without explicit synchronization in the application (or something along those lines)..!
You've linked to papers that define "data race" in a specific way. That doesn't mean that when anyone says "data race" in any other context they are using any of those papers' definitions, for reasons that have been exhaustively explained.
Actually, I don't think your C++ program contains a data race, because the writes that populated the argument string happened before the atomic read. If you copied the string on the other thread before writing the pointer, or if you used a non-atomic variable, I bet TSAN would catch it.
> Actually I don't think your C++ program contains a data race, because the writes that populated the argument string happened before the atomic read
But the write to the static variable from the second thread is entirely unordered with respect to the read, despite the atomicity. If lack of ordering is your criterion for data races, doesn't that imply there is a data race between that write and that read?
No, because it's atomic. If that was a data race, and data races are UB, there would be no point in having relaxed atomics.
It's not my criterion, it's defined in the language standard. You can't have a data race on an atomic except via std::atomic_init. The reads of the string contents themselves are ordered, because the args string is initialized before the thread is launched and the other one is static. If the launched thread allocated a new string and then populated the atomic, that would be a data race, unless a stronger memory order was used by both the writer and the reader.
Never mind, I don't think we're disagreeing on what the C++ standard definition of data race is. I think I thought you were saying something different in your comment.
Also, the programs are not direct translations of each other: a direct translation to Java would use VarHandle and opaque mode (equivalent to relaxed), and then would not contain a data race.
You can't precisely convert it to C++, because there's no C++ construct that precisely matches the Java behavior, namely atomic with relaxed memory order but permitting aliasing and load/store optimizations. The spiritually closest thing would just be a regular variable, which would be a data race that TSAN would catch.
You can go the other way, porting that c++ to Java using varhandle and opaque memory order.
The Java specification which can be found here [1] makes clear that with respect to its memory model the following is true:
1. Per 17.4.5 your example can lead to a data race.
"When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race."
2. Per 17.7 the variable s is accessed atomically.
"Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values."
However, atomic reads and writes are not sufficient to protect against data races. What they do protect against is word tearing (outlined in 17.6), where two threads simultaneously write to overlapping parts of the same object and the result mixes bits from both writes in memory.

A data race involving atomic objects can still cause future reads of that object to return inconsistent values, and this can last indefinitely. It does not mean that reading a reference will produce a garbage value, but it does mean that two different threads reading the same reference may end up reading two entirely different objects. So you can have thread A in an infinite loop repeatedly reading the value "uninitialized" and thread B in another infinite loop repeatedly reading the value args[0]. This can happen because each thread has its own local copy of the reference, which will never be updated to reflect a consistent shared state.
As per 17.4.3, a data-race free program will not have this kind of behavior where two threads are in a perpetually inconsistent state, as the spec says "If a program has no data races, then all executions of the program will appear to be sequentially consistent."
So while atomicity protects against certain types of data corruption, it does not protect against data races.
> Two threads writing a value to the same memory location (even atomically) is usually a data race in the CS/algorithm sense (due to the lack of synchronization), but not the C++ sense
You seem to conflate the concepts of "data race" and "race condition", which are not the same thing.
Two threads writing to the same memory location without synchronization (without using atomic operations, without going thru a synchronization point like a mutex, etc.) is a data race, and almost certainly also a race condition. If access to that memory location is synchronized, whether thru atomics or otherwise, then there's no data race, but there can still be a race condition.
This isn't a pedantic distinction, it's actually pretty important.
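A Rust sketch of the distinction (names made up): every access below is atomic, so there is no data race, but the load-then-store "increment" is still a race condition, because another thread can slip in between the two steps and updates get lost.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Each thread does `per_thread` check-then-act increments. No access is a
// data race (everything is atomic), but the load+store pair is a race
// condition: increments can be lost.
fn racy_increments(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    let v = c.load(Ordering::SeqCst); // check...
                    c.store(v + 1, Ordering::SeqCst); // ...then act
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    let total = racy_increments(4, 10_000);
    // With fetch_add this would always be 40_000; with load+store it can
    // come up short, even though no access was a data race.
    assert!(total <= 40_000);
}
```

Swapping the load/store pair for a single `fetch_add(1, Ordering::SeqCst)` removes the race condition as well, because the read-modify-write becomes one atomic step.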
> Two threads writing a value to the same memory location (even atomically) is usually a data race in the CS/algorithm sense (due to the lack of synchronization), but not the C++ sense
Not only are you incorrect, it’s even worse than you might think. Unsynchronized access to data in C++ is not just a data race, it’s explicitly undefined behavior, and the compiler can do whatever it likes in response to a data race (which, by using the language, you are promising cannot happen).
You are also misinformed about the efficacy of TSAN. Even in TSAN you have to run it in a loop - if TSAN never observes the specific execution order in a race it’ll remain silent. This isn’t a theoretical problem but a very real one you must deeply understand if you rely on these tools. I recall a bug in libc++ with condition_variable and reproducing it required running the repro case in a tight loop like a hundred times to get even one report. And when you fixed it, how long would you run to have confidence it was actually fixed?
And yes, race conditions are an even broader class of problems that no tool other than formal verification or DST (deterministic simulation testing) can help with. Hypothesis-style property testing can help mildly, but really you want at least DST to probabilistically search the space for race conditions. DST’s main weakness, aside from the challenge of searching a combinatorial explosion of states, is that it still relies on you to provide the test coverage and expectations that the race condition might violate in the first place.
TSAN observes the lack of an explicit ordering and warns about that, so it is better in some sense than just running normally in a loop and hoping to see a specific mis-ordering occur. But that part of it is a data race detector, so it cannot do anything for race conditions, and as soon as something is annotated as atomic, it cannot do anything to detect misuse. It can be better for lock evaluation, since it can check that locks are always acquired in the same order without needing to observe an actual deadlock. But I agree you need more formal tooling to show the problem is eliminated and not just improbable.
Geez. I'm well aware it's UB. That was just sloppy wording. Should've said "not necessarily", okay. I only wrote "not in the C++ sense" because I had said "even atomically", not because I'm clueless.
Joe Armstrong goes to lengths to describe the benefits of "privately addressable" actors in his thesis (though he uses different terminology). As far as I'm aware, Erlang actors are also privately addressable. cf:
> System security is intimately connected with the idea of knowing the name of a process. If we do not know the name of a process we cannot interact with it in any way, thus the system is secure. Once the names of processes become widely known the system becomes less secure. We call the process of revealing names to other processes in a controlled manner the name distribution problem— the key to security lies in the name distribution problem. When we reveal a Pid to another process we will say that we have published the name of the process. If a name is never published there are no security problems.
> Thus knowing the name of a process is the key element of security. Since names are unforgeable the system is secure only if we can limit the knowledge of the names of the processes to trusted processes.
That may be what is in the thesis, but in real Erlang, any process can list all processes on any node it can reach: https://www.erlang.org/doc/apps/erts/erlang.html#processes/0 (As the docs say, it lists "processes on the local node" but I'm fairly sure any process can RPC that to any connected node to get the local processes on that node) From there you've got a lot of introspection on the processes in question.
And beyond that, there is no sandboxing in Erlang that allows you to do anything like spawn a process that can't access the disk or network or anything like that. So in practice that hardly buys you anything on a real system, because if you were somehow running unauthenticated Erlang code you've already got access corresponding to the OS permissions of the running Erlang process. (Though for those not familiar with Erlang, "somehow running unauthenticated Erlang code" is very unlikely, to the point that it's not a realistic threat. I'm just speaking hypothetically here.)
The thesis may cover how such systems could be turned to a more secure context but it does not correspond to current Erlang.
There are many layers of capabilities. Unguessable process IDs would be necessary for network capabilities. A sandboxing environment would be necessary for system or process level capabilities. It's still worth having the network security even if process security isn't there. Very few language implementations can provide that level of security.
My point here is merely to make sure that people do not come away from this thread thinking that Erlang has, well, anything like this at all. It isn't an especially insecure language as it lacks most of the really egregious footguns, but it isn't a specially secure one either, with any sort of "capabilities" or anything else.
This really only works if the process names are unguessable, though? Erlang PIDs, if not predictable, at least appear to be guessable; e.g. they look like this: <0.30.0>
100%. So tiring that the discourse around this is based on 15 minute demos and not actual understandings of the trade offs. Varun Gandhi's post that you link to is great.
Based on my experience with Rust, a lot of what people want to do with its "constant generics" probably would be easier to do with a feature like comptime. Letting you do math on constant generics while maintaining parametricity is hard to implement, and when all you really want is "a trait for a hash function with an output size of N," probably giving up parametricity for that purpose and generating the trait from N as an earlier codegen step is fine for you, but Rust's macros are too flexible and annoying for doing it that way. But as soon as you replace parametric polymorphism with a naive code generation feature, you're in for a world of hurt.
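The "trait for a hash function with an output size of N" shape looks something like this in today's Rust (all names invented; `Xor8` is a toy, not a real hash). The part that works is fixing N per implementation; arithmetic on N in types, such as `[u8; N * 2]`, is where stable const generics stop and a comptime-style feature would be more direct:

```rust
// A digest trait parameterized by its output size in bytes.
trait FixedDigest<const N: usize> {
    fn digest(data: &[u8]) -> [u8; N];
}

// Toy 1-byte "hash" for demonstration only: XOR of all input bytes.
struct Xor8;

impl FixedDigest<1> for Xor8 {
    fn digest(data: &[u8]) -> [u8; 1] {
        [data.iter().fold(0u8, |acc, b| acc ^ b)]
    }
}

fn main() {
    // b'a' ^ b'b' ^ b'c' == 96
    assert_eq!(<Xor8 as FixedDigest<1>>::digest(b"abc"), [96u8]);
}
```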
Implementations are not exported or public at all: they are used in functions and those functions are exported. For correctness, you want those implementations to be resolved consistently (this is what coherence is). This post gives the example of unioning two sets: you need to know that they're ordered the same way for your algorithm to work.
So the problem isn't that the implementation is public, it's that it's used somewhere by a function which is public (or called, transitively, by a public function). For a library, code which is not used by a public function is dead code, so any impl that is actually used is inherently public.
You might say, okay, well can binaries define orphan impls? The problem here is that we like backward compatibility: when a new impl is added to your dependency, possibly in a point release, it could conflict with your orphan and break you. You could allow users, probably with some ceremony, to opt into orphan impls in binaries, with the caveat that they are accepting that updating any of their dependencies could cause a compilation failure. But that's it: if you allow this in libraries, downstream users could start seeing unsolvable, unpredictable compilation failures as point releases of their dependencies introduce conflicts with orphan impls in other dependencies.
It would still be consistent; everything with my crate resolves `impl Foo for Bar` to what I define, everything with other crate resolves `impl Foo for Bar` to what they defined, and any other crate would have a compilation error because those crates didn't `impl Foo for Bar`.
If I for some reason exported a method like `fn call_bar(foo: Foo) -> Bar`, then I think it would use my `impl Foo for Bar`, since the source code for the trait impl was within my crate. What happens if I instead export something like `fn call_bar<F: Bar>(foo: F) -> Bar` is probably more up for debate as to whose trait impl should be used; probably whichever crate first knows that F is Foo.
I think they did say binaries can define orphan impls; and the only way somebody should be able to break your code is by changing the trait definition or deleting the implementing type. Otherwise your implementation would override the changed implementation. This seems fine, because even if I locally define `Foo`, which lets me `impl Bar for Foo`, my code breaks anyway if you then delete `Bar`.
Of course it can change, that's what removal of coherence does.
It seems to me to be a logical impossibility to allow orphan implementations, and allow crate updates, and not have trait implementations changing at the same time. It's a pick-two situation.
Your conclusion is correct. I'm very happy with the two that Rust picked and tired of people pretending that there will be a magical pick three option if we just keep talking about it.
I also think Rust has picked the right default, but I wouldn't mind having an opt in to the other pair of trade-offs. There are traits like `ToSql` that would be mostly harmless. Serde has tricks for customizing `Serialize` on foreign types, and this could be smoother with language support. Not every trait is equivalent to Hash.
Consider Java for example. In Java, interfaces are even more restrictive than traits: only the package which defines the class can implement them for that class, not even the package which defines the interface. But this is fine, because if you want to implement an interface for a foreign class, you create a new class which inherits from it, and it can be used like an instance of the foreign class except it also implements this interface.
In Rust, to the extent this is possible with the newtype pattern, it’s a lot of cruft. Making this more ergonomic would ease the burden of the orphan rule without giving up the benefits the orphan rule provides.
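For reference, the newtype workaround looks like this (`Wrapper` is an invented name). The cruft is the wrapping and delegation, not the impl itself:

```rust
use std::fmt;

// Newtype pattern: wrap the foreign type (Vec<i32>) in a local type so
// the orphan rule allows implementing a foreign trait (Display) for it.
struct Wrapper(Vec<i32>);

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let items: Vec<String> = self.0.iter().map(|n| n.to_string()).collect();
        write!(f, "[{}]", items.join(", "))
    }
}

fn main() {
    // Callers must wrap (and, for other traits, unwrap or delegate),
    // which is the ergonomic burden being discussed.
    assert_eq!(Wrapper(vec![1, 2, 3]).to_string(), "[1, 2, 3]");
}
```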
This post is written by a fan of implicits, so it frames them as "better" than traits, though at the end it admits the choice is in fact a complex trade-off, which is the truth. In my opinion the trade-off favors traits, but others may feel differently.
The core difference between traits (also called type classes) and ML modules is that with traits the instance/implementation has no name, whereas for ML modules they do. The analogy here is between Rust/Haskell's traits/typeclasses and ML's signatures and between Rust/Haskell's impls/instances and ML's structures. In Rust/Haskell, implementations are looked up by a tuple of types and a trait to determine the implementation. The advantage of this is that you don't need to name the impl and then invoke that name every time you use it; since we usually don't think of "Hash for i32" as something which has a meaningful name beyond the relationship between Hash and i32, this is quite nice.
But coherence requires that instances resolve consistently: if I hash an integer in one code location to insert into a map and then hash it again in a different location to do a lookup on the same map, I need to hash integers the same way each time. If you care about coherence, and the correctness property it implies, you can't allow overlapping impls if impls aren't named, because otherwise you aren't guaranteed a consistent result every time you look up the impl.
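As a toy illustration of the correctness property (all names invented): if insert and lookup resolve to two different hash "impls", the same key lands in and is searched for in different buckets, and the map silently loses it.

```rust
// Map a hash to one of n buckets.
fn bucket(hash: u64, n: usize) -> usize {
    (hash as usize) % n
}

// Two conflicting "impls" of hashing for i32.
fn hash_a(x: i32) -> u64 {
    x as u64
}
fn hash_b(x: i32) -> u64 {
    (x as u64).wrapping_mul(31)
}

fn main() {
    let n = 8;
    let mut buckets: Vec<Vec<i32>> = vec![Vec::new(); n];
    let key = 5;
    buckets[bucket(hash_a(key), n)].push(key); // inserted with impl A
    let found = buckets[bucket(hash_b(key), n)].contains(&key); // looked up with impl B
    assert!(!found); // the key is invisible: this is what coherence prevents
}
```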
This introduces another problem: you can't see all the impls in the universe at once. Two libraries could add impls for types/traits in their upstream dependencies, and the incoherence won't be discovered until they are compiled together later on. This problem, called "orphan impls," causes its own controversy: do you just let downstream users discover the error eventually, when they try to combine the two libraries, or do you prohibit all orphan impls early on? Rust and Haskell have chosen different horns of this dilemma, and the grass is always greener.
Of course, with implicits this author intends a different solution to the problem of resolving instances without naming them: just allow incoherence (which they re-brand as "local coherence"). Incoherent impls are allowed and are selected based on proximity to the code location.
As the post eventually admits, this does nothing to solve the correctness problem that coherence is meant to solve, because code with different nearest impls can be compiled together, and in Rust such a correctness problem could become a memory safety problem. How you figure out whether the impl you've found for this type is actually the nearest impl to your code is left as an exercise to the reader. But sure, since you've rebranded incoherence as "local coherence," you can do some juxtaposed wordplay and call coherence a "local maximum," because achieving it has the downside that you can't have arbitrary orphan impls.
I read through this, and thought to myself: "wow, what a response that elucidates the PL design tradeoff space while giving real world examples of languages that occupy various points on that space; all as concisely and economically as possible."
And then I read the user name. Of course it's boats!!
Thank you for all your work! I want to say that especially since I've noticed a lot of shallow dismissal of your work recently (shallow because the dismissal often doesn't engage with the tradeoffs of whatever alternative solution it proposes in the context of Rust, among other things), and would like you to know there's a lot of us who are very very grateful for all the productivity and empowerment you've enabled through your contribution to Rust.
Let's assume for the sake of argument that the standard library didn't implement Hash for i32.
You could then have two crates, A and B, with different implementations of Hash for i32, and both could instantiate HashMap<i32>.
This can be made to work if we recognize the HashMap<i32> in crate A as a different type than the HashMap<i32> in crate B.
This only really works if orphan implementations are exported and imported explicitly to resolve the conflict that arises from a crate C that depends on A and B.
If C wants to handle HashMap<i32>, it needs to decide whether to import the orphan implementation of Hash for i32 from crate A or B (or to define its own). Depending on the decision, values of type HashMap<i32> can move between these crates or not.
Basically, the "proximity to code location" is made explicit in a way the programmer can control.
This makes type checking more complex, so it's not clear whether the price is worth it, but it does allow orphan implementations without creating coherence problems.
Implementations are not imported at all, because they are not named. Like I wrote, named implementations (à la ML modules) are a valid alternative, but one with a much greater annotation burden.
You could imagine having named impls that are allowed to be incoherent as an additional feature on top of coherent unnamed impls, but to use them you would need to make any code that depends on their behavior parameterized by the impl as well as the types. In fact, you can pretty trivially emulate that behavior in Rust today by adding a dummy type parameter to your type and traits.
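A sketch of that dummy-type-parameter emulation (all names invented): the tag type effectively names the impl, so two conflicting implementations coexist and callers pick one explicitly, and any code depending on the behavior carries the tag as a parameter too.

```rust
use std::marker::PhantomData;

// Marker types that act as impl "names".
struct FastHash;
struct SecureHash;

trait HashWith<Tag> {
    fn hash_with(&self) -> u64;
}

impl HashWith<FastHash> for i32 {
    fn hash_with(&self) -> u64 {
        *self as u64
    }
}

impl HashWith<SecureHash> for i32 {
    fn hash_with(&self) -> u64 {
        (*self as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15)
    }
}

// Code that depends on the impl's behavior must be parameterized by the
// tag as well, so a table built with one impl can't be probed with another.
struct Table<Tag> {
    chosen_impl: PhantomData<Tag>,
}

fn main() {
    let fast = <i32 as HashWith<FastHash>>::hash_with(&5);
    let secure = <i32 as HashWith<SecureHash>>::hash_with(&5);
    assert_ne!(fast, secure); // both impls coexist; the tag disambiguates
    let _table: Table<FastHash> = Table { chosen_impl: PhantomData };
}
```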
Right, but what I'm describing is a tradeoff point that's between the extremes, where implementations are unnamed but can still be explicitly imported.
Making my example more explicit, you'd need syntax along the lines of
// inside crate C
use A::impl std::hash::Hash for i32;
This syntax would explicitly be limited to orphan implementations.
I suppose to further clarify, there's still some coherence requirement there in that crate C can't import the conflicting implementations from both A and B. Which could then perhaps be worked around by adding syntax to spell types along the lines of
HashMap<i32 + A::impl Hash, V>
Which you could argue is a form of naming implementations, I suppose? I'm not familiar with ML. You could maybe also think of it as a more ergonomic way of doing (more or less) those wrapper types.
In any case, the annotation burden only exists where it's actually needed to enable orphan implementations.
And either way, multiple different impls can safely coexist within the overall set of code that's linked together, with everything being statically checked at compile time.
I think this is very much aligned with what without.boats is suggesting, rather than at odds with it. While not literally a name, `use A::impl std::hash::Hash for i32` is for all intents and purposes naming the impl.
Similarly, `HashMap<i32 + A::impl Hash, V>` is what they are talking about when they refer to parameterizing code on the impl chosen.
Essentially, yes. What I don't see is their claim that it's a "much greater annotation burden". Compared to what? Rust today just doesn't allow this at all, and if you use a wrapper type to simulate it, you definitely end up with more "annotations" (boilerplate).
FWIW It's not at all clear to me how this requirement would be implemented in practice: "This syntax would explicitly be limited to orphan implementations."
Maybe I'm missing something, but the compiler can tell whether an implementation is an orphan. That's how you get an error message today if you try to write one. So I don't know what difficulty you have in mind.
I'm pretty sure the article resolves the implicit dependencies at the point of the declaration. (Did I misunderstand it?)
So, you don't have a `data HashMap datatype`, you have a `data HashMap hashAlgo datatype`, where hashAlgo is decided implicitly by the context. That's the entire reason it's called "implicit".
Every other usage of the data knows how to hash your values because of that `hashAlgo` parameter. It doesn't matter where it happens.
Great write up, and you're absolutely right that implicits are moving towards ML modules. Quite possibly a production system would end up being synonymous with ML modules out of the need for named impls.
Small nit in terminology, the implicits described are coherent. Part of their value over previous implicit work is that they are coherent and stable. I have generally seen the property you're referring to called canonicity, which they do lack.