> 2. Slices are confusing types that create subtle and hard-to-diagnose data races
The "Slices" example is just nasty! Like, this is just damning for Go's promise of "_relatively_ easy and carefree concurrency".
Think about it for a second or two,
>> The reference to the slice was resized in the middle of an append operation from another async routine.
What exactly happens in these cases? How can I trust myself, as a fallible human being, to reason about such cases when I'm trying to efficiently roll up a list of results? :-/
Compared to every other remotely mainstream language, perhaps even C++, these are extremely subtle and sharp, nay razor-sharp, edges. Yuck.
One big takeaway is this harsh realization: Go's guarantees are scarcely more than what is offered by plain, relatively naïve and unadulterated Bash shell programming. I will still use it, but with newfound fear.
As a multi-hundred-kloc-authoring gopher: I love Go, and this article is killing me inside. Go appears extremely sloppy at the edges of the envelope and at language boundaries, more so than even I had realized prior to now.
Full disclosure: I am disgusted by the company that is Uber, but I'm grateful to the talented folks who've cast a light on this cesspool region of Golang. Thank you!
p.s. inane aside: I never would've guessed that in 2022, Java would start looking more and more appealing in new ways. Until now I've been more or less "all-in" on Go for years.
> I never would've guessed that in 2022, Java would start looking more and more appealing in new ways.
I don't quite understand the hatred towards Java (to the point of shouting "using Java? Over my dead body!"), especially in startups. I mean, it's a language, big deal. Java's ecosystem more than offsets whatever inefficiencies exist in the language itself, at least for building many internal CRUD services. Besides, people like Martin Thompson show us how to build low-latency applications with ease too. Libraries like JCTools beat the shit out of many new languages when it comes to concurrency, for productivity, performance, and reliability.

How many engineers in startups claim that they hate Elasticsearch because "Java sucks"? Yet how many can really build a platform as versatile as ES, or a Lucene replacement with economical advantages? How many people in startups openly despise Spark or Flink and set out to build a replacement because "Java is slow and ugly"? Yeah, I've seen a few. And a payment company insists that Rust is the best language because "GC is inefficient and ugly", even though they are still in the product-iteration phase and all their services simply wrap around payment gateways. What's the point?
Disclaimer: I use Go at work. It's not like I have skin in the game when speaking about Java.
I actually think the Java ecosystem is part of why people dislike it. Java seems to attract a lot of extremely heavyweight frameworks (like Spring) that are too complex to fully understand and too heavyweight to make sense for most projects.
So use Helidon, Micronaut, Javalin, or Spark if you want something small, but I suspect in any real application you'll just end up recreating half of Spring. That's what my company did, and it's nowhere near the quality of anything in Spring.
> how to build low-latency applications with ease too
That's a bit of a stretch. Sure, you can build low-latency apps, but I'd be very careful with the "with ease" bit. Low-latency Java often means zero heap allocations, aggressive object avoidance and reuse, and heavy use of primitive types everywhere, so it is very much low-level, like C, only without even the tools that plain old C offers: no true stack-allocated structs and no pointers. And forget about all the high-level zero-cost abstractions that C++ and Rust offer.
Fair point. The "with ease" part also has to do with Java's ecosystem. For instance, Martin Thompson used to teach people how to write a single-producer-multi-consumer queue. In a matter of hours, people could achieve 100M+ reads and writes on a 2014 MacBook Pro (I understand that throughput is different from latency, but given the fixed number of CPUs in this case, the latency of such an implementation is also phenomenal). Better yet, Java folks have libraries like JCTools, so they don't even have to spend those few hours to get even higher performance.
My litmus test is how fast one can implement the data structures/algorithms from the book The Art of Multiprocessor Programming at production quality. It looks like chic languages such as Rust are not there yet.
Don't forget "chic" languages like Rust have the whole C ecosystem at their disposal.
Having done Java for many years and recently also Rust, I'm not very convinced that one ecosystem is richer than the other when we talk about high-performance computing. I've already hit a few things present in Rust that I wished I had in Java. Generally I find the multithreading/concurrency libraries available in Rust very good.
For me, Java and MySQL kind of died* when they became an Oracle thing. I just don’t want to go near anything that Oracle touches.
The other thing is that I tend to write little programs where simple deployment on a low-resource machine is desirable.
Go can handle that. Java kind of does the job with Graal now.
The JVM is incredible, though, and I love Clojure. I’m hoping that Loom + Graal helps to kickstart more competition in the “concurrent, parallel, simple to deploy” space.
* Died to me; obviously they’re both alive and well in the broad world.
> For me, Java and MySQL kind of died* when they became an Oracle thing. I just don’t want to go near anything that Oracle touches.
Come on, that's a cheap reason (for Java; for an open-source DB I would also go with Postgres, but for different reasons). Java is one of the very few languages with a full specification (not "whatever our compiler does, that's the spec"), it has plenty of fully independent implementations that even pass one of the most detailed test suites for complete spec compliance, and the platform is so deeply ingrained in the biggest corporations that any one of the following companies could easily, single-handedly finance the future of Java if anything were to happen: Apple, Google, Microsoft, Amazon, Alibaba.
And for all the bad things one can without doubt throw at Oracle, they are surprisingly good at shepherding the language and platform. It has been growing in a very good direction with a fast update cycle, it has state-of-the-art research and development going on, and with Loom on the near horizon and Valhalla on the slightly further horizon, I would say Java has one of the brightest futures ahead. Like, Valhalla would automagically bring a huge performance improvement for free, and Java is very competitive in performance as is.
Agreed that Java is pretty good with records / sealed types / Loom, and one nice thing about the Oracle Java team is that they do not add half-baked features (primarily since they have the last-mover advantage). For example, Valhalla will have value types, but they'll be immutable so they can be freely copied and used. Loom will have structured concurrency at debut, which IMHO makes vthreads manageable.
But I have my own apprehensions about Loom, which effectively breaks synchronized blocks (by pinning the carrier thread); those are used extensively in legacy libraries and even in more recent ones (like the OpenTelemetry Java SDK).
Written a lot of Java, Python, and Go in my career... every single time I see someone take a hardline stance against Java it's always because they had one particular bad experience with it 15-20 years ago and couldn't bend it to their will like Python or Lisp. Or they fought Maven, or some other ancillary tool. Or they rail on the generics and yet the use-cases they come up with for true reified generics are generally niche.
Java's got problems. The biggest one is the framework laden ecosystem and that some of the frameworks are all or nothing. But the language and runtime are rock solid. I don't get the hate.
Mostly Java, Python, and a bit of Go here, also. If you're not sure what language to develop a back-end service in, you'll rarely go wrong by picking Java. The JVM absolutely is rock solid. The number of libraries and frameworks available is amazing. If you stay away from heavyweight frameworks and use something leaner like Spring Boot or Dropwizard, you'll be fine.
Slices may just be one of the best and worst parts of Go. They're cumbersome, their behavior sometimes feels 'inexplicable,' and even as an experienced developer you are likely to eventually fall into one of the traps where your 'obvious' code isn't so obvious.
That said... when programming in languages without a slice type, I always want to have one. And though it's confusing at times, the design does actually make sense; without a doubt, it's hard to think of how you would improve on the actual underlying design.
I really wish that Go's container types were persistent immutable or some-such. It wouldn't solve everything, but it feels to me like if they could've managed to do that, it would've been a lot easier to reason about.
> And though it's confusing at times, the design does actually make sense; without a doubt, it's hard to think of how you would improve on the actual underlying design.
Go slices are absolutely the worst type in Go, because out of laziness they serve as both slices and vectors, rather than having a separate, independent, and opaque vector type.
This split identity is the source of most, if not all, of their traps and issues.
> I really wish that Go's container types were persistent immutable or some-such.
That would go against everything Go holds dear, since it's allergic to immutability and provides no support whatsoever for it (aside from simple information hiding).
I think you're overstating its importance, but I do agree the language's biggest pitfalls are easily right here. That said, if you start from first principles and force every feature and construct to be justified ruthlessly, it's easier to see how they got there. Constness (as a type concept) and immutability are among those things that can explode into surprising complexity for the language and compiler.
In retrospect, it may have been worth the pain. Maybe in the distant future, Go will have it. For now, if you want a more sophisticated language, options exist, with all the tradeoffs that will entail.
> it's hard to think of how you would improve on the actual underlying design.
I'm biased as a Rust fan in general, but I think Rust pretty much nails this. Rust distinguishes between a borrowing view of variable length (a slice, spelled &[T]) and an owned allocation of variable length (usually a Vec<T>). Go uses the same type for both, which makes the language smaller, but it leads to confusion about who's pointing to what when a slice is resized.
I do pretty much agree there, but also, I'm not sure that solves a whole lot of problems in the frame of Go. Since Go lacks ownership or constness as a concept, it would be weird if there weren't convenience functions for e.g. appending to a slice, because it's always possible for that to be done; if the language didn't do it, the end users could certainly write the function themselves. I think they would've needed to expand the language in order to make meaningful improvements to slices.
> it's hard to think of how you would improve on the actual underlying design.
I think ranges are a part of D's design that they got right, and I think a similar abstraction would be in line with golang's general design ethos, GC design, etc., other than perhaps some folks pattern-matching it as "this is like the STL, therefore bad, burn it with fire" without actually thinking about it in detail.
The append pattern also implies the opposite of reality, in that it (usually!) mutates mySlice. Which is the source of one of the two(?) possible races in that piece of code.
I'm no fan of Go, although I think it's better than many other languages for services, but the argument here is against the label "append", not the operation. It's a poor name for the operation, but the documentation is quite clear about what's going on. I'd argue that understanding the keywords and builtins of a language is the bare minimum an engineer should do before they start writing anything in it.
shouldn’t be allowed, because what it’s likely to do is not what anyone meant. In a pass-by-value language, passing a slice or map by value should copy it, append should be a method that returns void, and passing a pointer should be the way to share state and avoid copies.
Let's try appending just one more item before redoing ^ that example, where they all shared the same data: https://go.dev/play/p/5JneXHMeUjx
[a b c]
[z b c x x2]
[a b c y y2]
Notice that in all of these examples, I haven't explicitly declared a length or capacity. There's nothing "funny looking" or clearly, intentionally allowing these different behaviors; it's just simple, very common slice use.
.... so yeah. This is a source of a number of hard-to-track-down bugs.
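For readers without a Go toolchain handy, here's a minimal self-contained sketch of the same trap (my own reconstruction, not the playground snippet): two appends off the same base slice share one backing array, so the second silently clobbers the first.

```go
package main

import "fmt"

// clobber appends off the same base slice twice. Because the base
// has one slot of spare capacity, both appends write into the SAME
// backing-array slot, and the second write overwrites the first.
func clobber() (b, c []string) {
	s := make([]string, 0, 4)
	s = append(s, "a", "b", "c") // len 3, cap 4: one spare slot
	b = append(s, "x")           // writes "x" into the spare slot
	c = append(s, "y")           // reuses that same slot: "x" is gone
	return b, c
}

func main() {
	b, c := clobber()
	fmt.Println(b) // [a b c y] -- b's last element was silently clobbered
	fmt.Println(c) // [a b c y]
}
```

Nothing in the two append calls hints that b and c alias each other; that's exactly the "nothing funny looking" problem.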
Yes, this majorly tripped me up when working on my first big Go project. Spent days hunting for a non-deterministic data corruption issue which was caused by this. It's definitely my fault for not fully reading the documentation and not realizing that append may (and often does) mutate the slice, but I was indeed misled by the `x = append(x, ...)` syntax into assuming it only works off of a copy without modifying the original.
Go's append is pretty much C's realloc, and behaves very much the same; the pointer you get back may or may not be the passed-in pointer.
Also,
> If the capacity of s is not large enough to fit the additional values, append allocates a new, sufficiently large underlying array that fits both the existing slice elements and the additional values. Otherwise, append re-uses the underlying array.
> What exactly happens in these cases? How can I trust myself, as a fallible human being, to reason about such cases when I'm trying to efficiently roll up a list of results? :-/
For me: minimize shared mutable data. If I really can’t get rid of some shared mutable data, I mutex it or use atomics or similar. This works very well—I almost never run into data races this way, but it is a discipline rather than a technical control, so you might have to deal with coworkers who lack this particular discipline.
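A minimal sketch of that discipline in Go (the function name and the fake per-item work are mine): every touch of the shared slice goes through one mutex, and a WaitGroup fences the fan-out.

```go
package main

import (
	"fmt"
	"sync"
)

// gather fans work out to goroutines but funnels every append
// through a single mutex, so the shared slice header is never
// read or written concurrently.
func gather(ids []string) []string {
	var (
		mu      sync.Mutex
		results []string
		wg      sync.WaitGroup
	)
	for _, id := range ids {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			res := "processed-" + id // stand-in for the real work
			mu.Lock()
			results = append(results, res) // the ONLY writer path
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(len(gather([]string{"a", "b", "c"}))) // 3: nothing lost
}
```

No goroutine ever holds its own copy of the slice header, which is exactly what makes the pattern in the article racy.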
Absolutely. The disappointing part is that, as code authors, we need to constantly keep in mind the various footguns (otherwise appealing, and even encouraged by the language's syntax and control constructs) and treat some areas of totally valid syntax as "never approach".
Reminds me of programming in JavaScript (an extreme example, but the similarity is there).
Yeah, it’s a bit disappointing. It doesn’t bother me too much, but it could be improved by a linter which could help you find shared mutable state. Without a concept of “const” (for complex types, anyway), I’m not sure how feasible such a linter would be.
Without disagreeing that it's an enormous footgun, one good way to avoid such slice issues is to use the uncommon `a[x:y:z]` form to ensure the slice can't grow. As we're starting to write a lot of generic slice functions with 1.18, we're using this form in almost all of them which may add elements.
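Concretely, a small sketch of the difference the third index makes (toy values and function names are mine):

```go
package main

import "fmt"

// appendCapped uses the three-index form s[low:high:max] so the
// result has zero spare capacity; any append must copy and can
// never write into s's backing array.
func appendCapped(s []int) []int {
	head := s[0:2:2] // len 2, cap 2
	return append(head, 99)
}

// appendLoose takes a plain two-index slice, which keeps s's full
// capacity, so the append writes straight through into s.
func appendLoose(s []int) []int {
	head := s[0:2] // len 2, cap len(s)
	return append(head, 99)
}

func main() {
	s := []int{1, 2, 3, 4, 5}
	fmt.Println(appendCapped(s), s) // [1 2 99] [1 2 3 4 5] -- s untouched
	fmt.Println(appendLoose(s), s)  // [1 2 99] [1 2 99 4 5] -- s[2] overwritten
}
```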
> one good way to avoid such slice issues is to use the uncommon `a[x:y:z]` form to ensure the slice can't grow.
Do you mean you always use `a[x:y:y]` in order to ensure there is no extra capacity and any append will have to copy the slice?
Is append guaranteed to create a new slice (and copy over the data) if the parameter is at capacity? Because if it could realloc internally then I don't think this trick is safe.
Of course, but the new slice could be (ptr, len+1, cap+x) because realloc() was able to expand the buffer in place. That yields essentially the same behaviour as an append call with leftover capacity.
But I guess realloc is a libc function, and Go probably goes for mmap directly and would implement its own allocator, and so might not do that. Unless / until they decide to add support for it.
If you start worrying about "what if <C concern> underlying the Go runtime happens?" you'll find a lot worse than realloc. Luckily, in the absence of runtime bugs, you don't have to think about it.
It's not a concern about C; it's a concern about the underlying possibility, expressed in C terms: size-class arenas are useful to an allocator, which means slack, which means the opportunity for in-place allocation resizing.
You advocated the use of a very specific behaviour of `append` as a DID and possibly a correctness requirement of programs.
My worry is about whether this behaviour is a hard specification of the Go language, or just an implementation detail of the primary Go implementation. And how programs applying your recommendation would handle such behaviour changing.
Sorry, this is nonsensically mixed up. Even if it reallocs in place, which it may be free to do, the language semantics still guarantee only one observer gets that extended space. Otherwise you would need to worry about this even when dealing with immutable structures like concatenating strings.
I've got to say I'm not entirely clear on what they talk about specifically.
Is it simply that the `results` inside the goroutine will be desync'd from `myResults` (and so the call to myAppend will interact oddly with additional manipulations of results), or is it that the copy can be made mid-update, and `results` itself could be incoherent?
If two slices share a backing array, then any append operations through slice A can mess up the data that slice B sees in that backing array.
Now, sometimes your append will resize the slice, in which case the data is copied and a slice with a new larger backing array is returned. If this was happening concurrently then you'd lose the data in racing appends.
If the append doesn't need to resize the slice, then you'll overwrite the data in the backing array. And so you'll corrupt the data in the slice.
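A small self-contained illustration of the resize case (names and values are mine): once append has to grow, the result detaches from the original array, which is why two racing appends can each build their own array and one result simply vanishes when the headers are written back.

```go
package main

import "fmt"

// resizeDetaches appends to a slice that is already at capacity.
// append must allocate a fresh backing array, so writes through
// the grown slice no longer touch the original.
func resizeDetaches() (base, grown []int) {
	base = make([]int, 2, 2) // len == cap: the next append reallocates
	base[0], base[1] = 1, 2
	grown = append(base, 3) // copied into a new array
	grown[0] = 100          // does NOT affect base
	return base, grown
}

func main() {
	base, grown := resizeDetaches()
	fmt.Println(base)  // [1 2]
	fmt.Println(grown) // [100 2 3]
}
```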
Although the code in the post doesn't actually look like it has an issue. Their tooling just flagged it because it would potentially have an issue if the copy were actually used in the function. But the `safeAppend` function targets the correct slice each time.
I'm a bit doubtful, since what you describe is definitely a slice issue, but it's already an issue in completely sequential code if you reuse appended-to slices.
So while it’s also an issue in concurrent code, it’s really no more so.
It is an issue in sequential code because as you say, that's just how slices work. But if you're always using the same variable you'll never encounter it because that slice can't change between you reading that variable and writing to it.
Once concurrency is introduced you can now read from the same variable, but another goroutine may have written to the same slice in the meantime. That's why you must protect the read and writes and synchronise them.
It's fundamentally just a race condition issue with unprotected reads. But people often overlook it in the case of slices because they think they're just taking a reference to the slice, which is safe to do concurrently IF slices were reference types. But they're not, they are copied.
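To make the copy semantics concrete, here's a minimal sketch (function name is mine): the header (pointer, length, capacity) is copied, while the elements stay shared.

```go
package main

import "fmt"

// appendInside receives a COPY of the slice header. append reassigns
// only that local copy, but because there is spare capacity, the new
// element still lands in the caller's shared backing array.
func appendInside(s []int) {
	s = append(s, 42) // local header grows to len 3; caller never sees it
}

func main() {
	s := make([]int, 2, 4)
	s[0], s[1] = 1, 2
	appendInside(s)
	fmt.Println(s, len(s)) // [1 2] 2 -- caller's length unchanged
	fmt.Println(s[:3])     // [1 2 42] -- but the shared array was written
}
```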
> Once concurrency is introduced you can now read from the same variable, but another goroutine may have written to the same slice in the meantime. That's why you must protect the read and writes and synchronise them.
But, again, this can easily occur in sequential code as well: you call a function passing it the slice, the function mutates the slice internally, it doesn't document that (or maybe the documentation is even wrong), and you now hit this issue.
I believe they made a mistake with that example. It doesn't look unsafe to me, because the myResults slice passed to the goroutine is not used. Or perhaps the racy part was left out of their snippet.
Below might be what they meant. This snippet is racy because an unsafe read of myResults is done to pass it into the goroutine, and that stale version of myResults is then passed to safeAppend:
func ProcessAll(uuids []string) {
    var myResults []string
    var mutex sync.Mutex
    safeAppend := func(results []string, res string) {
        mutex.Lock()
        myResults = append(results, res) // appends to a possibly stale copy, dropping concurrent results
        mutex.Unlock()
    }
    for _, uuid := range uuids {
        go func(id string, results []string) {
            res := Foo(id)
            safeAppend(results, res)
        }(uuid, myResults) // <<< unsafe read of myResults
    }
}
They talk about the "meta fields" of a slice. Is the problem that these "meta fields" (e.g. slice length and capacity) are passed by value, and that by copying them, they can get out of sync between goroutines?