Of course not. You can't put absolute worst case times on code in practice. Worst case might be that someone managed to load 80PB of RAM on a CPU underclocked to 50 MHz. Or that the OS suspended the process in the middle of GC and was promptly frozen for a week while VMs were migrated across the country in the cup holder of someone's Toyota.
Worst case in absolute time always requires ignoring pathological cases.
This feels very wrong to me. Qualifying it with some sort of analysis is fine. Just saying "typical" is borderline negligent and is why nobody just uses naive quicksort in production systems where runtime can actually matter.
We don't care about worst-case latency in practice, and average case is often ignored as well. We look at 99.9% latency numbers or things like that. Worst-case is for people designing pacemakers or rockets, that's not what we are doing here.
Pretty sure web-enabled cameras are NOT pacemakers NOR rockets... heck, they can't even put a different password on each device by printing it on a sticker.
Nor is my sous vide machine or anything else I've seen as an IoT device. I wouldn't call, say, my oscilloscope an IoT device even if it can connect to WiFi. It's a scientific device with cloud upload.
I'm well aware of this. I'm also aware that people are hacking these devices left and right. But a cow is a mammal and a mammal isn't a cow.
The GP post was not talking about security; we're talking about real-time computing. If it's running on IoT hardware, e.g. an RPi running Linux, it's likely not doing real-time OS stuff, because Linux isn't really a real-time OS.
And I still don't classify pacemakers as IoT. Just because you want to slap a vague acronym on any and every "connected" embedded device doesn't mean I agree.
Completely fair point and one I should have acknowledged. I was not intending to call the OP negligent. From what I saw, they did a full analysis and are preparing the full numbers. I was just irked by this idea that "typical" is all one needs.
It depends on how it's been determined that it's the "typical worst case"
...maybe an anonymous usage program?
...maybe their typical use case? (which is definitely not everyone else's)
...maybe the typical use case of Google employees using Go for their single-computer tasks? (which would be closer to a "typical user" as in people who read HN)
...an average of those three, or perhaps even more, use cases? weighted in some way?
But if done correctly, optimizing the typical 90% of the problem by 20% gives you an overall 18% improvement, far more than eliminating the least important 10% of the runtime entirely (which is where some misguided optimization efforts tend to focus...).
Making software faster is always about knowing where to work, and then focusing exactly on that. Nothing wrong in attacking the "typical [anything] case".
Sure you can. It's standard in embedded software; it's called WCET analysis. There are also real-time GCs. The Go team has simply not done either of these in their design, so they can't give a worst-case time estimate.
Indeed. If there's no worst case time, that eliminates use in hard-real-time, certifiable embedded, e.g. aerospace, medical, etc.
You can't very well tolerate an "atypical" GC pause when you're firing a laser or radiation pulse, doing engine control, or even updating a display in a flight instrument setting.
A JIT can do better escape analysis, eliminating allocations. A JIT can also be more clever about safe point insertion. And when a JIT detects that code isn't being called concurrently, it can emit much cheaper read/write barriers.
In my, admittedly maybe lacking, understanding amortization doesn't really work that way. "Amortized worst case" (for example) means that you can still bound the worst case[1], but it's just not necessarily going to be a very accurate bound. Obviously, amortized complexity doesn't tell you "<X ms" right off the bat since it deals in abstract "operations", but if you have known worst case bounds for all the "operations", then an amortized bound for a given operation will give you something equivalent to "<X ms".
[1] I mean, it's a common proof technique to actually have a non-negative cost function (+ invariant) and postulate/derive an upper bound on that, so... what gives? What's your reasoning here?
No. The entire premise of amortized analysis is to get a more optimistic "eventually O" number. "Eventually" is not good enough for real time. Yes you can get a real hard worst-case number, but that's a different analysis from amortized analysis. Unless all of your amortized operations are happening between deadlines, it's useless--worse, dangerous--for safety. And amortized analysis is almost never used that way. You don't have language run-times that reinitialize between every deadline.
I somehow confused myself by thinking of it in terms of one of the proof techniques for amortized worst case where you derive a fixed upper bound for any "n". Of course this is a much stronger property than needed.
Other systems I'm aware of that are capable of similar feats (I know .NET's collector can, not sure about Hotspot's) use page poisoning. Basically the compiler/JIT puts a bunch of extraneous memory accesses to a known address at key points where the GC can track what all variables and registers hold (called "safepoints"), in a fairly liberal pattern around the code. When the GC wants to stop all the threads in their tracks, it unmaps the page holding that address, causing a segfault in each thread as soon as it hits one of the safepoints. The GC hooks the segfault handler and simply waits for all threads to stop.
I'm not sure that's what Go does, but I'm not aware of a faster way on standard hardware.
Java 7 and above provide the -XX:MaxGCPauseMillis flag for GC configuration.
From Mechanical Sympathy Blog [1]
" G1 is target driven on latency –XX:MaxGCPauseMillis=<n>, default value = 200ms. The target will influence the amount of work done on each cycle on a best-efforts only basis. Setting targets in tens of milliseconds is mostly futile, and as of this writing targeting tens of milliseconds has not been a focus of G1. "
So in the Java world we are talking about hundreds of milliseconds of worst case, which is three orders of magnitude higher than Go.
Well, yeah. You're comparing the G1 (for server workloads) to Go's concurrent collector. There's a concurrent collector you can use in incremental mode for the JVM if you want to trade throughput for pause time, like Go does.
The HotSpot designers have (correctly, IMO) observed that server use cases typically prefer throughput to latency.
On a meta note, I wish people would focus on the actual algorithms instead of taking cited performance numbers out of context and comparing them head to head. The Go GC uses the same concurrent marking and sweeping techniques that CMS does. This change to avoid stop the world stack sweeps is something that no HotSpot collector does. But it's counterbalanced by the fact that Go's GC is not generational. Generational GCs change the balance significantly by dramatically improving throughput.
In my experience you are still getting single-to-double-digit millisecond pauses on average using CMS (even with some attention/tuning). Do you really think Hotspot can offer a GC where pauses longer than 100us are considered interesting enough to look into?
The question is what use cases can tolerate large throughput hits but not a few msec of pause time (G1 can also do pauses of a few msec in many cases; I see hundred-msec pause times on very large heaps, but not desktop-sized heaps).
I suspect there are very few use cases. The G1 team seems to be focusing on scaling to ever larger heaps right now, like hundreds of gigabytes in size. They're relatively uninterested in driving down pause times as Shenandoah is going to provide that for the open source world and Azul already does for the $$$ world.
Unless you have numbers to share for CMS latency in Java, I have no reason to assume they are materially different from G1. I am using CMS for my server applications and multi-second STW pauses are quite common with CMS.
In general the Oracle JDK uses an order of magnitude more memory and orders of magnitude higher GC latency compared to Go. IMO it is quite useful to remember this when deciding on a language for new projects. If the HotSpot people think these numbers are not true I am sure they would let people know.
Here are Shenandoah pause numbers [1]. Max pause is 45ms, which I would agree is ultra low pause for Java, because pauses of hundreds of ms are common for Java server applications. Here are GC numbers for Go >= 1.6 for 100GB+ heaps [2]. Go 1.7/1.8 have, or are going to have, lower numbers than that.
A big difference is that Shenandoah still does compacting while gogc does not. That also means it can do bump-pointer allocation, i.e. it has a faster allocation path.
There must be some end-user (positive) impact of memory compaction in Java, but I do not see it in benchmarks, where Java programs take 2-10 times more memory and run slower than Go.
The end user benefit is stability. A runtime that compacts the heap can never die due to an OOM situation caused by heap fragmentation, whereas if you don't compact then the performance of an app can degrade over time and eventually the program may terminate unexpectedly.
Those benchmarks are of limited use since the JVM has startup costs which get amortized over time. Full performance is only achieved after JIT warmup. AOT would improve that, but on OpenJDK that's experimental and few people use other JVMs that support AOT out of the box.
About the memory footprint: The runtime is larger, so there's a baseline cost you have to pay for the JITs. But baseline != linear cost.
If you have long-running applications that run for more than a few seconds and crunch more than a few MB of data, those numbers would change.
So unless you're using the JVM for one-shot script execution those benchmarks are not representative. And if you do then there would be certain optimizations that one could apply.
> There must be some end user (positive) impact of memory compaction in Java.
Fewer CPU cycles of overhead per unit of work, i.e. better throughput, which does seem to be an issue for gogc [0]. No risk of quasi-leaks in the allocator due to fragmentation. Reduced cache misses [1].
That is true - although typed arrays will solve that in the future - but the benchmarks you cite (e.g. fannkuch-redux) still run into a memory floor created by the larger runtime.
Also, `regex-dna` will probably benefit from compact strings in java 9.
And some of the longer-running benchmarks are in fact faster on java, so I think that supports my point about warmup.
Interestingly enough, .NET doesn't use the "memory access" trick; instead it uses either a direct poll against a variable, or for user code it can sometimes just rudely abort the running thread (if it knows it's safe to do).
This. It's a lot easier to horizontally scale things with a lean towards consistently lower operational latency. You can keep raking in the benefits and cranking up throughput without a whole lot of thought.
It's much more expensive and complex to take an erratic latency operation and bring it down by throwing on more resources. As far as I can tell, the normal design course is making sure all your major actions are either pure or idempotent allowing parallel (and redundant!) requests to be made... which is a large (worthy, but large) engineering effort, and then we're talking about scaling to 2x or more just so you can make that redundant-request thing your default behavior.
Another approach you can use in some cases with the JVM, which is often the simplest, is to set up the JVM so it doesn't GC (give it a lot of memory), then either just spawn a new JVM to take over, or take the machine about to run a GC out of your load-balanced pool before running a full GC, then put it back in again.
Doing the manually triggered & staggered GC trick on a pool of machines you control can give you very low latency guarantees, since no production request will ever hit a GC-ing JVM.
Personally found Java's GC to be a little tricky but generally awesome.
However, 100, even 200ms pauses get "lost" during network comms.
But users tend to notice if a page takes 30 seconds to load rather than 1 second (differences I've seen optimising Java's GC; I don't know at all how Go and Java compare).
That kind of difference would be very expensive to solve using more servers.
That's the key trade-off these days in software, imho.
What kind of memory management do you need?
Server side, zero memory leaks is an absolute requirement and "real time" responses rarely are; triggering GC during init and/or shutdown of a component is often enough.
Building a CNC machine - every tick is valuable, but as long as it can run a job for 2 days before crashing no one will notice if it leaks memory like a sieve when you run a calibration routine.
From my knowledge, G1 is for low latency at the cost of some throughput. I do not know why you keep hammering a point that Oracle's official documents do not claim.
1. If (a) peak application performance is the first priority and (b) there are no pause time requirements or pauses of 1 second or longer are acceptable, then let the VM select the collector, or select the parallel collector with -XX:+UseParallelGC.
2. If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately 1 second, then select the concurrent collector with -XX:+UseConcMarkSweepGC or -XX:+UseG1GC.
That's relative to other Hotspot collectors. It's still generational with bump-pointer allocation and lighter write barriers than Go, so it is still geared heavily towards throughput relative to Go's GC.
To stop the world, Go preempts at the next stack size check [1], which happens during function call preludes. I guess that happens often enough to be fast in practice.
I assume this works so quickly in part because goroutines are backed by a pool of OS threads that don't block? So everybody doesn't get stuck waiting for a thread that's blocked to get around to checking the preemption flag?
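A rough way to see the mechanism being described (my own sketch, not from the thread): a goroutine spinning in a loop that makes no function calls has no stack-size checks inside it, so there is nowhere for a stop-the-world request to preempt it until the loop finishes (at least on Go versions without asynchronous preemption).

    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    // spin burns CPU in a loop that contains no function calls, so the
    // compiler inserts no stack-size checks (and therefore no preemption
    // points) inside it.
    func spin(n int) int {
        sum := 0
        for i := 0; i < n; i++ {
            sum += i
        }
        return sum
    }

    func main() {
        go func() {
            fmt.Println(spin(1 << 30))
        }()

        time.Sleep(10 * time.Millisecond) // let the goroutine get going

        start := time.Now()
        runtime.GC() // must stop the world; can be held up by the call-free loop
        fmt.Println("runtime.GC() returned after", time.Since(start))
    }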
100 microseconds is quite a long time in CPU time for a single-core these days, and proportionally longer with multi-core, or say in GPU time. However taking into account the VM runtime environment, this wouldn't make Go's feat any less impressive.
I like faster latency as much as the next person, but these improvements aren't free. On my own servers I notice approximately 20% of CPU time spent in GC.
I'm a C++ programmer who hates garbage collection on general principle, but I have to say: 20% of CPU time spent in memory management isn't all that unusual (especially in software that hasn't been optimized), whether or not that memory management is automatic garbage collection or manual.
Quite true, but the reality is that since Modula-3, Oberon, Eiffel and others did not get adopted at large by the industry, many got the idea that it isn't possible to have a GC for productivity and still manually manage memory, if really required.
So now we are stuck with modern languages having to bring those ideas back to life.
In the code I write where it matters the issue is allocation & deallocation time. Thus you don't do those things on the hot path in either gc'd or manual memory management environments.
Given that, the overhead becomes 0 in either.
Is it sometimes harder to write zero-alloc code in GC'd languages? Sure, but it's not impossible.
In Java for instance the difference in performance compared to C++ comes from memory layout ability not memory management.
> In Java for instance the difference in performance compared to C++ comes from memory layout ability not memory management.
We need more modern languages with AOT compilation, ability to allocate on the stack and use value types.
I like C++ and have spent quite a few years using it, before settling on memory safe languages for my work.
Many C++ presentations seem to forget on purpose that there are memory safe programming languages, which offer similar capabilities for controlling memory layouts, thus presenting the language as the only capable of doing it.
Modula-3 is an example, or Swift and D for something more actual.
The reality is that there aren't any straight replacements though since C++ has such mature tools, move semantics take away significant amounts of memory management pain, D without a gc is still exotic, and rust is still in its early stages.
It is a matter of which market is relevant for a developer.
There are many cases where C++ is still used even though the use case at hand doesn't actually have a real need for it.
For example, in my area of work, C++ has been out of the picture since around 2006. We only use it when Java or .NET need a helping hand, which happens very seldom.
On the mobile OSes for example, C++ is only relevant as possible language to write portable code across platforms, but not so much for those that decide to focus on just one.
There Swift, Java and C# are much better options.
For those in HPC, languages like Chapel and X10 are gaining adherents.
Also, as an early C++ adopter (1993), I remember being told by C developers something similar to what you are saying with regard to tool maturity.
Now around 30 years later, their compilers are written in C++.
I'm not trying to claim C++ is the only language anyone will ever need. I've tried hard to find alternatives, but until the D community really goes full force on getting the garbage collection out and starts to care about the tools for the language instead of just the language, it seems like Rust will be the only contender (and future C++ and maybe even Jai). I wish a language called Clay had taken off; it was pretty well put together as a better C.
I'll mention a significant case that doesn't have to do with allocation. Large graph-like data structures (lots of small nodes linked by reference) are effectively prohibited entirely by the GC. They make every marking phase take much too long, until the whole thing eventually gets promoted into the long-lived generation. A mainstream runtime having such an impactful opinion about my data structures (not alloc patterns) is something I just find deeply offensive. Imagine if C++ programs stuttered because you had too many pointers!
They could have avoided that whole problem by providing an API like DontCollectThisRoot, but instead they (and a lot of programmers) chose to pretend the GC was good enough to treat like a black box.
Huh? Are you talking about a particular GC? Because every object-oriented program I've ever seen could be described as "Large graph-like data structures (lots of small nodes linked by reference)".
Any GC that walks the graph, and isn't mostly concurrent. You will know when the graph becomes complex enough, because thereafter the GC will not let you ignore it. In my experience, as few as several hundred thousand objects can become a problem. Imagine trying to write a responsive 3D modeling app with vertices, edges, and faces all bidirectionally pointing to each other. You the programmer would think very carefully before fully traversing such a data structure (much of the point of having thorough internal references is avoiding doing much traversal!), and yet the runtime helpfully does it for you, automatically, and there's nothing you can do to stop it.
FWIW, Go has value types, so there's less referencing than in Java, etc. Also worth noting that these are actually used unlike in C# which has a reference-type-first culture.
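To make that concrete, a small sketch (my own example): a slice of structs is one allocation with no interior pointers for the collector to trace, while a slice of pointers is N+1 objects that the marker has to visit, which is exactly the "lots of small nodes linked by reference" shape described above.

    package main

    import "fmt"

    type vertex struct{ x, y, z float64 }

    func main() {
        // One backing array, zero pointers inside: nothing for the GC to trace.
        byValue := make([]vertex, 100000)

        // 100001 separately allocated objects, all of which the marking
        // phase must walk.
        byPointer := make([]*vertex, 100000)
        for i := range byPointer {
            byPointer[i] = &vertex{}
        }

        fmt.Println(len(byValue), len(byPointer))
    }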
> If it's a system-enforced GC, you are limited in what you can do.
Perhaps I'm misunderstanding, but do many C programmers understand not only the current state of malloc at any given moment in their code but exactly how it works?
I think not.
A lot of the things you do in C++ to reduce memory management overhead are the same things you can do in Java, C#, and Go to reduce memory management overhead. That effort is neither special nor universal.
HLLs often have to be careful about using language features that unexpectedly create garbage, but in terms of actual management and collection it's not like ANY competent modern language is slow at it.
People often seem to neglect the fact that Java is still surprisingly fast despite spending lots of time on memory management just because many developers are so insensitive to how things alloc. Modern GC systems can manage allocs as well as deallocs, so with care from the programmer and the runtime authors you can reach the kind of performance people talk about as critical for "embedded systems" (even though in practice SO many things do not deliver on this promise in shipped products!).
> Perhaps I'm misunderstanding, but do many C programmers understand not only the current state of malloc at any given moment in their code but exactly how it works?
Good programmers understand how malloc works. What, are you kidding, or am I misunderstanding?
Performance-oriented programmers do not use malloc very much. As you say, you can also try to avoid allocations in GC'd languages. The difference is that in a language like C you are actually in control of what happens. In a language that magically makes memory things happen, you can reduce allocations, but not in a particularly precise way -- you're following heuristics, but how do you know you got everything? Okay, you reduced your GC pause time and frequency, but how do you know GC pauses aren't still going to happen? Doesn't that depend on implementation details that are out of your control?
> even though in practice SO many things do not deliver on this promise in shipped products!
But, "in practice" is the thing that actually matters. Lots and lots of stuff is great according to someone's theory.
> The difference is that in a language like C you are actually in control of what happens. In a language that magically makes memory things happen, you can reduce allocations, but not in a particularly precise way -- you're following heuristics, but how do you know you got everything?
First of all, it's not like most mallocs DON'T have heuristics they're following. Without insight into what it's doing, it is just as opaque as how Java or the CLR manages memory.
And your behavior absolutely can and does influence how much things are allocated, deallocated, and reused. If you think that the JVM cannot be tuned to that level, you're dead wrong and I can point to numerous projects written for virtual machines that reach levels of performance that are genuinely difficult to reach no matter your approach.
> Good programmers understand how malloc works.
"Good programmers know how their GC works. What, are you kidding, or am I misunderstanding?"
> But, "in practice" is the thing that actually matters.
"In practice" Kafka is the gold standard of pushing bits through distributed systems as fast as possible. "In practice" distributed concurrency systems (that are often the speed limit of anything you want to build on more than one computer, e.g., etcd, Zookeeper, Consul) are I/O limited long before their collected nature impacts their performance.
And if we can eventually liberate ourselves from operating systems that give privileged status to C and C++, that ecosystem will diminish further, because its performance benefits come at too high a cost and are generally oversold anyway.
> "Good programmers know how their GC works. What, are you kidding, or am I misunderstanding?"
I think you are not understanding what I am saying.
You link your allocators into your code so you know what they are. You see the source code. You know exactly what they do. If you don't like exactly what they do, you change them to something different.
A garbage-collector, in almost all language systems, is a property of the runtime system. Its behavior depends on what particular platform you are running on. Even 'minor' point updates can substantially change the performance-related behavior of your program. Thus you are not really in control.
As for your other examples, apparently you're a web programmer (?) and in my experience it's just not very easy for me to communicate with web people about issues of software quality, responsiveness, etc, because they have completely different standards of what is "acceptable" or "good" (standards that I think are absurdly low, but it is what it is).
> You link your allocators into your code so you know what they are. You see the source code. You know exactly what they do. If you don't like exactly what they do, you change them to something different.
In my experience, most C/C++ devs know what malloc/free or new/delete do, but how they work? They don't care, as long as it works and doesn't get in their way. Sure, in larger applications the allocator/deallocator can consume quite some time - but even then it rarely is the bottleneck.
I happen to have more hands-on experience with allocators - I had to port one a long time ago - but in C or C++ I rarely knew how the one I was using was implemented (except for the one I ported). Seeing the source code? Sorry, that's not always available, and even if it is, not too accessible - not that many devs actually ever look into the glibc code...
And linking your allocator? Most of the time you just use the default one provided by your standard library - so that happens 'automagically' without most developers realizing it. I have yet to see a modern C or C++ app that specifically has to link its own allocator before it can actually allocate memory. Most compilers take care of this.
For most stuff I do, I like GCs. In most real-world situations they are rarely the bottleneck; most applications are I/O bound. For most stuff, a GC's benefits outweigh its disadvantages by a huge margin. And if the GC could become a bottleneck, you should have been aware of that up front and maybe avoided using something with a GC, although I'm not a fan of premature optimization.
Embedded systems in most cases have memory constraints and GC languages are memory hogs. The cost of having GC is that you pay with higher memory usage for doing GC in batches thanks to which you do not have to pay for single deallocations. So this performance advantage cannot be used in embedded space because there is no free memory for it, you would need to GC all the time which would kill the performance.
> The cost of having GC is that you pay with higher memory usage for doing GC in batches thanks to which you do not have to pay for single deallocations. So this performance advantage cannot be used in embedded space because there is no free memory for it, you would need to GC all the time which would kill the performance.
The same is true for C. You don't get to make frequent allocations for free in any language. You have to trade space for performance; in GC-land, the answer is tuning the collector. In Go, this is one knob: the GOGC env var.
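For reference, the same knob is also reachable from code via runtime/debug (a trivial sketch; the value 200 here is just an example, not a recommendation):

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        // Equivalent to running with GOGC=200: let the heap grow to roughly
        // 3x the live set before the next collection, trading memory for
        // less frequent GC work. The default is 100.
        prev := debug.SetGCPercent(200)
        fmt.Println("previous GOGC value:", prev)
    }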
I've read a lot of really cool papers on GCs that avoid this. The bigger problems arise from languages that take for granted that any abstraction they offer with a memory cost is okay because it offers the user no additional complexity.
For example, languages that use closures have to have very smart compilers or even innocuous functions can create implications for the working set size, which puts the allocator and deallocator under a lot more pressure.
And that's not even the most subtle problem you might run into! A common cause of memory constraints in Java 1.7 and earlier stemmed from subarrays of large arrays. Java made a fairly innocuous decision regarding methods like String.substring that ends up biting a lot of people later on, even as it is the right decision for a slightly different set of performance considerations.
That's true for GC vs. manual heap-based memory management, but most GC languages don't do stack allocation at all or only for primitive types and stack allocation is much, much faster than any sort of heap-based memory management.
Java, Go, and C# all do stack allocation either implicitly via escape analysis (Java), explicitly via value types (C#), or both (Go). I don't know that this is "most" (perhaps by marketshare), but these are certainly 3 of the most popular languages in this space.
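A small illustration of the Go side of that (my own sketch): escape analysis is decided per variable, and you can ask the compiler to show its reasoning with -gcflags.

    package main

    // Build with `go build -gcflags=-m` to see the escape-analysis decisions
    // (exact diagnostics vary by compiler version).

    type point struct{ x, y int }

    // p never leaves the function, so it stays on the stack.
    func onStack() int {
        p := point{1, 2}
        return p.x + p.y
    }

    // The address of p escapes via the return value, so p moves to the heap.
    func onHeap() *point {
        p := point{3, 4}
        return &p
    }

    func main() {
        _ = onStack()
        _ = onHeap()
    }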
Go has excellent auto-completion support, even for vim. Its debugger (delve) is also decent, though not graphical. There are not many languages with better tooling than Go, in my experience.
FYI, delve is integrated into a lot of editors, and basically works exactly like visual studio when used in VS Code (I used visual studio for C++ & C# for 13 years before moving to Go).
"stack allocation is much, much faster than any sort of heap-based memory management"
No, it's not. For short-lived objects, at least on the JVM, allocation is a pointer bump, and collection is free, because unreachable objects are not even walked. Stack allocation doesn't beat that by much.
The allocation speed will be close; a bit worse since the pointer to the end of the current allocation buffer usually isn't in a register and a function call and range check is required. However the overall cost of handling that memory from start to finish is significantly higher than the stack even if it gets clobbered in the first nursery collection.
Not because it is very high, but because stack allocation/deallocation is so very simple.
That's fair. Does Go do this? Or any other somewhat mainstream language? Any thoughts on how arenas (rust) compare to gc and manual allocation for speed?
Arenas trade some granularity and flexibility for speed and fragmentation-free allocation; they're a great choice for bulk data that you can precompute the size of and want to iterate over very quickly, and they're also easy to reason about. You can do many tasks using only arenas and stack allocations, and it'll zip along very quickly. If needed, you can flag liveness to get very efficient reuse of short-lived objects. They're less ideal if you are gradually adding more data over time, and you want to retain a consistently low usage, since you end up with "stairsteps" of latency when it copies into a bigger buffer, and once you have the big buffer, it's quite complicated to shrink it down again.
malloc()-style allocation gives you precision over how much you want, and this is most interesting to a memory-constrained system trying to allocate amongst many different processes (the original Unix use-case). But willy-nilly use of malloc() and free() leaves you with lots of fragmentation, as well as a larger attack surface for memory errors. What the actual allocation algorithm does is out of your hands, too, at least if you're depending on the OS allocator (you can always go write your own and supply it with a giant heap to munch on, and this may occur when you need tuning for individual allocation scenarios).
In the right scenario, a GC won't do too much differently from a manual allocator (there are lots of optimizations that could bring the allocation to same-or-negligible runtime), but as we all know, right scenarios are something you can't always count on. A GC does, however, greatly simplify the memory management of a long-lived process since it can do nifty things like automatically compact the heap.
IME, a mix of stack, GC, and some arenas in the form of growable arrays, is absolutely fine for the needs of most applications. Quite often this last requirement creates a sticking point, though, where the language disallows value semantics for arrays of objects, and then you can no longer assume they're allocated linearly, plus the GC is taxed with additional references to trace. In those cases, if I have them I use arrays of primitive values as crude containers for an equivalent optimization. Either way, it's very annoying and creates some awful code, because those huge batches that I'd like to put in arenas also tend to be the bottleneck for the GC.
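For the Go case, the "growable array as a crude arena" idea looks roughly like this (an illustrative sketch of my own, handing out indices instead of pointers so the GC has nothing extra to trace):

    package main

    import "fmt"

    type particle struct{ x, y, vx, vy float32 }

    // arena is one contiguous backing array with bump-style allocation and a
    // single bulk reset; indices stand in for pointers.
    type arena struct {
        items []particle
    }

    func (a *arena) alloc(p particle) int {
        a.items = append(a.items, p) // may grow: the "stairstep" mentioned above
        return len(a.items) - 1
    }

    func (a *arena) reset() {
        a.items = a.items[:0] // everything freed at once; memory stays reserved
    }

    func main() {
        var a arena
        i := a.alloc(particle{x: 1, y: 2})
        a.items[i].vx = 0.5
        fmt.Println(a.items[i])
        a.reset()
    }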
I'm of the impression that Java has done escape analysis for a while now. They just haven't had value types, which as I understand, just introduce a semantic for stack allocation.
No, he means a single CPU instruction. That's not quite fair, I don't think it actually is a single instruction, more like a few instructions in the best case and a very large number of instructions on the slow path.
The tradeoff here seems to be a more complicated write barrier, so the loss in performance here will for the most part not show up as time spent in the GC. I'm curious to see how big of an issue this will be; the only GC I've heard of with such a heavy write barrier is OCaml's, which is ameliorated by the language's disposition towards immutability.
And OCaml has a generational GC, unlike Go. So Go's throughput is going to be hit even harder.
Not going with a generational GC is a choice that runs against what has been accepted practice for decades. It may of course work out, but it's what I'd consider an experiment.
Typical Go code does not generate a lot of short-lived objects, compared with, say, Java or with typical usage of persistent data structures in functional languages.
That removes the practical need for generational GC.
I see the importance of escape analysis as another indication that typical Go code does not generate a lot of short-lived objects on the heap. It is just that the language does not allow expressing particular stack-allocation idioms, requiring the compiler to infer them.
Compare that with Java, where rather sophisticated escape analysis does not help much besides allowing several things to be returned cheaply from a function. Typical code there just does not follow stack-like allocation patterns.
That might imply that most allocations are happening on the heap - if your code is structured to make allocations only on the stack as much as possible, there wouldn't be that much work to do.
I would 10,000x rather have to match new with delete (big deal) than to maintain the revolting unidiomatic contortions I'm obligated to do to outwit the GC.
See the "SustainedLowLatency" mode for something very similar to what Go does (although .NET's GC, unlike Go's GC, is generational, which is a significant difference).
The problem is that Unity doesn't use a modern .NET runtime at all, rather the frozen Mono version from back when the Mono developers were still working for SuSE.
Unfortunately, Unity still uses .NET 2, and their next planned upgrade (to 4.6) is still listed with an undetermined ETA. That's good to know for the future though.
Depends on the platform you are targeting. Mono's GC (last I heard, they may have integrated .NET Core's by now) is relatively primitive. It was mostly developed just to have something better than Boehm.
Their experience was actually pretty instructive; with their first release of the new GC (a proficient but basic generational collector) they still weren't beating Boehm all the time, and usually weren't beating it by much. Given its constraints, Boehm's GC is impressively optimized when you run it in semi-precise mode.
Very interesting; I'm curious how you came to this understanding. How did you acquire information about this? Did they blog about it, or do you happen to follow their mailing lists, or ... ?
I read a blog post from someone at Xamarin with some test cases graphed, a bit before Xamarin started suggesting SGEN as the default. I'll see if I can find the one I'm thinking of.
Yes, they even used TLABs. I believe the issue was that Xamarin was more interested in mobile performance at the time when SGEN was seeing heavy development and preparing for release, so they optimized for memory usage instead of throughput. The generational part was probably more of a hindrance than a benefit at that point in the process.
AAA game developers only moved from Assembly to C and Pascal when forced to do it.
A few years after, they moved from C to C++ when the SDKs started to be C++ only.
Similarly, they will only move from C++ when forced to do so, and their only major complaint is related to build times.
I bet most are willing to wait until C++20 for modules than switching to another language, even if some studios manage to have some hits developed in safer languages.
Very good point here. The outlook of gamedevs toward new tooling tends to be pessimistic because so little is actually geared towards what they work on, or want to work on.
On the other hand, Web and mobile games have lived with the consequences of a managed runtime for many years now. There are limits in how much processing power is available there, but it ultimately just diverts developer attention towards other things like a more robust backend, faster turnarounds, and other general workflow improvements independent of scene fidelity.
You really want a higher-level API in most cases, and these are almost exclusively written in C/C++. Also in Go, there's a certain overhead when calling C functions.
If you are measuring response in nanoseconds, 100 microseconds is still a lot.
However, it may be good enough for games at well below 1% of your time budget for a 60 fps game assuming cache locality is good enough so you don't waste too much time fetching from main memory.
Besides games it would be interesting how good Go would now work for things like low-latency audio processing (single-digit-millisecond-latency). That's some kind of classic domain where performance is not a problem but once you miss the target timeframe you are pretty fucked up (producing and hearing glitches).
I write a fair amount of low latency code in go, but none of it is hard real time. Average throughput of a few microseconds (for my simple workloads) with spikes of milliseconds here and there is what I tend to find. Fine for a lot of things but I'd be hesitant to use it for high fi audio apps that are very sensitive to latency. Humans may only be able to notice 50ms, but if you're chaining DSP you can end up with a fair bit of variance in your processing pipeline.
Already in 2012, so even if it's not yet there, I think Go can get there, especially since you can take care to just stack allocate and minimise GC use on the hot path.
Go can become a systems programming language, even if the GC haters don't think so.
It has the same features as Oberon for systems programming, which Niklaus Wirth used to build quite a few workstation OSes at the Swiss Federal Institute of Technology in Zurich (ETHZ).
The only thing missing from Go versus what Oberon offered is register access in the unsafe package, but even that can be sorted out with an extension or a few calls into assembly.
Oberon-07, which is even more minimalist than either Go or the original Oberon, is sold by Astrobe for bare-metal programming on Cortex-M boards.
Personally, I would say latency isn't the problem, and that not being able to avoid the GC makes Go automatically not a systems programming language. Not all situations can afford a GC, and for a lot of systems programming usage (particularly embedded systems) using a GC just obscures memory usage when, in a lot of cases, the memory can be declared statically to begin with (ensuring a maximum amount of memory usage).
That's not to say you couldn't potentially write an OS or an embedded system in Go (I mean, you can write OS's in Lisp if you really want) but I doubt it would be fun and I doubt anybody would recommend it. You definitely won't be writing idiomatic Go without a lot of extra pieces that you can't really afford in those situations.
Most other sophisticated GCs (e.g. .NET and Java's) can obtain pause times in a similar range for generation 0 collections. So if GC is the reason you don't want to use one of those you'd probably be more interested in improvements in worst case pause times. Go is however very good at avoiding unnecessary object allocation and doesn't need a JIT so it may still be closer to what you need than those languages.
Right, and tenured generation collections are already concurrent in HotSpot and .NET. Generally only heap compaction needs to stop the world; you can disable compaction if this is a problem for you.
What the Go developers consider a "systems programming language" was exactly explained in the very first announcement/presentation of Go. They clearly outlined that it's for building systems like Google's, not operating systems.
> .1 of a millisecond is very impressive. Does this mean Go can finally be considered a systems programming language?
Can you write an OS kernel in Go? No, Go's runtime still depends on an OS. And whoever talked about Go as a systems language didn't have kernels in mind, but "network infrastructure".
Go needs a runtime, but it need not be a full OS; it could be a not-too-big library that gets linked into a kernel written in Go.
Similarly, can you write an OS kernel in ISO C? No, you still need some assembly or runtime support. For example, ISO C doesn't have any notion of making syscalls or returning from the kernel to a caller or for setting up page tables.
Any argument why go isn't suitable for systems programming along these lines should be about how much and what kind of runtime is acceptable for a systems programming language.
A (fairly popular, I think, but certainly not universally agreed upon) argument could be that systems programming languages cannot have garbage collection because it should be possible to implement a garbage collector in them, and doing that in a language that already has one is possible but silly.
I can't find it in the documentation, but I would think that must be implemented in ProtoGo, where ProtoGo is a Go-like language that doesn't use Go's garbage collector or a library that does (Enforcing not using anything that uses the garbage collector may be left to the programmer)
That is necessary even with a concurrent garbage collector, because a garbage collector that allocates in its own heap may hang itself (the program allocates; a GC is triggered; the GC allocates; another GC is triggered to satisfy that allocation; a new GC is triggered; etc.). Or do Go's developers accept this risk and live with it?
> Enforcing not using anything that uses the garbage collector may be left to the programmer
It's left to the compiler actually. Programmers can't be trusted.
The runtime does not implicitly generate garbage (like arbitrary Go code does). It is compiled with a mode that fails compilation if something can't be stack-allocated. When heap allocation is necessary, it is requested manually. However, the memory returned is garbage collected as usual; there is no free.
Besides 4ad's reply, here you can see how Oberon has its GC implemented in Oberon, as another example of a bootstrapped GC-enabled systems programming language.
The Go developers provided their definition, which is something people conveniently ignore whenever they try to be clever about Go not being a systems programming language.
So many places I have worked had a systems engineering department, and none of them had anything to do with operating systems/device drivers. I wonder whether it has to do with OS hackers hanging out on the internet together and deciding what "systems" would mean.
Go 1.8 is moving to a hybrid write barrier in order to eliminate stack rescanning. The hybrid write barrier is equivalent to the "double write barrier" used in the adaptation of Metronome used in the IBM real-time Java implementation.
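The design doc describes the hybrid barrier as a Yuasa-style deletion barrier combined with a Dijkstra-style insertion barrier that stays active until the goroutine's stack has been scanned. A very rough model of that logic (not the runtime's actual code; shade and stackIsGrey are stand-ins for runtime internals):

    package main

    import "fmt"

    type object struct{ marked bool }

    // shade marks an object so the collector will not reclaim it this cycle.
    func shade(o *object) {
        if o != nil {
            o.marked = true
        }
    }

    // stackIsGrey would answer "has the current goroutine's stack been
    // scanned yet?"; here it is just a placeholder.
    func stackIsGrey() bool { return true }

    // writePointer models the barrier run on every heap pointer store *slot = ptr.
    func writePointer(slot **object, ptr *object) {
        shade(*slot) // deletion (Yuasa) half: the old target stays reachable
        if stackIsGrey() {
            shade(ptr) // insertion (Dijkstra) half, until this stack is scanned
        }
        *slot = ptr
    }

    func main() {
        var field *object
        writePointer(&field, &object{})
        fmt.Println(field.marked)
    }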
> What about the way Go handles errors today makes them not "strongly typed"?
Errors as types or errors as values? The std lib promotes errors as values (i.e. check equality) instead of errors as types (i.e. check the type). Go's error system WAS written with errors as values in mind. There is no point having errors in place of exceptions if errors were intended to be used as types (which they are not, as said previously). Basically developers are implementing their own mediocre exception system on top of Go errors.
The error-as-value thing made sense in C 30 years ago; it doesn't in a language created less than 10 years ago.
There are a lot of C inspired patterns in Go that make the language half modern/ half dated in strange ways. That's fine when one comes from C though, that isn't when one comes from anything remotely modern. But I guess it's why Go is successful, it's basically C with garbage collection.
I said that already. And that's not the problem at hand. When you test an error, do you test against a value or a type in order to know what kind of error it is?
> You can test the interface. A type is just an interface around memory, albeit more constrained.
Wow, again, that's not the problem here. Errors in the standard library are defined as values. There is no point testing them as interfaces; that will not give you the nature of the error, since they are all defined with fmt.Errorf. Do you understand the problem now? The problem is being consistent across codebases between errors as values and errors as types.
That's because single error instances are much cheaper than always creating a new instance of a given type. No need to create garbage for lots of common errors. Of course you could have a dedicated type plus a singleton implementation of it in Go. But what would be the advantage? Checking if err.(type) == io.EofType does not give you more information than only checking if err == io.EOF, as long as you don't store any error-instance-related information in it. Which makes sense for custom errors, and which Go absolutely allows you to do.
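To illustrate the two conventions being argued about (my own minimal example): a sentinel value compared by identity versus a concrete type checked with a type assertion, both satisfying the error interface.

    package main

    import (
        "fmt"
        "io"
        "strings"
    )

    // parseError is an "error as a type": it carries data and is checked by type.
    type parseError struct{ line int }

    func (e *parseError) Error() string { return fmt.Sprintf("parse error on line %d", e.line) }

    func main() {
        // "Error as a value": io.EOF is a singleton compared with ==.
        r := strings.NewReader("hi")
        buf := make([]byte, 2)
        for {
            _, err := r.Read(buf)
            if err == io.EOF {
                fmt.Println("done reading")
                break
            }
            if err != nil {
                panic(err)
            }
        }

        // "Error as a type": a type assertion recovers the extra information.
        var err error = &parseError{line: 7}
        if pe, ok := err.(*parseError); ok {
            fmt.Println("failed at line", pe.line)
        }
    }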
> No, an error implements the error interface. It means that it can be a value of any type that implements the constraint of having an Error method.
It doesn't matter what a value implements if you don't test its content by type. Std lib errors are meant to be tested by value, not by type. It has nothing to do with interfaces, again. When you get a std lib error, in order to know what error it is you compare it to an exported value, not its type. I don't know why you keep insisting on interfaces; that's not the core of the issue here.
Neither the Java std lib nor the .NET std lib do that, though; they don't declare an exception as a static constant you need to compare with, because it's (rightfully so) considered bad coding. Exceptions give you some valuable information, like stack traces. Go errors give you nothing. They are inferior to exceptions and a relic of C programming patterns.
The compiler provides you no help at all with them, and no syntax that makes error conditions and handling separate. It also mixes application logic and recovery logic.
Basically everything that's problematic with returning a status int in C, but all new, hip, and backed up by Rob Pike's pseudointellectual bullshit and a bunch of Silicon Valley 20somethings.
They could at least, you know, have an Either type or something. Anything?
>It also mixes application logic and recovery logic.
When did this separation become law? What if the "application logic" requires recovery?
>They could at least, you know, have an Either type or something
(int64, error) in func ParseInt() (int64, error) is your Either type. And checking if you got the "left or the right side of the Either" is IMHO much shorter and clearer than in Scala.
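Concretely, the check looks like this (a trivial example of my own, using strconv.ParseInt as mentioned above):

    package main

    import (
        "fmt"
        "strconv"
    )

    func main() {
        n, err := strconv.ParseInt("1234", 10, 64)
        if err != nil {
            // the "left" side: the failure case
            fmt.Println("not a number:", err)
            return
        }
        // the "right" side: the success case
        fmt.Println("parsed:", n)
    }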
The convention is that if err == nil then the value is not nil. The exceptions to this rule are very few and usually specified in the documentation. Normally you only have to check for the error.
It's not that it's a bad idea, just that because of the "nil interface" absurdity it can happen if you accidentally mix concretely typed variables and interfaces, as in the example.
This is perfectly valid and doesn't cause the nil issue:
return someErrorStruct{}
...where someErrorStruct is a struct that implements the "error" interface. Using structs for errors is fine, and is in fact generally preferable to singletons like io.EOF, which can (by their very nature) never be extended with more data about the error.
There's nothing absurd about it; interfaces are a reference type. If you have a reference to a reference, then checking the "outer" reference for nil doesn't tell you anything about the nullity of the "inner" reference. The advice is just a special case of "don't needlessly use pointers-to-pointers".
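The case being described (an interface wrapping a nil concrete pointer) looks like this; it's the same trap spelled out further down the thread:

    package main

    import "fmt"

    type myError struct{}

    func (*myError) Error() string { return "boom" }

    // mayFail returns a typed nil: the error interface it is stored in carries
    // (type = *myError, value = nil), so the interface itself is not nil.
    func mayFail() error {
        var e *myError
        return e
    }

    func main() {
        err := mayFail()
        fmt.Println(err == nil) // false, even though the pointer inside is nil
    }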
Go chose to rely heavily on nil pointers, which is a design mistake (see Tony Hoare's apology for inventing it). The resultant tension between interfaces and nils is, in my opinion, an absurd side effect that cannot be explained away as anything except an ugly wart. We should have something better than this in 2016.
I say this as someone who uses Go daily for my work and mostly likes it (despite the many warts).
I don't especially love nil either, but people make too big a deal of it. The only arguments against it are hypothetical scenarios, anecdotal examples, and appeals to Hoare's authority. While there's probably a more ergonomic design, there's no substantial evidence that nil is the catastrophe it's made out to be. Using terms like "absurdity" and "catastrophe" seems overly dramatic without some decent evidence.
I don't think I'm being overly dramatic, actually. I deal with nil oversights on a daily basis during development, and I would say that it is the main source of unexpected crashes in anything. It equally applies to other languages such as Ruby and Python.
It's exacerbated by the fact that Go chose to make things like maps, slices and channels pointers, too. It has an excellent philosophy about zero values (so you can't get nil strings, even though they are pointers internally), yet goofed when it came to something as intuitive as "var foo []string", and claimed this behaviour to be a feature. The (nil, nil) interface is just icing on a crappy cake.
The fact that such a new language as Go doesn't have a way to express missing values safely should be disappointing to developers.
By a wide margin, the biggest production issues I see are index out of bounds or key errors (regardless of language). When I'm treating `nil` as a valid state for some object, I take extra care to test its permutations, and uninitialized maps/interfaces/etc are quickly discovered during testing (every test has to initialize, so this logic is well-covered).
> The (nil, nil) interface is just icing on a crappy cake.
The same problem exists in languages without nil. For example, if you choose to do the stupid thing and return Option<Option<error>> when you only need Option<error>, then checking the outer Option<> for None is not going to magically guarantee that the inner Option<> is not None.
> It has an excellent philosophy about zero values (so you can't get nil strings, even though they are pointers internally), yet goofed when it came to something as intuitive as "var foo []string", and claimed this behaviour to be a feature.
What are you talking about? nil slices are every bit as safe as an empty slice or an empty string (which is just an immutable empty slice).
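For example (a quick demonstration of my own; nothing here panics):

    package main

    import "fmt"

    func main() {
        var s []string // nil slice: no allocation yet

        fmt.Println(len(s), cap(s)) // 0 0 - length and capacity are safe to read
        for range s {               // ranging over a nil slice is a no-op
        }
        s = append(s, "hello") // append allocates on demand
        fmt.Println(s)
    }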
> The fact that such a new language as Go doesn't have a way to express missing values safely should be disappointing to developers.
I agree, and I'm mildly annoyed by it, but as it's the least of all of my problems, I find words like "absurdity" to be too heavy-handed.
Nil slices do cause problems. One is when it's an aliased type that satisfies an interface (again with the nil interfaces!). Another is that it leads to inconsistencies: json.Marshal(([]string)(nil)) returns "null", for example, not "[]". Yet another annoyance caused by nils (including nil slices) is that reflect.Value becomes a lot more byzantine than it ought to have been, requiring careful use of IsValid(), Kind checks, etc. when you want to deal with something that is either an interface or something else.
As for Option: Not sure how that's an argument. And anyway, a language with real sum types will never allow going down a branch that doesn't receive a correctly matched value.
> One when it's an aliased type that satisfies an interface (again with the nil interfaces!)
It sounds like you're again confusing nil interfaces with an interface holding a nil value (in particular, there is no way to get a nil interface from a nil slice). Here's an example that demonstrates nil slices do not cause problems: https://play.golang.org/p/tSA_otqg3-
> Another is that it leads to inconsistencies: json.Marshal(([]string)(nil)) returns "null", for example, not "[]".
1. This is unrelated to the language; it's the behavior implemented by the JSON library
2. This behavior is correct; a nil slice is not the same as an empty slice:
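The example that belongs here is easy to reproduce (my own version, not the original snippet):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    func main() {
        var nilSlice []string
        emptySlice := []string{}

        a, _ := json.Marshal(nilSlice)
        b, _ := json.Marshal(emptySlice)

        fmt.Println(string(a)) // null
        fmt.Println(string(b)) // []
    }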
You posited that Go's nils are bad because you can satisfy an error interface with a pointer type, and then when you return (*ConcreteType)(nil), your error handling code executes. The problem here is unrelated to nil or to interfaces; the problem is that you're nesting one binary type inside another (binary in the literal sense of having two states, either nil or not nil in this case). You would have the same problem in Rust had you done Result<Foo, Option<ConcreteError>> or (in the case of a single return value) Option<Option<ConcreteError>>. You would fix the problem in either languages by recognizing that you only have 2 states (error or not error) and removing the nesting (in Rust: Result<Foo, Error> or Option<Error>; in Go `return ConcreteType{}`).
> And anyway, a language with real sum types will never allow going down a branch that doesn't receive a correctly matched value.
I agree, and it would be nice if Go had this, but this is also not a very significant problem--this problem is blown way out of proportion.