Of course not. You can't put absolute worst case times on code in practice. Worst case might be that someone managed to load 80PB of RAM on a CPU underclocked to 50 MHz. Or that the OS suspended the process in the middle of GC and was promptly frozen for a week while VMs were migrated across the country in the cup holder of someone's Toyota.
Worst case in absolute time always requires ignoring pathological cases.
This feels very wrong to me. Qualifying it with some sort of analysis is fine. Just saying "typical" is borderline negligent and is why nobody just uses naive quicksort in production systems where runtime can actually matter.
We don't care about worst-case latency in practice, and average case is often ignored as well. We look at 99.9% latency numbers or things like that. Worst-case is for people designing pacemakers or rockets, that's not what we are doing here.
Pretty sure web-enabled cameras are NOT pacemakers NOR rockets... heck, they can't even put a different password on each device by printing it on a sticker.
Nor is my sous vide machine or anything else I've seen as an IoT device. I wouldn't call, say, my oscilloscope an IoT device even if it can connect to WiFi. It's a scientific device with cloud upload.
I'm well aware of this. I'm also aware that people are hacking these devices left and right. But a cow is a mammal and a mammal isn't a cow.
The GP post was not talking about security; we're talking about real-time computing. If it's running on IoT hardware, e.g. an RPi running Linux, it's likely not doing real-time OS stuff, because Linux isn't really a real-time OS.
And I still don't classify pacemakers as IoT. Just because you want to slap a vague acronym on any and every "connected" embedded device doesn't mean I agree.
Completely fair point and one I should have acknowledged. I was not intending to call the OP negligent. From what I saw, they did a full analysis and are preparing the full numbers. I was just irked by this idea that "typical" is all one needs.
It depends on how it's been determined that it's the "typical worst case"
...maybe an anonymous usage program?
...maybe their typical use case? (which is definitely not everyone else's)
...maybe the typical use case of Google employees using Go for their single-computer tasks? (which would be closer to a "typical user" as in people who read HN)
...an average of those three, or perhaps even more, use cases? weighted in some way?
But if done correctly, optimizing the typical 90% of the problem by 20% gives you an overall 18% improvement, far more than eliminating the least important 10% of the runtime entirely (which is where some misguided optimization efforts tend to focus...).
Making software faster is always about knowing where to work, and then focusing exactly on that. Nothing wrong in attacking the "typical [anything] case".
Sure you can. It's standard in embedded software; it's called WCET analysis. There are also real-time GCs. The Go team has simply not done either of these in their design, so they can't give a worst-case time estimate.
Indeed. If there's no worst case time, that eliminates use in hard-real-time, certifiable embedded, e.g. aerospace, medical, etc.
You can't very well tolerate an "atypical" GC pause when you're firing a laser or radiation pulse, doing engine control, or even updating a display in a flight instrument setting.
A JIT can do better escape analysis, eliminating allocations. A JIT can also be more clever about safe point insertion. And when a JIT detects that code isn't being called concurrently, it can emit much cheaper read/write barriers.
In my, admittedly maybe lacking, understanding amortization doesn't really work that way. "Amortized worst case" (for example) means that you can still bound the worst case[1], but it's just not necessarily going to be a very accurate bound. Obviously, amortized complexity doesn't tell you "<X ms" right off the bat since it deals in abstract "operations", but if you have known worst case bounds for all the "operations", then an amortized bound for a given operation will give you something equivalent to "<X ms".
[1] I mean, it's a common proof technique to actually have a non-negative cost function (+ invariant) and postulate/derive an upper bound on that, so... what gives? What's your reasoning here?
No. The entire premise of amortized analysis is to get a more optimistic "eventually O" number. "Eventually" is not good enough for real time. Yes you can get a real hard worst-case number, but that's a different analysis from amortized analysis. Unless all of your amortized operations are happening between deadlines, it's useless--worse, dangerous--for safety. And amortized analysis is almost never used that way. You don't have language run-times that reinitialize between every deadline.
I somehow confused myself by thinking of it in terms of one of the proof techniques for amortized worst case where you derive a fixed upper bound for any "n". Of course this is a much stronger property than needed.
Other systems I'm aware of that are capable of similar feats (I know .NET's collector can, not sure about Hotspot's) use page poisoning. Basically the compiler/JIT puts a bunch of extraneous memory accesses to a known address at key points where the GC can track what all variables and registers hold (called "safepoints"), in a fairly liberal pattern around the code. When the GC wants to stop all the threads in their tracks, it unmaps the page holding that address, causing a segfault in each thread as soon as it hits one of the safepoints. The GC hooks the segfault handler and simply waits for all threads to stop.
I'm not sure that's what Go does, but I'm not aware of a faster way on standard hardware.
Java 7 and above provide the -XX:MaxGCPauseMillis flag for GC configuration.
From Mechanical Sympathy Blog [1]
" G1 is target driven on latency –XX:MaxGCPauseMillis=<n>, default value = 200ms. The target will influence the amount of work done on each cycle on a best-efforts only basis. Setting targets in tens of milliseconds is mostly futile, and as of this writing targeting tens of milliseconds has not been a focus of G1. "
So in the Java world we are talking about hundreds of milliseconds of worst case, which is three orders of magnitude higher than Go.
Well, yeah. You're comparing the G1 (for server workloads) to Go's concurrent collector. There's a concurrent collector you can use in incremental mode for the JVM if you want to trade throughput for pause time, like Go does.
The HotSpot designers have (correctly, IMO) observed that server use cases typically prefer throughput to latency.
On a meta note, I wish people would focus on the actual algorithms instead of taking cited performance numbers out of context and comparing them head to head. The Go GC uses the same concurrent marking and sweeping techniques that CMS does. This change to avoid stop the world stack sweeps is something that no HotSpot collector does. But it's counterbalanced by the fact that Go's GC is not generational. Generational GCs change the balance significantly by dramatically improving throughput.
In my experience you are still getting single-to-double-digit millisecond pauses on average using CMS (even with some attention/tuning). Do you really think Hotspot can offer a GC where pauses longer than 100us are considered interesting enough to look into?
The question is what use cases can tolerate large throughput hits but not a few msec of pause time (G1 can also do pauses of a few msec in many cases; I see hundred-msec pause times on very large heaps, but not desktop-sized heaps).
I suspect there are very few use cases. The G1 team seems to be focusing on scaling to ever larger heaps right now, like hundreds of gigabytes in size. They're relatively uninterested in driving down pause times as Shenandoah is going to provide that for the open source world and Azul already does for the $$$ world.
Unless you have numbers to share for CMS latency in Java, I have no reason to assume they are materially different from G1. I am using CMS for my server applications and multi-second STW pauses are quite common with CMS.
In general the Oracle JDK uses an order of magnitude more memory and orders of magnitude higher GC latency compared to Go. IMO it is quite useful to remember this when deciding on a language for new projects. If the HotSpot people think these numbers are not true I am sure they would let people know.
Here are Shenandoah pause numbers [1]. Max pause is 45ms, which I would agree is ultra low pause for Java, because pauses of hundreds of ms are common for Java server applications. Here are GC numbers for Go >= 1.6 for 100GB+ heaps [2]. Go 1.7/1.8 have, or are going to have, lower numbers than that.
A big difference is that Shenandoah still does compacting while gogc does not. That also means it can do bump-pointer allocation, i.e. it has a faster allocation path.
There must be some end-user (positive) impact of memory compaction in Java, but I do not see it in benchmarks, where Java programs take 2-10 times more memory and run slower than Go.
The end user benefit is stability. A runtime that compacts the heap can never die due to an OOM situation caused by heap fragmentation, whereas if you don't compact then the performance of an app can degrade over time and eventually the program may terminate unexpectedly.
Those benchmarks are of limited use since the JVM has startup costs which get amortized over time. Full performance is only achieved after JIT warmup. AOT would improve that, but on OpenJDK that's experimental and few people use other JVMs that support AOT out of the box.
About the memory footprint: The runtime is larger, so there's a baseline cost you have to pay for the JITs. But baseline != linear cost.
If you have long-running applications that run for more than a few seconds and crunch more than a few MB of data, those numbers would change.
So unless you're using the JVM for one-shot script execution those benchmarks are not representative. And if you do then there would be certain optimizations that one could apply.
> There must be some end user (positive) impact of memory compaction in Java.
Fewer CPU cycles of overhead per unit of work, i.e. better throughput, which does seem to be an issue for gogc [0]. No risk of quasi-leaks in the allocator due to fragmentation. Reduced cache misses [1].
That is true - although typed arrays will solve that in the future - but the benchmarks you cite (e.g. fannkuch-redux) still run into a memory floor created by the larger runtime.
Also, `regex-dna` will probably benefit from compact strings in java 9.
And some of the longer-running benchmarks are in fact faster on java, so I think that supports my point about warmup.
Interestingly enough, .NET doesn't use the "memory access" trick; instead it uses either a direct poll against a variable, or for user code it can sometimes just rudely abort the running thread (if it knows it's safe to do).
This. It's a lot easier to horizontally scale things with a lean towards consistently lower operational latency. You can keep raking in the benefits and cranking up throughput without a whole lot of thought.
It's much more expensive and complex to take an erratic latency operation and bring it down by throwing on more resources. As far as I can tell, the normal design course is making sure all your major actions are either pure or idempotent allowing parallel (and redundant!) requests to be made... which is a large (worthy, but large) engineering effort, and then we're talking about scaling to 2x or more just so you can make that redundant-request thing your default behavior.
Another approach you can use in some cases with the JVM, which is often the simplest, is to set up the JVM so it doesn't GC (give it a lot of memory), then either just spawn a new JVM to take over, or take the machine about to run a GC out of your load-balanced pool before running a full GC, then put it back in again.
Doing the manually triggered & staggered GC trick on a pool of machines you control can give you very low latency guarantees, since no production request will ever hit a GC-ing JVM.
Personally found Java's GC to be a little tricky but generally awesome.
However, 100, even 200ms pauses get "lost" during network comms.
But users tend to notice if a page takes 30 seconds to load rather than 1 second (differences I've seen optimising Java's GC; I don't know at all how Go and Java compare).
That kind of difference would be very expensive to solve using more servers.
That's the key trade-off these days in software, imho.
What kind of memory management do you need?
Server side, zero memory leaks is an absolute requirement and "real time" responses rarely are; triggering GC during init and/or shutdown of a component is often enough.
Building a CNC machine - every tick is valuable, but as long as it can run a job for 2 days before crashing no one will notice if it leaks memory like a sieve when you run a calibration routine.
From my knowledge, G1 is for low latency at the cost of some throughput. I do not know why you keep hammering a point that Oracle's official documents do not claim.
1. If (a) peak application performance is the first priority and (b) there are no pause time requirements or pauses of 1 second or longer are acceptable, then let the VM select the collector, or select the parallel collector with -XX:+UseParallelGC.
2. If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately 1 second, then select the concurrent collector with -XX:+UseConcMarkSweepGC or -XX:+UseG1GC.
That's relative to other Hotspot collectors. It's still generational with bump-pointer allocation and lighter write barriers than Go, so it is still geared heavily towards throughput relative to Go's GC.
To stop the world, Go preempts at the next stack size check [1], which happens during function call preludes. I guess that happens often enough to be fast in practice.
I assume this works so quickly in part because goroutines are backed by a pool of OS threads that don't block? So everybody doesn't get stuck waiting for a thread that's blocked to get around to checking the preemption flag?
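A rough way to see the mechanism being described (my own sketch, not from the thread): a goroutine spinning in a loop that makes no function calls has no stack-size checks inside it, so there is nowhere for a stop-the-world request to preempt it until the loop finishes (at least on Go versions without asynchronous preemption).

    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    // spin burns CPU in a loop that contains no function calls, so the
    // compiler inserts no stack-size checks (and therefore no preemption
    // points) inside it.
    func spin(n int) int {
        sum := 0
        for i := 0; i < n; i++ {
            sum += i
        }
        return sum
    }

    func main() {
        go func() {
            fmt.Println(spin(1 << 30))
        }()

        time.Sleep(10 * time.Millisecond) // let the goroutine get going

        start := time.Now()
        runtime.GC() // must stop the world; can be held up by the call-free loop
        fmt.Println("runtime.GC() returned after", time.Since(start))
    }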
100 microseconds is quite a long time in CPU time for a single-core these days, and proportionally longer with multi-core, or say in GPU time. However taking into account the VM runtime environment, this wouldn't make Go's feat any less impressive.
I like faster latency as much as the next person, but these improvements aren't free. On my own servers I notice approximately 20% of CPU time spent in GC.
I'm a C++ programmer who hates garbage collection on general principle, but I have to say: 20% of CPU time spent in memory management isn't all that unusual (especially in software that hasn't been optimized), whether or not that memory management is automatic garbage collection or manual.
Quite true, but the reality is that since Modula-3, Oberon, Eiffel and others did not get adopted at large by the industry, many got the idea that it isn't possible to have a GC for productivity and still manually manage memory, if really required.
So now we are stuck with modern languages having to bring those ideas back to life.
In the code I write where it matters the issue is allocation & deallocation time. Thus you don't do those things on the hot path in either gc'd or manual memory management environments.
Given that, the overhead becomes 0 in either.
Is it sometimes harder to write zero-alloc code in GC'd languages? Sure, but it's not impossible.
In Java for instance the difference in performance compared to C++ comes from memory layout ability not memory management.
> In Java for instance the difference in performance compared to C++ comes from memory layout ability not memory management.
We need more modern languages with AOT compilation, ability to allocate on the stack and use value types.
I like C++ and have spent quite a few years using it, before settling on memory safe languages for my work.
Many C++ presentations seem to forget on purpose that there are memory safe programming languages, which offer similar capabilities for controlling memory layouts, thus presenting the language as the only capable of doing it.
Modula-3 is an example, or Swift and D for something more actual.
The reality is that there aren't any straight replacements though since C++ has such mature tools, move semantics take away significant amounts of memory management pain, D without a gc is still exotic, and rust is still in its early stages.
It is a matter of which market is relevant for a developer.
There are many cases where C++ is still used even though the use case at hand doesn't actually have a real need for it.
For example, in my area of work, C++ has been out of the picture since around 2006. We only use it when Java or .NET need a helping hand, which happens very seldom.
On the mobile OSes for example, C++ is only relevant as possible language to write portable code across platforms, but not so much for those that decide to focus on just one.
There Swift, Java and C# are much better options.
For those in HPC, languages like Chapel and X10 are gaining adherents.
Also, as an early C++ adopter (1993), I remember being told by C developers something similar to what you are saying with regard to tool maturity.
Now around 30 years later, their compilers are written in C++.
I'm not trying to claim C++ is the only language anyone will ever need. I've tried hard to find alternatives, but until the D community really goes full force on getting the garbage collection out and starts to care about the tools for the language instead of just the language, it seems like Rust will be the only contender (and future C++ and maybe even Jai). I wish a language called Clay had taken off; it was pretty well put together as a better C.
I'll mention a significant case that doesn't have to do with allocation. Large graph-like data structures (lots of small nodes linked by reference) are effectively prohibited entirely by the GC. They make every marking phase take much too long, until the whole thing eventually gets promoted into the long-lived generation. A mainstream runtime having such an impactful opinion about my data structures (not alloc patterns) is something I just find deeply offensive. Imagine if C++ programs stuttered because you had too many pointers!
They could have avoided that whole problem by providing an API like DontCollectThisRoot, but instead they (and a lot of programmers) chose to pretend the GC was good enough to treat like a black box.
Huh? Are you talking about a particular GC? Because every object-oriented program I've ever seen could be described as "Large graph-like data structures (lots of small nodes linked by reference)".
Any GC that walks the graph, and isn't mostly concurrent. You will know when the graph becomes complex enough, because thereafter the GC will not let you ignore it. In my experience, as few as several hundred thousand objects can become a problem. Imagine trying to write a responsive 3D modeling app with vertices, edges, and faces all bidirectionally pointing to each other. You the programmer would think very carefully before fully traversing such a data structure (much of the point of having thorough internal references is avoiding doing much traversal!), and yet the runtime helpfully does it for you, automatically, and there's nothing you can do to stop it.
FWIW, Go has value types, so there's less referencing than in Java, etc. Also worth noting that these are actually used unlike in C# which has a reference-type-first culture.
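To make that concrete, a small sketch (my own example): a slice of structs is one allocation with no interior pointers for the collector to trace, while a slice of pointers is N+1 objects that the marker has to visit, which is exactly the "lots of small nodes linked by reference" shape described above.

    package main

    import "fmt"

    type vertex struct{ x, y, z float64 }

    func main() {
        // One backing array, zero pointers inside: nothing for the GC to trace.
        byValue := make([]vertex, 100000)

        // 100001 separately allocated objects, all of which the marking
        // phase must walk.
        byPointer := make([]*vertex, 100000)
        for i := range byPointer {
            byPointer[i] = &vertex{}
        }

        fmt.Println(len(byValue), len(byPointer))
    }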
> If it's a system-enforced GC, you are limited in what you can do.
Perhaps I'm misunderstanding, but do many C programmers understand not only the current state of malloc at any given moment in their code but exactly how it works?
I think not.
A lot of the things you do in C++ to reduce memory management overhead are the same things you can do in Java, C#, and Go to reduce memory management overhead. That effort is neither special nor universal.
HLLs often have to be careful about using language features that unexpectedly create garbage, but in terms of actual management and collection it's not like ANY competent modern language is slow at it.
People often seem to neglect the fact that Java is still surprisingly fast despite spending lots of time on memory management just because many developers are so insensitive to how things alloc. Modern GC systems can manage allocs as well as deallocs, so with care from the programmer and the runtime authors you can reach the kind of performance people talk about as critical for "embedded systems" (even though in practice SO many things do not deliver on this promise in shipped products!).
> Perhaps I'm misunderstanding, but do many C programmers understand not only the current state of malloc at any given moment in their code but exactly how it works?
Good programmers understand how malloc works. What, are you kidding, or am I misunderstanding?
Performance-oriented programmers do not use malloc very much. As you say, you can also try to avoid allocations in GC'd languages. The difference is that in a language like C you are actually in control of what happens. In a language that magically makes memory things happen, you can reduce allocations, but not in a particularly precise way -- you're following heuristics, but how do you know you got everything? Okay, you reduced your GC pause time and frequency, but how do you know GC pauses aren't still going to happen? Doesn't that depend on implementation details that are out of your control?
> even though in practice SO many things do not deliver on this promise in shipped products!
But, "in practice" is the thing that actually matters. Lots and lots of stuff is great according to someone's theory.
> The difference is that in a language like C you are actually in control of what happens. In a language that magically makes memory things happen, you can reduce allocations, but not in a particularly precise way -- you're following heuristics, but how do you know you got everything?
First of all, it's not like most mallocs DON'T have heuristics they're following. Without insight into what it's doing, it is just as opaque as how Java or the CLR manages memory.
And your behavior absolutely can and does influence how much things are allocated, deallocated, and reused. If you think that the JVM cannot be tuned to that level, you're dead wrong and I can point to numerous projects written for virtual machines that reach levels of performance that are genuinely difficult to reach no matter your approach.
> Good programmers understand how malloc works.
"Good programmers know how their GC works. What, are you kidding, or am I misunderstanding?"
> But, "in practice" is the thing that actually matters.
"In practice" Kafka is the gold standard of pushing bits through distributed systems as fast as possible. "In practice" distributed concurrency systems (that are often the speed limit of anything you want to build on more than one computer, e.g., etcd, Zookeeper, Consul) are I/O limited long before their collected nature impacts their performance.
And if we can eventually liberate ourselves from operating systems that give privileged status to C and C++, that ecosystem will diminish further, because its performance benefits come at too high a cost and are generally oversold anyway.
> "Good programmers know how their GC works. What, are you kidding, or am I misunderstanding?"
I think you are not understanding what I am saying.
You link your allocators into your code so you know what they are. You see the source code. You know exactly what they do. If you don't like exactly what they do, you change them to something different.
A garbage-collector, in almost all language systems, is a property of the runtime system. Its behavior depends on what particular platform you are running on. Even 'minor' point updates can substantially change the performance-related behavior of your program. Thus you are not really in control.
As for your other examples, apparently you're a web programmer (?) and in my experience it's just not very easy for me to communicate with web people about issues of software quality, responsiveness, etc, because they have completely different standards of what is "acceptable" or "good" (standards that I think are absurdly low, but it is what it is).
> You link your allocators into your code so you know what they are. You see the source code. You know exactly what they do. If you don't like exactly what they do, you change them to something different.
In my experience, most C/C++ devs know what malloc/free or new/delete do, but how they work? They don't care, as long as it works and doesn't get in their way. Sure, in larger applications the allocator/deallocator can consume quite some time - but even then it rarely is the bottleneck.
I happen to have more hands-on experience with allocators - I had to port one a long time ago - but in C or C++ I rarely knew how the one I was using was implemented (except for the one I ported). Seeing the source code? Sorry, that's not always available, and even if it is, not too accessible - not that many devs actually ever look into the glibc code...
And linking your allocator? Most of the time you just use the default one provided by your standard library - so that happens 'automagically' without most developers realizing it. I have yet to see a modern C or C++ app that specifically has to link its own allocator before it can actually allocate memory. Most compilers take care of this.
For most stuff I do, I like GCs. In most real-world situations they are rarely the bottleneck; most applications are I/O bound. For most stuff, a GC's benefits outweigh its disadvantages by a huge margin. And if the GC could become a bottleneck, you should have been aware of that up front and maybe avoided using something with a GC, although I'm not a fan of premature optimization.
Embedded systems in most cases have memory constraints and GC languages are memory hogs. The cost of having GC is that you pay with higher memory usage for doing GC in batches thanks to which you do not have to pay for single deallocations. So this performance advantage cannot be used in embedded space because there is no free memory for it, you would need to GC all the time which would kill the performance.
> The cost of having GC is that you pay with higher memory usage for doing GC in batches thanks to which you do not have to pay for single deallocations. So this performance advantage cannot be used in embedded space because there is no free memory for it, you would need to GC all the time which would kill the performance.
The same is true for C. You don't get to make frequent allocations for free in any language. You have to trade space for performance; in GC-land, the answer is tuning the collector. In Go, this is one knob: the GOGC env var.
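For reference, the same knob is also reachable from code via runtime/debug (a trivial sketch; the value 200 here is just an example, not a recommendation):

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        // Equivalent to running with GOGC=200: let the heap grow to roughly
        // 3x the live set before the next collection, trading memory for
        // less frequent GC work. The default is 100.
        prev := debug.SetGCPercent(200)
        fmt.Println("previous GOGC value:", prev)
    }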
I've read a lot of really cool papers on GCs that avoid this. The bigger problems arise from languages that take for granted that any abstraction they offer with a memory cost is okay because it offers the user no additional complexity.
For example, languages that use closures have to have very smart compilers or even innocuous functions can create implications for the working set size, which puts the allocator and deallocator under a lot more pressure.
And that's not even the most subtle problem you might run into! A common cause of memory constraints in Java 1.7 and earlier stemmed from subarrays of large arrays. Java made a fairly innocuous decision regarding methods like String.substring that ends up biting a lot of people later on, even as it is the right decision for a slightly different set of performance considerations.
That's true for GC vs. manual heap-based memory management, but most GC languages don't do stack allocation at all or only for primitive types and stack allocation is much, much faster than any sort of heap-based memory management.
Java, Go, and C# all do stack allocation either implicitly via escape analysis (Java), explicitly via value types (C#), or both (Go). I don't know that this is "most" (perhaps by marketshare), but these are certainly 3 of the most popular languages in this space.
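A small illustration of the Go side of that (my own sketch): escape analysis is decided per variable, and you can ask the compiler to show its reasoning with -gcflags.

    package main

    // Build with `go build -gcflags=-m` to see the escape-analysis decisions
    // (exact diagnostics vary by compiler version).

    type point struct{ x, y int }

    // p never leaves the function, so it stays on the stack.
    func onStack() int {
        p := point{1, 2}
        return p.x + p.y
    }

    // The address of p escapes via the return value, so p moves to the heap.
    func onHeap() *point {
        p := point{3, 4}
        return &p
    }

    func main() {
        _ = onStack()
        _ = onHeap()
    }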
Go has excellent auto-completion support, even for vim. Its debugger (delve) is also decent, though not graphical. There are not many languages with better tooling than Go, in my experience.
FYI, delve is integrated into a lot of editors, and basically works exactly like visual studio when used in VS Code (I used visual studio for C++ & C# for 13 years before moving to Go).
"stack allocation is much, much faster than any sort of heap-based memory management"
No, it's not. For short-lived objects, at least on the JVM, allocation is a pointer bump, and collection is free, because unreachable objects are not even walked. Stack allocation doesn't beat that by much.
The allocation speed will be close; a bit worse since the pointer to the end of the current allocation buffer usually isn't in a register and a function call and range check is required. However the overall cost of handling that memory from start to finish is significantly higher than the stack even if it gets clobbered in the first nursery collection.
Not because it is very high, but because stack allocation/deallocation is so very simple.
That's fair. Does Go do this? Or any other somewhat mainstream language? Any thoughts on how arenas (rust) compare to gc and manual allocation for speed?
Arenas trade some granularity and flexibility for speed and fragmentation-free allocation; they're a great choice for bulk data that you can precompute the size of and want to iterate over very quickly, and they're also easy to reason about. You can do many tasks using only arenas and stack allocations, and it'll zip along very quickly. If needed, you can flag liveness to get very efficient reuse of short-lived objects. They're less ideal if you are gradually adding more data over time, and you want to retain a consistently low usage, since you end up with "stairsteps" of latency when it copies into a bigger buffer, and once you have the big buffer, it's quite complicated to shrink it down again.
malloc()-style allocation gives you precision over how much you want, and this is most interesting to a memory-constrained system trying to allocate amongst many different processes (the original Unix use-case). But willy-nilly use of malloc() and free() leaves you with lots of fragmentation, as well as a larger attack surface for memory errors. What the actual allocation algorithm does is out of your hands, too, at least if you're depending on the OS allocator (you can always go write your own and supply it with a giant heap to munch on, and this may occur when you need tuning for individual allocation scenarios).
In the right scenario, a GC won't do too much differently from a manual allocator (there are lots of optimizations that could bring the allocation to same-or-negligible runtime), but as we all know, right scenarios are something you can't always count on. A GC does, however, greatly simplify the memory management of a long-lived process since it can do nifty things like automatically compact the heap.
IME, a mix of stack, GC, and some arenas in the form of growable arrays, is absolutely fine for the needs of most applications. Quite often this last requirement creates a sticking point, though, where the language disallows value semantics for arrays of objects, and then you can no longer assume they're allocated linearly, plus the GC is taxed with additional references to trace. In those cases, if I have them I use arrays of primitive values as crude containers for an equivalent optimization. Either way, it's very annoying and creates some awful code, because those huge batches that I'd like to put in arenas also tend to be the bottleneck for the GC.
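For the Go case, the "growable array as a crude arena" idea looks roughly like this (an illustrative sketch of my own, handing out indices instead of pointers so the GC has nothing extra to trace):

    package main

    import "fmt"

    type particle struct{ x, y, vx, vy float32 }

    // arena is one contiguous backing array with bump-style allocation and a
    // single bulk reset; indices stand in for pointers.
    type arena struct {
        items []particle
    }

    func (a *arena) alloc(p particle) int {
        a.items = append(a.items, p) // may grow: the "stairstep" mentioned above
        return len(a.items) - 1
    }

    func (a *arena) reset() {
        a.items = a.items[:0] // everything freed at once; memory stays reserved
    }

    func main() {
        var a arena
        i := a.alloc(particle{x: 1, y: 2})
        a.items[i].vx = 0.5
        fmt.Println(a.items[i])
        a.reset()
    }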
I'm of the impression that Java has done escape analysis for a while now. They just haven't had value types, which as I understand, just introduce a semantic for stack allocation.
No, he means a single CPU instruction. That's not quite fair, I don't think it actually is a single instruction, more like a few instructions in the best case and a very large number of instructions on the slow path.
The tradeoff here seems to be a more complicated write barrier, so the loss in performance here will for the most part not show up as time spent in the GC. I'm curious to see how big of an issue this will be; the only GC I've heard of with such a heavy write barrier is OCaml's, which is ameliorated by the language's disposition towards immutability.
And OCaml has a generational GC, unlike Go. So Go's throughput is going to be hit even harder.
Not going with a generational GC is a choice that runs against what has been accepted practice for decades. It may of course work out, but it's what I'd consider an experiment.
Typical Go code does not generate a lot of short-lived objects, compared with, say, Java or with typical usage of persistent data structures in functional languages.
That removes the practical need for generational GC.
I see the importance of escape analysis as another indication that typical Go code does not generate a lot of short-lived objects on the heap. It is just that the language does not allow expressing particular stack-allocation idioms, requiring the compiler to infer them.
Compare that with Java, where rather sophisticated escape analysis does not help much besides allowing several things to be returned cheaply from a function. Typical code there just does not follow stack-like allocation patterns.
That might imply that most allocations are happening on the heap - if your code is structured to make allocations only on the stack as much as possible, there wouldn't be that much work to do.
I would 10,000x rather have to match new with delete (big deal) than to maintain the revolting unidiomatic contortions I'm obligated to do to outwit the GC.
See the "SustainedLowLatency" mode for something very similar to what Go does (although .NET's GC, unlike Go's GC, is generational, which is a significant difference).
The problem is that Unity doesn't use a modern .NET runtime at all, rather the frozen Mono version from back when the Mono developers were still working for SuSE.
Unfortunately, Unity still uses .NET 2, and their next planned upgrade (to 4.6) is still listed with an undetermined ETA. That's good to know for the future though.
Depends on the platform you are targeting. Mono's GC (last I heard, they may have integrated .NET Core's by now) is relatively primitive. It was mostly developed just to have something better than Boehm.
Their experience was actually pretty instructive; with their first release of the new GC (a proficient but basic generational collector) they still weren't beating Boehm all the time, and usually weren't beating it by much. Given its constraints, Boehm's GC is impressively optimized when you run it in semi-precise mode.
Very interesting; I'm curious how you came to this understanding. How did you acquire information about this? Did they blog about it, or do you happen to follow their mailing lists, or ... ?
I read a blog post from someone at Xamarin with some test cases graphed, a bit before Xamarin started suggesting SGEN as the default. I'll see if I can find the one I'm thinking of.
Yes, they even used TLABs. I believe the issue was that Xamarin was more interested in mobile performance at the time when SGEN was seeing heavy development and preparing for release, so they optimized for memory usage instead of throughput. The generational part was probably more of a hindrance than a benefit at that point in the process.
AAA game developers only moved from Assembly to C and Pascal when forced to do it.
A few years after, they moved from C to C++ when the SDKs started to be C++ only.
Similarly, they will only move from C++ when forced to do so, and their only major complaint is related to build times.
I bet most are willing to wait until C++20 for modules than switching to another language, even if some studios manage to have some hits developed in safer languages.
Very good point here. The outlook of gamedevs toward new tooling tends to be pessimistic because so little is actually geared towards what they work on, or want to work on.
On the other hand, Web and mobile games have lived with the consequences of a managed runtime for many years now. There are limits in how much processing power is available there, but it ultimately just diverts developer attention towards other things like a more robust backend, faster turnarounds, and other general workflow improvements independent of scene fidelity.
You really want a higher-level API in most cases, and these are almost exclusively written in C/C++. Also in Go, there's a certain overhead when calling C functions.
If you are measuring response in nanoseconds, 100 microseconds is still a lot.
However, it may be good enough for games at well below 1% of your time budget for a 60 fps game assuming cache locality is good enough so you don't waste too much time fetching from main memory.
Besides games it would be interesting how good Go would now work for things like low-latency audio processing (single-digit-millisecond-latency). That's some kind of classic domain where performance is not a problem but once you miss the target timeframe you are pretty fucked up (producing and hearing glitches).
I write a fair amount of low latency code in go, but none of it is hard real time. Average throughput of a few microseconds (for my simple workloads) with spikes of milliseconds here and there is what I tend to find. Fine for a lot of things but I'd be hesitant to use it for high fi audio apps that are very sensitive to latency. Humans may only be able to notice 50ms, but if you're chaining DSP you can end up with a fair bit of variance in your processing pipeline.
Already in 2012, so even if it's not yet there, I think Go can get there, especially since you can take care to just stack allocate and minimise GC use on the hot path.
Go can become a systems programming language, even if the GC haters don't think so.
It has the same features as Oberon for systems programming, which Niklaus Wirth used to build quite a few workstation OSes at the Swiss Federal Institute of Technology in Zurich (ETHZ).
The only thing missing from Go versus what Oberon offered is register access in the unsafe package, but even that can be sorted out with an extension or a few calls into assembly.
Oberon-07, which is even more minimalist than either Go or the original Oberon, is sold by Astrobe for bare-metal programming on Cortex-M boards.
Personally, I would say latency isn't the problem, and that not being able to avoid the GC makes Go automatically not a systems programming language. Not all situations can afford a GC, and for a lot of systems programming usage (particularly embedded systems) using a GC just obscures memory usage when, in a lot of cases, the memory can be declared statically to begin with (ensuring a maximum amount of memory usage).
That's not to say you couldn't potentially write an OS or an embedded system in Go (I mean, you can write OS's in Lisp if you really want) but I doubt it would be fun and I doubt anybody would recommend it. You definitely won't be writing idiomatic Go without a lot of extra pieces that you can't really afford in those situations.
Most other sophisticated GCs (e.g. .NET and Java's) can obtain pause times in a similar range for generation 0 collections. So if GC is the reason you don't want to use one of those you'd probably be more interested in improvements in worst case pause times. Go is however very good at avoiding unnecessary object allocation and doesn't need a JIT so it may still be closer to what you need than those languages.
Right, and tenured generation collections are already concurrent in HotSpot and .NET. Generally only heap compaction needs to stop the world; you can disable compaction if this is a problem for you.
What the Go developers consider a "systems programming language" was exactly explained in the very first announcement/presentation of Go. They clearly outlined that it's for building systems like Google's, not operating systems.
> .1 of a millisecond is very impressive. Does this mean Go can finally be considered a systems programming language?
Can you write an OS kernel in Go? No, Go's runtime still depends on an OS. And whoever talked about Go as a systems language didn't have kernels in mind, but "network infrastructure".
Go needs a runtime, but it need not be a full OS; it could be a not-too-big library that gets linked into a kernel written in Go.
Similarly, can you write an OS kernel in ISO C? No, you still need some assembly or runtime support. For example, ISO C doesn't have any notion of making syscalls or returning from the kernel to a caller or for setting up page tables.
Any argument why go isn't suitable for systems programming along these lines should be about how much and what kind of runtime is acceptable for a systems programming language.
A (fairly popular, I think, but certainly not universally agreed upon) argument could be that systems programming languages cannot have garbage collection because it should be possible to implement a garbage collector in them, and doing that in a language that already has one is possible but silly.
I can't find it in the documentation, but I would think that must be implemented in ProtoGo, where ProtoGo is a Go-like language that doesn't use Go's garbage collector or a library that does (Enforcing not using anything that uses the garbage collector may be left to the programmer)
That is necessary even with a concurrent garbage collector, because a garbage collector that allocates in its own heap may hang itself (the program allocates; a GC is triggered; the GC allocates; another GC is triggered to satisfy that allocation; a new GC is triggered; etc.). Or do Go's developers accept this risk and live with it?
> Enforcing not using anything that uses the garbage collector may be left to the programmer
It's left to the compiler actually. Programmers can't be trusted.
The runtime does not implicitly generate garbage (like arbitrary Go code does). It is compiled with a mode that fails compilation if something can't be stack-allocated. When heap allocation is necessary, it is requested manually. However, the memory returned is garbage collected as usual; there is no free.
Besides 4ad's reply, here you can see how Oberon has its GC implemented in Oberon, as another example of a bootstrapped GC-enabled systems programming language.
The Go developers provided their definition, which is something people conveniently ignore whenever they try to be clever about Go not being a systems programming language.
So many places I have worked had a systems engineering department, and none of them had anything to do with operating systems/device drivers. I wonder whether it has to do with OS hackers hanging out on the internet together and deciding what "systems" would mean.
Go 1.8 is moving to a hybrid write barrier in order to eliminate stack rescanning. The hybrid write barrier is equivalent to the "double write barrier" used in the adaptation of Metronome used in the IBM real-time Java implementation.
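The design doc describes the hybrid barrier as a Yuasa-style deletion barrier combined with a Dijkstra-style insertion barrier that stays active until the goroutine's stack has been scanned. A very rough model of that logic (not the runtime's actual code; shade and stackIsGrey are stand-ins for runtime internals):

    package main

    import "fmt"

    type object struct{ marked bool }

    // shade marks an object so the collector will not reclaim it this cycle.
    func shade(o *object) {
        if o != nil {
            o.marked = true
        }
    }

    // stackIsGrey would answer "has the current goroutine's stack been
    // scanned yet?"; here it is just a placeholder.
    func stackIsGrey() bool { return true }

    // writePointer models the barrier run on every heap pointer store *slot = ptr.
    func writePointer(slot **object, ptr *object) {
        shade(*slot) // deletion (Yuasa) half: the old target stays reachable
        if stackIsGrey() {
            shade(ptr) // insertion (Dijkstra) half, until this stack is scanned
        }
        *slot = ptr
    }

    func main() {
        var field *object
        writePointer(&field, &object{})
        fmt.Println(field.marked)
    }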
> What about the way Go handles errors today makes them not "strongly typed"?
Errors as types or errors as values? The std lib promotes errors as values (i.e. check equality) instead of errors as types (i.e. check the type). Go's error system WAS written with errors as values in mind. There is no point having errors in place of exceptions if errors were intended to be used as types (which they are not, as said previously). Basically developers are implementing their own mediocre exception system on top of Go errors.
The error-as-value thing made sense in C 30 years ago; it doesn't in a language created less than 10 years ago.
There are a lot of C inspired patterns in Go that make the language half modern/ half dated in strange ways. That's fine when one comes from C though, that isn't when one comes from anything remotely modern. But I guess it's why Go is successful, it's basically C with garbage collection.
I said that already. And that's not the problem at hand. When you test an error, do you test against a value or a type in order to know what kind of error it is?
> You can test the interface. A type is just an interface around memory, albeit more constrained.
Wow, again, that's not the problem here. Errors in the standard library are defined as values. There is no point testing them as interfaces; that will not give you the nature of the error, since they are all defined with fmt.Errorf. Do you understand the problem now? The problem is being consistent across codebases between errors as values and errors as types.
That's because single error instances are much cheaper than always creating a new instance of a given type. No need to create garbage for lots of common errors. Of course you could have a dedicated type plus a singleton implementation of it in Go. But what would be the advantage? Checking if err.(type) == io.EofType does not give you more information than only checking if err == io.EOF, as long as you don't store any error-instance-related information in it. Which makes sense for custom errors, and which Go absolutely allows you to do.
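To illustrate the two conventions being argued about (my own minimal example): a sentinel value compared by identity versus a concrete type checked with a type assertion, both satisfying the error interface.

    package main

    import (
        "fmt"
        "io"
        "strings"
    )

    // parseError is an "error as a type": it carries data and is checked by type.
    type parseError struct{ line int }

    func (e *parseError) Error() string { return fmt.Sprintf("parse error on line %d", e.line) }

    func main() {
        // "Error as a value": io.EOF is a singleton compared with ==.
        r := strings.NewReader("hi")
        buf := make([]byte, 2)
        for {
            _, err := r.Read(buf)
            if err == io.EOF {
                fmt.Println("done reading")
                break
            }
            if err != nil {
                panic(err)
            }
        }

        // "Error as a type": a type assertion recovers the extra information.
        var err error = &parseError{line: 7}
        if pe, ok := err.(*parseError); ok {
            fmt.Println("failed at line", pe.line)
        }
    }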
> No, an error implements the error interface. It means that it can be a value of any type that implements the constraint of having an Error method.
It doesn't matter what a value implements if you don't test its content by type. Std lib errors are meant to be tested by value, not by type. It has nothing to do with interfaces, again. When you get a std lib error, in order to know what error it is you compare it to an exported value, not its type. I don't know why you keep insisting on interfaces; that's not the core of the issue here.
Neither the Java std lib nor the .NET std lib do that, though; they don't declare an exception as a static constant you need to compare with, because it's (rightfully so) considered bad coding. Exceptions give you some valuable information, like stack traces. Go errors give you nothing. They are inferior to exceptions and a relic of C programming patterns.
The compiler provides you no help at all with them, and no syntax that makes error conditions and handling separate. It also mixes application logic and recovery logic.
Basically everything that's problematic with returning a status int in C, but all new, hip, and backed up by Rob Pike's pseudointellectual bullshit and a bunch of Silicon Valley 20somethings.
They could at least, you know, have an Either type or something. Anything?
>It also mixes application logic and recovery logic.
When did this separation become law? What if the "application logic" requires recovery?
>They could at least, you know, have an Either type or something
(int64, error) in func ParseInt() (int64, error) is your Either type. And checking if you got the "left or the right side of the Either" is IMHO much shorter and clearer than in Scala.
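Concretely, the check looks like this (a trivial example of my own, using strconv.ParseInt as mentioned above):

    package main

    import (
        "fmt"
        "strconv"
    )

    func main() {
        n, err := strconv.ParseInt("1234", 10, 64)
        if err != nil {
            // the "left" side: the failure case
            fmt.Println("not a number:", err)
            return
        }
        // the "right" side: the success case
        fmt.Println("parsed:", n)
    }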
The convention is that if err == nil then the value is not nil. The exceptions to this rule are very few and usually specified in the documentation. Normally you only have to check for the error.
It's not that it's a bad idea, just that because of the "nil interface" absurdity it can happen if you accidentally mix concretely typed variables and interfaces, as in the example.
This is perfectly valid and doesn't cause the nil issue:
return someErrorStruct{}
...where someErrorStruct is a struct that implements the "error" interface. Using structs for errors is fine, and is in fact generally preferable to singletons like io.EOF, which can (by their very nature) never be extended with more data about the error.
There's nothing absurd about it; interfaces are a reference type. If you have a reference to a reference, then checking the "outer" reference for nil doesn't tell you anything about the nullity of the "inner" reference. The advice is just a special case of "don't needlessly use pointers-to-pointers".
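The case being described (an interface wrapping a nil concrete pointer) looks like this; it's the same trap spelled out further down the thread:

    package main

    import "fmt"

    type myError struct{}

    func (*myError) Error() string { return "boom" }

    // mayFail returns a typed nil: the error interface it is stored in carries
    // (type = *myError, value = nil), so the interface itself is not nil.
    func mayFail() error {
        var e *myError
        return e
    }

    func main() {
        err := mayFail()
        fmt.Println(err == nil) // false, even though the pointer inside is nil
    }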
Go chose to rely heavily on nil pointers, which is a design mistake (see Tony Hoare's apology for inventing it). The resultant tension between interfaces and nils is, in my opinion, an absurd side effect that cannot be explained away as anything except an ugly wart. We should have something better than this in 2016.
I say this as someone who uses Go daily for my work and mostly likes it (despite the many warts).
I don't especially love nil either, but people make too big a deal of it. The only arguments against it are hypothetical scenarios, anecdotal examples, and appeals to Hoare's authority. While there's probably a more ergonomic design, there's no substantial evidence that nil is the catastrophe it's made out to be. Using terms like "absurdity" and "catastrophe" seems overly dramatic without some decent evidence.
I don't think I'm being overly dramatic, actually. I deal with nil oversights on a daily basis during development, and I would say that it is the main source of unexpected crashes in anything. It equally applies to other languages such as Ruby and Python.
It's exacerbated by the fact that Go chose to make things like maps, slices and channels pointers, too. It has an excellent philosophy about zero values (so you can't get nil strings, even though they are pointers internally), yet goofed when it came to something as intuitive as "var foo []string", and claimed this behaviour to be a feature. The (nil, nil) interface is just icing on a crappy cake.
The fact that such a new language as Go doesn't have a way to express missing values safely should be disappointing to developers.
By a wide margin, the biggest production issues I see are index out of bounds or key errors (regardless of language). When I'm treating `nil` as a valid state for some object, I take extra care to test its permutations, and uninitialized maps/interfaces/etc are quickly discovered during testing (every test has to initialize, so this logic is well-covered).
> The (nil, nil) interface is just icing on a crappy cake.
The same problem exists in languages without nil. For example, if you choose to do the stupid thing and return Option<Option<error>> when you only need Option<error>, then checking the outer Option<> for None is not going to magically guarantee that the inner Option<> is not None.
> It has an excellent philosophy about zero values (so you can't get nil strings, even though they are pointers internally), yet goofed when it came to something as intuitive as "var foo []string", and claimed this behaviour to be a feature.
What are you talking about? nil slices are every bit as safe as an empty slice or an empty string (which is just an immutable empty slice).
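For example (a quick demonstration of my own; nothing here panics):

    package main

    import "fmt"

    func main() {
        var s []string // nil slice: no allocation yet

        fmt.Println(len(s), cap(s)) // 0 0 - length and capacity are safe to read
        for range s {               // ranging over a nil slice is a no-op
        }
        s = append(s, "hello") // append allocates on demand
        fmt.Println(s)
    }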
> The fact that such a new language as Go doesn't have a way to express missing values safely should be disappointing to developers.
I agree, and I'm mildly annoyed by it, but as it's the least of all of my problems, I find words like "absurdity" to be too heavy-handed.
Nil slices do cause problems. One is when it's an aliased type that satisfies an interface (again with the nil interfaces!). Another is that it leads to inconsistencies: json.Marshal(([]string)(nil)) returns "null", for example, not "[]". Yet another annoyance caused by nils (including nil slices) is that reflect.Value becomes a lot more byzantine than it ought to have been, requiring careful use of IsValid(), Kind checks, etc. when you want to deal with something that is either an interface or something else.
As for Option: Not sure how that's an argument. And anyway, a language with real sum types will never allow going down a branch that doesn't receive a correctly matched value.
> One when it's an aliased type that satisfies an interface (again with the nil interfaces!)
It sounds like you're again confusing nil interfaces with an interface holding a nil value (in particular, there is no way to get a nil interface from a nil slice). Here's an example that demonstrates nil slices do not cause problems: https://play.golang.org/p/tSA_otqg3-
> Another is that it leads to inconsistencies: json.Marshal(([]string)(nil)) returns "null", for example, not "[]".
1. This is unrelated to the language; it's the behavior implemented by the JSON library
2. This behavior is correct; a nil slice is not the same as an empty slice:
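The example that belongs here is easy to reproduce (my own version, not the original snippet):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    func main() {
        var nilSlice []string
        emptySlice := []string{}

        a, _ := json.Marshal(nilSlice)
        b, _ := json.Marshal(emptySlice)

        fmt.Println(string(a)) // null
        fmt.Println(string(b)) // []
    }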
You posited that Go's nils are bad because you can satisfy an error interface with a pointer type, and then when you return (*ConcreteType)(nil), your error handling code executes. The problem here is unrelated to nil or to interfaces; the problem is that you're nesting one binary type inside another (binary in the literal sense of having two states, either nil or not nil in this case). You would have the same problem in Rust had you done Result<Foo, Option<ConcreteError>> or (in the case of a single return value) Option<Option<ConcreteError>>. You would fix the problem in either languages by recognizing that you only have 2 states (error or not error) and removing the nesting (in Rust: Result<Foo, Error> or Option<Error>; in Go `return ConcreteType{}`).
> And anyway, a language with real sum types will never allow going down a branch that doesn't receive a correctly matched value.
I agree, and it would be nice if Go had this, but this is also not a very significant problem--this problem is blown way out of proportion.