Performance patches in Go 1.11 (docs.google.com)
200 points by valyala on June 22, 2018 | hide | past | favorite | 53 comments



Cool stuff. It's nice to have a somewhat mainstream language using its own backend instead of LLVM. You can in fact get pretty far by tailoring your compiler to your specific needs. You end up with much less code, and it's also easier to make it fast.

You do cut yourself off from the more advanced optimizations, simply for man-hour reasons. I'm guessing it's a trade-off they are willing to make.

I have to say that I really dislike the kind of large pattern matching they added for the hash-table clear. C compilers do that as well, e.g. for memset/memcpy or bit-rotate instructions.

IME it's really brittle and would be more beneficial as some kind of lint ("you could replace this bunch of code with a call to clear()").

It makes the code simpler and won't inexplicably slow down the next time someone does a tautological refactoring.


Note: Go has a GCC implementation which is actively maintained.

I've seen some computation-intensive code receive a ~10% performance boost thanks to the heavy wizardry of GCC's code generation in past years (but see also masklinn's comments). Once gccgo implements Go 1.11, it may be a good idea to rerun the benchmarks and get some new numbers.


> Note: Go has a GCC implementation which is actively maintained.

They are also actively working on a third, llvm based implementation: https://go.googlesource.com/gollvm/


Go actually doesn't have any other way to clear a map!


> Go actually doesn't have any other way to clear a map!

Which, actually, is brilliant design! Now, I can find all map clearings with one operation. (I need to implement that syntactic search/rewrite tool I miss from Smalltalk.)


How is not having map.clear() a brilliant design?


In a large project, you'll have all ways of doing any given thing represented in a code base. Reducing that number down to 1 means that searching for all occurrences is reduced to 1 operation. If you are working on a large project, searching for usages/senders/occurrences is one of the most common tasks that you do. In fact, using escape analysis of the same kind used for "Extract Method" refactorings, you can even catch clearing interspersed with other code in the for-range loop.

A 50%+ reduction in programmer work is nothing to be sneezed at.


... but they made one (runtime only) for that particular use case if I understood the slides correctly, and that is the optimization.


Yep, that's right. But it's not accessible to user code, so this isn't something that could be handled by a linter (as suggested by GP).

I'm actually happy with the situation. There is literally only one way to clear a map, and it is fast.


No. They should have used LLVM.

Go is a crippled language because it lacks generics and generalized higher-order types. The stated reasons for this are to make the compiler easier to implement. Maybe they wouldn't have had such implementation difficulties if they had just gone with the compiler backend the industry has standardized on, freeing up developer resources to make Go a language suitable for big development tasks and competitive with C++, C#, etc.


Generics are a design problem, not an implementation problem.

If folks agreed on a design, finding resources to get them added wouldn't be hard.

Not sure what you're trying to imply with your last sentence, but Go is used by many companies large and small for big development tasks. I work for a company where most of the backend systems are written in Go.


That's not an accurate summary of the reasoning. It's not just the complexity generics add to the compiler, but to the type system in general, and the complexity they may bring to user code.

The Go team is not against adding generics, and there have been proposals for them. They have simply not landed on a proposal that they feel is a strong fit for the language.


These are some really cool optimizations. It's nice that they're letting these in, but it's always kind of bugged me that they won't do some basic optimization of their util functions. For example, ReadDir sorts by filename by default; removing one line of code can speed it up by 800% on large directories, IIRC, but they won't honour the pull requests. The argument is "it's a util function; if you don't like it, don't use it." While this is true, IMO the stdlib's common functions should be as quick as possible, with anything extra bolted on (within reason; sorting names being an obvious case). I wouldn't expect to have to read the actual source code in order to optimize, but I guess I'm wrong here ¯\_(ツ)_/¯


They take the Go 1 compatibility promise quite seriously and try not to change existing behaviour, which is a good thing. I'm quite happy about that, because otherwise we'd see far more breaking changes, and it'd be painful to develop large apps on top of the stdlib. Perhaps in Go 2 they'd be open to changing it or moving ioutil out of the stdlib, but I'm quite sure someone somewhere depends on the ordering. Since you're complaining about a 10-line function, you could just copy it out and remove the offending line:

    func ReadDir(dirname string) ([]os.FileInfo, error) {
        f, err := os.Open(dirname)
        if err != nil {
            return nil, err
        }
        list, err := f.Readdir(-1)
        f.Close()
        if err != nil {
            return nil, err
        }
        return list, nil
    }
I much prefer a language and stdlib that stays stable over years, even at the cost of some nice new features/improvements - the place for code to move fast and break things (if you like that sort of thing) is much higher up the stack.


That's exactly what I did. The problem I was trying to address is that you shouldn't have to read into the stdlib source to optimize your code. Adding a ReadDirNoSort (as suggested in the initial 2012 pull request) wouldn't hurt anyone.


There are more optimizations than that you could probably use if you want a fast ReadDir.

Golang's ReadDir reads directory entries, stats all files, and sorts the list. If all you want is the directory entries directly, try godirwalk (also much faster for walking!): https://godoc.org/github.com/karrick/godirwalk#ReadDirents


Well, it would expand the library surface, so they'd have yet more code to maintain, and they'd have to maintain a function with confusing and inconsistent naming forever. I think they made the right call here, personally; the simplicity and stability are part of the appeal of the language.


They probably don't want to break any code that relies on that sorted output. Don't know why they would've decided on that to begin with though.


The ModInverse and GCD improvements will be very welcome in a demo I just wrote for trustless card playing[0].

I'm beginning to wonder if there will become more value in "lifting" binaries to LLVM IR, applying all opts to them, and re-compiling them. McSema does this[1] albeit you have to setup IDA and what not. Has anyone applied this to Go programs and measured its benefits for their specific use cases?

[0] https://github.com/cretz/go-mental-poker

[1] https://github.com/trailofbits/mcsema


They mention VictoriaMetrics, a time-series metrics database. Is it open source? I couldn't find any reference to it on Google.


It even claims to be the fastest time-series DB. Is it faster than kdb+? I'm skeptical.


VictoriaMetrics is in development at the moment and isn't published yet.


interested in this, too


Looking at slide 7: doesn't LLVM do function inlining, presumably with quite well-tuned heuristics? What's the advantage in having the frontend side of the compiler do it too?

Or am I wrong, and does LLVM not do inlining?

Edit: I'm a silly person, Go doesn't use LLVM. I've been wrong about that for some time!


LLVM does do inlining (it's the most important optimisation you can implement, at least for native codegen), but the reference Go implementation (gc) does not use LLVM, the entire toolchain is custom.

On the one hand, that allows very fast compilation and codegen; on the other hand, the optimisations are fairly limited. It also doesn't benefit from improvements made by other LLVM users, but neither does it saddle you with their issues[0].

There's also a Go implementation on top of GCC (gccgo); however, while it generates high-quality code, historically it wasn't really better than gc, due to lacking escape analysis (so it did a ton more heap allocations than the reference) and having a lackluster GC.

[0] Rust has had its own LLVM fork for as long as the project has existed, for both fixing issues it hit which haven't yet been fixed upstream and implementing features it needs which have not been or can not be upstreamed


To add to your comment, here are Russ Cox's comments on why the toolchain is custom and they didn't use LLVM:

- (2010) https://news.ycombinator.com/item?id=1509700

- (2014) https://news.ycombinator.com/item?id=8817990


I am always torn on such approaches; on one hand it means throwing away decades of code optimization.

On the other hand, in a world where devs only believe in stuff when others actually implement it in production, it is nice to see that their toolchain is self-sufficient, without the usual C or C++ dependency.

Many language designers opt to implement their toolchains in C or C++ out of mere convenience, and that is OK.

However, it creates the wrong perception among those without compiler-development knowledge that C and C++ are the only way to implement a toolchain.


Seems to me these are orthogonal concerns. LLVM is in C++, but the question I answered was about using LLVM versus implementing a custom toolchain, not about implementing a custom toolchain in C++ versus some other language.


Agreed, maybe I expressed myself badly, but that was still kind of my point.


Informative comment, thanks.

I was so sure it used LLVM!


Go doesn't use LLVM, so that is kind of a moot point.

In more sophisticated languages, you might get inlining done multiple times, as some inlining decisions are easier with code knowledge, while others are better when the optimizer only sees IR.


Go doesn't use LLVM as far as I know.


The newly introduced prove pass is probably the biggest game changer, and it provides the groundwork for new optimizations to be built on top of it.


Prove was already there; what we did was improve it to eliminate even more bounds checks.

PS: I'm the author of those prove patches.


This is what I like about HN :)

How did you prioritize which BCEs to include? (Or were these all that had a ticket?)


I wrote a Nintendo DS emulator in Go (see my GitHub) and ended up seeing lots of missed optimizations in the generated code. I opened a few tickets and then focused on BCE. I realized that one of the main problems was the missing transitivity of the pass, so I ended up writing a largish piece of code to implement a partial-order set (see the poset data structure) and then picked a few of the low-hanging fruits it opened up (searching for open tickets on GitHub issues). These were my first contributions to the SSA passes, btw.


Oops, my bad. I assumed from those big check-ins that this was the first time it was being introduced.


This was my favorite part:

>This is fun and is easier than optimizing gcc / llvm, since the majority of the Go compiler, runtime and standard library is written in Go


That's great! I'm curious how much these optimizations slow down the compiler, if at all. I remember something about Go only adding optimizations if the resulting binary speedup meant that compilation itself sped up too (but I don't have a source for that right now, and doubt it's generally true).


Is there a recording of this presentation by any chance?


Unfortunately, no. The presentation was given at the Kyiv Go Meetup - https://www.meetup.com/uagolang/events/251712321/ - and it was in Russian. I didn't notice any recording equipment during the presentation.


Too bad, thank you regardless :)


Is there a good guide to the go compiler anywhere, that outlines all the stages from source code to assembler to binary?



IIRC the mapaccess improvement had to be rolled back. There was a bug in it. Not sure if it will make this release.


Not yet. There is a fix for the bug: https://go-review.googlesource.com/c/go/+/120255



Is this really the best format to post this in?


Imperfect, but not so bad.

If the slides are available today, and are viewable in a web-browser, I don't mind too much. Presumably it's either this or nothing.

If there's already a finished blog post in conventional format, then sure, no sense linking to a slideshow.


> Imperfect, but not so bad.

It's pretty bad. More so because Google Slides has a PDF export, but despite every browser having a built-in PDF viewer, they force the download.


But what's the problem?

It's JavaScript-heavy and all that, but it works fine, no?


The slides are very content-sparse. Being able to display the deck as a single continuous document, with several slides per screen, would make what little information it contains much easier to go through (and go back to if necessary).


The text is jumbled up on Firefox/Windows :(



