Saving a Third of Our Memory by Re-Ordering Go Struct Fields (2020)

ferdowsi · on Jan 10, 2022

Why Go doesn't do this by default: https://github.com/golang/go/issues/10014#issuecomment-91436...

> It seems fine to me for the spec not to guarantee anything about struct field order in memory. The spec doesn't operate at that level.

> That said, no Go compiler should probably ever reorder struct fields. That seems like it is trying to solve a 1970s problem, namely packing structs to use as little space as possible. The 2010s problem is to put related fields near each other to reduce cache misses, and (unlike the 1970s problem) there is no obvious way for the compiler to pick an optimal solution. A compiler that takes that control away from the programmer is going to be that much less useful, and people will find better compilers.

scottlamb · on Jan 10, 2022

How often is "reduce cache misses" that different from "use as little space as possible"? They're basically the same if the struct can be no more than one cacheline wide. When the struct is larger, it's possible you're accessing certain sets of fields together often enough for this to be a useful consideration, but I have no intuition on how common it is. Although it occurs to me that when this happens, switching from arrays of structs to structs of arrays may be a better optimization anyway.

fwiw, Rust leaves the ordering unspecified (unless you specify it via #repr[(C)]). Currently it orders to minimize padding. (In theory a future compiler could reorder to minimize cache misses based on a profile or something.) According to https://doc.rust-lang.org/nomicon/repr-rust.html part of the rationale for reordering was generics. If you have a struct Foo<T, U>, the optimal ordering depends on the size of T and U. The same argument won't apply to Go until 1.18 is released.

staticassertion · on Jan 10, 2022

> How often is "reduce cache misses" that different from "use as little space as possible"?

I thought the same exact thing. If I can reduce my struct sizes I can pack N more structs into my cache line. That's almost certainly going to be the best cache-based win.

tedunangst · on Jan 10, 2022

If your struct is large enough that you care about shaving padding, you probably have hot fields and cold fields and the best cached based win will be arranging them contiguously.

scottlamb · on Jan 10, 2022

Counterexample: the struct in the article. It's "large enough that you care about shaving padding" in aggregate. The 2 optimal arrangements eliminate padding so it uses 1/16th of an x86-64 cache line. This guarantees the whole struct is on one cache line and uses the least total cache, so the 2 arrangements with padding are strictly worse, regardless of which fields are hot or cold.

A struct of arrays form may be better still, but that's more than rearranging fields and beyond the compiler to produce.

More generally, there are surely cases where it's better to arrange all the hot fields together at the expense of increasing the struct size, but I'm unsure it's the common case, and my bullshit alarm fires when people say so without evidence. (Even more so if they're also saying programmers are commonly arranging the fields optimally by hand.)

jeltz · on Jan 10, 2022

That does not match my real world experience. If you have an array of small structs then the padding matters. And if some fields are hot while others are cold you usually want to split the array of structs into two arrays of structs, one with the cold part and one with the hot. The use case Go optimizes for is not common in the fields I have been working in.

I think the default should be to minimize size and that it should be possible to opt out for the rare case where exact order matters.

mcherm · on Jan 10, 2022

Actually, you care about shaving padding when the FIELDS are small compared to the alignment, not the structs. This frequently is a concern in Go.

anonymoushn · on Jan 10, 2022

Most structs that care about shaving padding will be very small.

pornel · on Jan 10, 2022

If you have a large-enough group of distinctly cold fields, a better option may be to move them to another struct, possibly allocated out of line.

usrusr · on Jan 10, 2022

Can go structs be nested by value? (As opposed to by reference)

I'd imagine that this could be a very practical way to to do a manual hot/cold grouping in an environment that prioritizes packing over keeping order. (if the cold ones are particularly cold you might of course prefer them to be in a by reference nested struct anyways, so the hot subset packs better with their peers in an array)

morelisp · on Jan 10, 2022

If I have a [10000]T I need to shave very little padding before I see some impact, even though both the padded and unpadded T might be relatively small.

Specifically in Go, a smaller size can get even lone values into a smaller size class. Saving one byte may save you 384 if it's from 2305 to 2304.

fredophile · on Jan 10, 2022

The example in the link had 3 fields and padding made the structure 50% bigger than it needed to be. That probably likely big enough to have hot and cold fields but clearly they cared about shaving padding.

jlrubin · on Jan 10, 2022

it's very likely that it's one of the cases that when reducing cache misses does really matter it's very different from using as little space as possible, and the degree to which it matter dwarfs the degree to which it's not a frequent concern (i.e. fat tail).

for example, if I have a struct that contains a bunch of atomic fields, I may actually want to control the layout to ensure they are far apart (even inserting padding) to prevent e.g. false sharing https://en.wikipedia.org/wiki/False_sharing.

masklinn · on Jan 10, 2022

May is very much the operative word here.

The common and normal behaviour you want is to minimise struct size so you can fit the maximum instances in cache and memory.

The need for more precise control is very much the exception, and thus not unlike preventing inlining (which you may actually want) it could (and should) be an opt-out.

jlrubin · on Jan 11, 2022

my point is that these are the fat tail + survivor bias of cases where you do actually care about performance to this degree, so it probably is actually more common in practice when you're looking into it.

even though precise control is the exception, if you can't do it, you can't use your language in a lot of critical contexts (and end up linking C, Zig, Rust, etc).

masklinn · on Jan 11, 2022

…

Nobody said you should not be able to do it, at all? The argument is about default behaviour.

Rust is specifically a langage which reorders by default.

jlrubin · on Jan 11, 2022

scottlamb seemed to be implying it's not worth being able to since it's an uncommon use case, so I was explaining why even if it's uncommon it's must-have.

Symmetry · on Jan 10, 2022

Yeah, a cache line is 64 bytes. If you care about putting data that's used together adjacent you're assuming that objects a lot larger than 64 bytes are the most common case but I'm not at all sure that's true. I would guess that we're using up our memory with large heap allocated arrays and large numbers of structs that fit within a cache line. But of course, that's something that ought to be actually measured.

mhh__ · on Jan 10, 2022

If you need to pack the structs for a certain cache optimization you almost definitely should be grouping the data together differently.

nspattak · on Jan 10, 2022

struct packing is one way to reduce cache misses. One can also reduce cache misses by redesigning structs to better match the data usage pattern (eg ArraysOfStructs vs StructsOfArrays).

jcranmer · on Jan 10, 2022

This is kind of a fake answer to the problem.

First off, using less memory is an effective way to reduce cache misses: if you shrink memory by ⅓, that allows you 50% more objects in the same cache size. And this applies to anything--it's the only way to reduce cache misses that is universal. So saying that it's not solving the "real" problem is really a spit-take, because it's a pretty effective way of solving that "real" problem.

Suppose you considered cases where a smart ordering could avoid hitting unused cache lines. If a struct is larger than a cache line, it's possible to put co-used values on one cache line and avoid bringing in the other cache lines. But this kind of optimization isn't going to work unless the struct is cache-aligned to begin with--otherwise, your clever ordering is only going to sometimes work and sometimes potentially cause unnecessary multiple cache lines to need to be brought in. As to whether or not cacheline-alignment is a good idea, well, the extra padding will increase memory usage (see point #1), and the potential benefit is going to be limited by how hot or cold field accesses actually are.

The other case that comes to mind is false-sharing, which is definitely a real concern. Except, we're talking about reordering struct fields, which means it's false sharing within fields of a struct, and that's a much smaller subset of where false sharing actually occurs--false sharing tends to be more of an issue when you have an array of objects, and you need to make the struct element a multiple of cacheline size to avoid it. The only reasonable cases I can think of off the top of my head are going to involve structs which have intrusive atomic reference counting or some sort of intrusive lock in them--and you can solve both of those cases by making large cacheline-sized versions of those structs that prevent any fields of the outer struct from being stuck on the same cache lines as those data structures.

So I rather expect that it is very possible to have a field reordering algorithm that would improve cache misses in all the obvious cases (as I mentioned in point #1) while not preventing the user from having sufficient control to optimize for minimizing cache misses in the rarer cases in the subsequent point.

rsecora · on Jan 10, 2022

Its not only due to cache misses. If the memory access is not aligned a bus fault will be raised in a lot of CPU architectures.

Thats the 70s problem described in the documentation

beltsazar · on Jan 10, 2022

He was basically saying A is an old problem, the new problem is B. However, there's no obvious way to solve B, so we don't solve B.

But why not solve A then? Unless.. A is not a problem anymore nowadays. (Is it, though?)

But if A is not a problem anymore, he could have just said struct field ordering was an old problem and not a problem anymore in 2010s, without mentioning the other problem.

Meanwhile the blog post suggests that struct field ordering is still a problem even in 2020s.

bsuvc · on Jan 10, 2022

If Go tries to automatically solve A, it prevents a developer from solving B because there is no way to control field order.

By Go doing nothing, a developer can manually solve both A and B.

cafxx · on Jan 10, 2022

This is a strawman argument. It presents the two options as mutually incompatible, whereas they are not (as demonstrated by other languages that allow to do both).

Someone · on Jan 10, 2022

But if I understand things correctly, go, the spec, doesn’t provide the developer a way to control field order. It’s only the (main) implementation that currently happens to keep field order matching the order in the source code.

So, currently, there is no way to force memory layout, other than, as https://github.com/golang/go/issues/10014#issuecomment-26346... says

“defining the type using [n]byte and modifying the fields using the encoding/binary package. On many processors the performance will be approximately the same.”

Seems unsatisfactory to me.

usrbinbash · on Jan 10, 2022

> But if I understand things correctly, go, the spec, doesn’t provide the developer a way to control field order.

The spec doesn't, but the actual compiler implementation does. If the compiler is not allowed to re-order fields, they remain in the order the programmer specified in the code. That's why the fix in the article works in the first place.

Someone · on Jan 10, 2022

But the compiler isn’t “not allowed to re-order fields”. The spec allows it to do that. This fix works for the current compiler, but may not do so for next week’s.

Technically you could say it wouldn’t even be a breaking change if a compiler update changed that (practically people would have reason to be angry if they slipped something like that in without creating a major version and adding warning flags to the release notes)

adwn · on Jan 10, 2022

> By Go doing nothing, a developer can manually solve both A and B.

In Rust, you can annotate a struct declaration with "#[repr(C)]" to prevent the compiler from reordering fields. I don't see why the Go compiler couldn't offer something similar.

yashap · on Jan 10, 2022

Go’s philosophy seems to be a very minimal/low-feature language, that’s fairly low level, even if that means somewhat more awkward or mistake-prone code. Go’s current approach seems more Go-y than an annotation like this.

masklinn · on Jan 10, 2022

> Go’s philosophy seems to be a very minimal/low-feature language, that’s fairly low level, even if that means somewhat more awkward or mistake-prone code.

And that’s why it tells you to fuck off when you have an unused variable, a non-solution to a non-problem.

geodel · on Jan 10, 2022

Yeah, keeps the code cleaner, no unwanted variable and imports as I have tonnes of these in my Java codebase. One can use IDEs to fix it but I am not gonna generate 1000 files changed PR for this and apparently neither did last 5-6 people who worked this project.

Maybe you have not worked in these enterprise type projects where Java/Go is most likely used. In these places anything that is not build failure / compiler error is not an issue to be fixed now or ever.

masklinn · on Jan 10, 2022

> Yeah, keeps the code cleaner

You may want to take context in account when reading comments, they don’t existing in a blank void.

> One can use IDEs to fix it but I am not gonna generate 1000 files changed PR for this and apparently neither did last 5-6 people who worked this project.

That really has nothing to do with the subject at hand, and linters exist (in fact for most actual errors go requires a linter because the compiler is so anemic); for projects which are already in the pits, incremental linting is a thing (where linter errors are only a hard CI failure when they’re part of the PR’s diff).

geodel · on Jan 10, 2022

> You may want to take context in account when reading comments, they don’t existing in a blank void.

Agree.

But you do not have to ever understand others point of view because your opinion is a fact and absolute must on any programing language design.

xh-dude · on Jan 10, 2022

Yeah - the tradeoff isn’t about the benefits in places where macros would solve problems, but the problem of having macros. It’s an opinion.

previx · on Jan 10, 2022

An annotation in Rust isn't a macro. The Go equivalent would be a directive in a comment. For example: //go:repr c

https://pkg.go.dev/cmd/compile#hdr-Compiler_Directives

xh-dude · on Jan 10, 2022

Is the situation for Rust different than as described here: https://github.com/rust-lang/rfcs/pull/208

“Currently, attributes and macros/syntax extensions are both conceptually macros: they are user-definable syntactic extensions that transform token trees to token trees.”

FWIW I definitely admit the point about Go compiler directives sort of cheating into this space a bit - I see them used responsibly for the most part but it’s a valid point. (Go’s struct tags OTOH, are an escape hatch I revile…).

adwn · on Jan 10, 2022

> Is the situation for Rust different than as described here: https://github.com/rust-lang/rfcs/pull/208

In short: yes. The wording in that quote is imprecise: I suspect that by 'attributes', pcwalton was referring to user-definable attributes like #[serde(rename_all = "...")] and to custom derive macros like #[derive(Serializable)]. It is impossible to achieve the effects of the #[repr(C)] attribute using macros or token tree transformations.

previx · on Jan 10, 2022

If you're asking if macros and attributes are interchangeable terms, no they're not.

adwn · on Jan 10, 2022

> very minimal/low-feature language, that’s fairly low level, even if that means somewhat more awkward or mistake-prone code

Do you think this is a good trade-off?

If you're using a tool every day or at least regularly, the one-time cost of mastering a slightly more complex tool is amortized over all uses of that tool.

rob74 · on Jan 10, 2022

You may be using the tool every day, but are you using this "disable field reordering" annotation every day? Also, after you used it, you may remember what "#[repr(C)]" does, but I bet most of the other developers working with the code in the future will have to look it up (unless you leave a comment). So the trade-off is not as cheap as you think...

yashap · on Jan 10, 2022

I didn’t pass any judgement on whether it’s a good or bad tradeoff, just that it’s in-line with Go’s general approach to language design.

remus · on Jan 10, 2022

You could add something like that, but I suspect the new default behaviour would then break a number of programs (in particular things that use reflect and unsafe). There aren't many guarantees around this kind of usage but in practice the go devs are pretty reluctant to make changes that are known to break things without a very good reason, even in things where there's not a formal compatability promise.

dllthomas · on Jan 10, 2022

The solution to A will break manual attempts to address B.

masklinn · on Jan 10, 2022

The solution to A being opt-out makes addressing B not an issue.

dllthomas · on Jan 10, 2022

Yeah, for sure. I didn't mean to imply that it wraps up the whole argument, merely that it was the step in the original reasoning that the parent was missing.

adonovan · on Jan 10, 2022

Look up “false sharing”, the situation where two goroutines that access disjoint sets of fields contend for the same cache line.

Compilers are not smart enough to detect it, but if they lay out struct fields in declaration order, the programmer has a way to avoid the problem: by putting the two threads’ fields far apart.

berkut · on Jan 10, 2022

Yeah, quite often in HPC you very much do want to be able to control the alignment and packing (even though it's annoying to do), because you'll get caching issues otherwise, depending on the access patterns and size of the data.

tjoff · on Jan 10, 2022

It seems to follow that you shouldn't use Go in HPC. Which is also fine.

Ar-Curunir · on Jan 10, 2022

Why not just offer the option to have manual control? That's what Rust does; by default rustc reorders fields, but you can also manually annotate your struct with #[repr(C)], which enforces no reordering.

jerf · on Jan 10, 2022

Because in practice the Go solution is the other way around: You can optionally decide to worry about it by turning on the linter that detects this. It's not a very different solution in practice.

I actually kind of like the idea of a minimal language with a robust linting/static analysis community. You can modularly pick what things you want to worry about. The correct answer for the vast bulk of Go programmers is not to worry about this, and the ones who want to, the tools are readily available and already integrated into a tool that anyone writing serious Go code should already be using.

kibwen · on Jan 10, 2022

> It's not a very different solution in practice.

It makes a difference in the presence of generics, since at that point laying out fields efficiently in the face of every possible combination of types is a task that only the compiler can perform. There's nothing that says that it needs to be the default behavior, but if you want efficient space usage then you need more than a lint, you need some way to enable automatic field reordering.

jerf · on Jan 10, 2022

Fair enough, I don't think of generics as quite existing yet so I haven't started considering them yet.

I suspect in practice we're not going to see a lot of structs with a bajillion generics in them, though. Generics are going to solve the problems the Go developers said they will solve but there's still just enough friction in them (particularly the inability to introduce new types in methods) that I expect it will not be practical to create C++-like libraries of generic things that take generics that take generics as arguments, and in practice, "stick the small number of generic things (most likely one) at the end of the struct" will mostly cover the bases.

(I have no problem saying that if you need the n'th degree in performance, you shouldn't have picked Go. I think it has a great bang-for-the-buck ratio, but it definitely does not occupy the "best possible performance" slot.)

kibwen · on Jan 10, 2022

> I suspect in practice we're not going to see a lot of structs with a bajillion generics in them

Sure, but note that it only takes a single generic parameter to exhibit this behavior. Consider the original struct definition in the OP: if we imagine that the first field was generic instead of uint8, then the struct has padding only when the type is less than 16 bits in size. No matter where you manually reorder that field, some possible types will still result in padding if the fields are forced to be laid out in order, and it took no more than a single type parameter.

roca · on Jan 10, 2022

Yeah. Automatically optimizing for space gives good cache footprint if you don't know anything about access patterns so it might as well be the default.

Also, with generics, there are cases where the optimal field ordering depends on the generic type parameters, so letting the compiler reorder fields on a per-generic-instance basis gives better space utilization than any given source field ordering.

WatchDog · on Jan 10, 2022

Such a feature would probably require golang to either implement annotations or a new keyword. I'm not sure golang doesn't have annotations, but probably for the same reasons it's taken so long to implement generics, it's a language stuck in the 70's, with a user base and leadership that will fight tooth and nail to try and stay that way.

Kon-Peki · on Jan 10, 2022

Go most definitely allows annotations on structure fields [1]. You say the language is stuck in the 70s, but the GP has a quote from the Go authors saying that having the compiler reorder structure fields is a 1970s problem...

Well anyway, mostly it just sounds like the typical Rust enthusiast "How dare you have a strongly-held opinion that differs from my strongly-held opinion". As someone who used to work on a codebase that had to be portable across X86, Alpha, SPARC, and Itanium, paying attention to structure field order quickly becomes a routine matter. It hardly seems worthy of an argument over the merits of a programming language.

[1] https://pkg.go.dev/reflect#StructTag

adwn · on Jan 10, 2022

> Well anyway, mostly it just sounds like the typical Rust enthusiast "How dare you have a strongly-held opinion that differs from my strongly-held opinion"

To the contrary, it sounds like "Here's a solution that gives better results in 99% of cases, for reasons X, Y, Z, and better control and guarantees in those other 1% of cases."

rsecora · on Jan 10, 2022

Its not really a 70s problem, its a solution to a problem if the 1970s. RISC machines raised bus error if not properly aligned to avoid "double" access to the same value.

https://en.wikipedia.org/wiki/Bus_error#Unaligned_access

Ar-Curunir · on Jan 10, 2022

No need to be defensive, I was merely suggesting one approach, and didn't even criticize Go anywhere in my comment.

masklinn · on Jan 10, 2022

> I'm not sure golang doesn't have annotations

It does, the `//go:` stricture is a pragma / annotation.

And though I don’t remember any being at the struct-definition level, Go 1.16 added one at the “const” (toplevel var) level so adding it for structs doesn’t seem like an issue.

geodel · on Jan 10, 2022

> with a user base and leadership that will fight tooth and nail to try and stay that way.

This is hilarious. If majority users and authors of language all like same thing then for whom this problem is to be fixed in general? Is it for non-users of Go?

olliej · on Jan 10, 2022

Memory usage is a real problem once you have large amounts of data, of course Go is GC’d and so has a substantial memory hit anyway so I understand weighing that aspect less.

The real killer once that’s factored out is cache performance, and that really is a killer: for high performance code you can easily lose double digit %s of perf hit. You can do even worse in, but in the optimal case (a flat array) load predictors and prefetchers get you to only 10-20% hit from cache pressure.

This is my recollection from maybe 5 years ago (hell maybe even 10), so it could be worse now.

jbub · on Jan 10, 2022

The standard way to check for these was https://github.com/mdempsky/maligned.

It is now deprecated in favour of https://pkg.go.dev/golang.org/x/tools/go/analysis/passes/fie....

You can now check for these using go vet:

    go install golang.org/x/tools/go/analysis/passes/fieldalignment/cmd/fieldalignment@latest
    go vet -vettool=$(which fieldalignment) ./...

aneutron · on Jan 10, 2022

Thanks ! That's nice to know !

rsecora · on Jan 10, 2022

This post shows how knowledge is lost with the past of time.

>> Modern CPU hardware performs reads and writes to memory most efficiently when the data is naturally aligned.

Is not only "modern" CPUs, every RISC (80s, 40 years ago) mandate word alignment. In fact, the program will receive a bus fault if not aligned. The compilers pad between fields to ensure alignment is right. That’s also why "packed" structures can be defined in some languages.

>> This was is a weird quirk

Not really, a lot of code has been programed that way since I remember (80s).

oofabz · on Jan 10, 2022

There are tools for Go that detect non-optimal struct alignment. I use golangci-lint, available at: https://github.com/golangci/golangci-lint

The struct alignment linter is not included by default. To enable it, run the linter with this command: golangci-lint run --enable maligned

dgellow · on Jan 10, 2022

If you, the person reading this comment, write Go without golangci-lint, you’re really missing something.

It’s fast, easy to use, and makes Go programming so much nicer and safer!

bstpierre · on Jan 10, 2022

Also, don’t let the name mislead you — it’s not just for CI jobs, it’s great to use all the time.

everybodyknows · on Jan 10, 2022

> go get installation aren't guaranteed to work. We recommend using binary installation.

The page goes on to list several difficulties with go get of the sort that suggest fundamental design flaws. Anyone have experience with this?

icholy · on Jan 10, 2022

https://github.com/golang/go/issues/44840

nhoughto · on Jan 10, 2022

Reminds me of a similar potential gotcha in Postgres

https://www.2ndquadrant.com/en/blog/on-rocks-and-sand/

The linting story for postgres schemas isn't as good as it is in golang tho, so harder to automatically detect/solve for.

WesternWind · on Jan 10, 2022

Seems like a prime candidate for a linter to warn about, optionally.

christophilus · on Jan 10, 2022

Yep. It’s baked into the Go linter, which seems like a good enough solution to me.

meling · on Jan 10, 2022

This can help: https://github.com/mdempsky/maligned

AtNightWeCode · on Jan 10, 2022

There is a linter for this I believe. There can never be an automated fix for this since code can depend on the memory layout.

Anyway, this is common in programming. At least it was not 4 bytes for the alignment.

Things like this also impact performance. We had a project where we lost both memory and speed after migrating to C++. Turned out to be the virtual destructor that was the culprit.

mariusor · on Jan 10, 2022

> There can never be an automated fix for this since code can depend on the memory layout.

I think that's very specifically discouraged in Go, unless there's a way to do it without using the "unsafe" package that I don't know of:

> Package unsafe contains operations that step around the type safety of Go programs.

> Packages that import unsafe may be non-portable and are not protected by the Go 1 compatibility guidelines.

AtNightWeCode · on Jan 10, 2022

I would not be surprised if somethings may break if there is a discrepancy between the structs and the memory layout. Stranger things have been seen. If people followed best practices, there would not be a problem in the first place. How to work with and check memory alignment is very much a known area. …and I do believe there is linter for this in Go as well.

Jonas_ba · on Jan 10, 2022

Go doesnt do this by default, but there are some analyzers that help you do this - https://github.com/orijtech/structslop

alkonaut · on Jan 10, 2022

Having manual/explicit struct layout obviously allows solving this, but the usual reason languages (at any level) have them is because it’s a nightmare to do C-Style interop without it. What’s the alternative here?

epage · on Jan 10, 2022

Rust reorders struct layout as it wishes unless you opt-in to a C-like layout, so you only pay for it when you need it.

alkonaut · on Jan 10, 2022

Yes, that's how I thought it worked in most languages (that have structs, and allow C-interop). You leave it open for the compiler to do anything/nothing unless you do e.g. repr(C) (In rust) or [StructLayout(LayoutKind.Explicit)] or similar (C#).

I wouldn't want this to be explicit in the spec but nor would I manage without having the back door. In languages that don't have "structs" at all, it's obviously not as necessary because you'll be forced to jump through hoops anyway (JNI, for example).

Too · on Jan 10, 2022

Arguably, if you are operating at such large data sets you likely benefit even more from struct-of-arrays instead of array-of-structs.

This is of course a much bigger change than just moving one line in the struct definition.

post-factum · on Jan 10, 2022

Reminds me of http://www.catb.org/esr/structure-packing/

mytailorisrich · on Jan 10, 2022

This is a good optimisation, but I can't help wondering why such a long article for something trivial and well-known for about as long as people have been programming.

jeffreyrogers · on Jan 10, 2022

Most programmers don't think about memory at this level of granularity unless they've learned C or C++ or written assembly.

pjmlp · on Jan 10, 2022

When I started programming that applied to most programming languages even BASIC, ironically how things change.

bee_rider · on Jan 10, 2022

It is about 10 paragraphs long, and them some code samples. And they explain it at a level that is accessible to people who don't come from a low level background. Seems pretty good to me.

rahulpadalkar · on Jan 10, 2022

As a web dev i found this article rather insightful.

hresvelgr · on Jan 10, 2022

Isn't reordering struct members something that compilers should do automatically? Are there scenarios where reordering produces undefined behaviour?

fulafel · on Jan 10, 2022

Yes, they should. Also many other kinds of data representation optimizations whwre there are lots of fruit to pick. Unfortunately this rarely happens.

Language specs rarely support it well, so much of the blame is on language designers. But it's a somewhat chicken and egg situation. Language users don't demand it either because they never had it in other languages.

pjmlp · on Jan 10, 2022

Doing it automatically messes up ABI and binary libraries infrastructure.

There is more to it than lazy language designers.

masklinn · on Jan 10, 2022

> Doing it automatically messes up ABI and binary libraries infrastructure.

Neither is an issue in Go, since everything’s statically compiled and it doesn’t make any ABI guarantee.

Making this opt-out (or making precise layout opt-in) actually improves the situation there, because then you have clear, explicit guarantees.

pjmlp · on Jan 10, 2022

Apparently people keep missing the news that Go nowadays also does dynamic linking.

Joker_vD · on Jan 10, 2022

It does? Could you please point to a good introduction/summary? I am actually quite interested to see how they manage the ABI compatibility compared to e.g. Swift.

pjmlp · on Jan 10, 2022

Swift is one of the few AOT compiled languages where they went the extra mile to ensure some kind of ABI compatibility across language versions, Go just did the usual stuff and no guarantees are given.

As for the documentation,

Check -dynlink and -shared on "go compile"

https://pkg.go.dev/cmd/compile@go1.17.6

Also -buildmode and -shared on "go link"

https://pkg.go.dev/cmd/link#hdr-Command_Line

You can either create a dynamic linked package, that will dynamically link with other Go compiled code (from same toolchain), or expose a C ABI from a Go compiled .so (which may or may not include the runtime as well).

As for one possible example,

https://www.ardanlabs.com/blog/2020/07/extending-python-with...

Joker_vD · on Jan 10, 2022

Huh. So it's basically undocumented w.r.t. the actual implementation details, I was mostly interested in how they manage to bolt interfaces onto structs when they're separately compiled: e.g. if libFoo.so has "struct Foo {}" and "func (x Foo) Frob() { ... }" in it, and if libFrob.so has "interface Frobber { func Frob() }", and then they're linked together to the main application, will Foo actually implement Frobber? If yes, how?

pjmlp · on Jan 10, 2022

I guess that is the whole point of not having a stable ABI, you also won't find much documentation how C++ compilers do it, beyond reading the code.

fulafel · on Jan 10, 2022

Yes, JIT runtimes have an easier time here (and can more easily do feedback based optimizations on access patterns). For AOT you can do LTO / whole program opt and punt on calls to outside libs, plus provide generics style specializable library code that is not AOT compiled.

I submit it's mostly solvable in language design. They're not all lazy of course, but claiming to be very performance-oriented is a half truth as long as you don't have a strong story here.

pjmlp · on Jan 10, 2022

That doesn't work in environments based on shared libraries or OS IPC component models.

onphonenow · on Jan 10, 2022

per https://jonasdevlieghere.com/order-your-members/

Smaller size is good for cache, but other factors matter

From the above

Single-Threaded Environment

When a single thread on a single core is accessing the data in a struct, we can improve caching performance by using as little cache lines as possible.

By optimizing for memory footprint, as discussed in the previous section, the struct uses less memory and hence occupies less space in the cache.

By placing heavily used members close together, we hope (based on the locality of reference principle) that they will end up closer together in the cache, preferably even on the same cache line, and hence use less cache space.

By separating hot fields form cold ones, we reduce the amount of cache lines filled with unused data.

berkut · on Jan 10, 2022

Not undefined behaviour, but you can get "false sharing" where data from structs/member variables ends up in particular cache lines depending on their address (and the -way of the caches of the processors). Depending on the algorithm and the access patterns it's relatively common (at leasts in HPC space in my experience) to get overhead / inefficiencies due to false sharing or other aliasing issues (4k aliasing on Intel CPUs, although that's arguably a slightly different issue), due to cache line 'collisions' of different member variables being written to in the caches in sub-optimal ways due to the algorthm.

So for HPC code, you very much do want the ability to re-order them manually.

olliej · on Jan 10, 2022

Automatic reordering makes maintaining ABI compatibility either very hard (how do you add a field to a struct if you don’t know where the compiler will put it?), or slow (objc supports field reordering, etc but it results in a lot of indirection when you access a field from outside the class’s implementation section)

C, C++, Pascal all define the order explicitly, the platform ABI generally defines the padding rules.

fulafel · on Jan 10, 2022

Most languages don't have stable native ABIs.

olliej · on Jan 10, 2022

What? Define “most” here, because every actual system language has a stable ABI, otherwise they can’t be used for system libraries.

If you include toy languages, you might get there.

Even languages like Go and Rust recognize that at API boundaries you need a stable and defined ABI.

fulafel · on Jan 11, 2022

Most of today's apps languages, for example. But also Go documents that there's only a internal ABI that's not stable across versions, so it doesn't serve for most ABIy things, like system libraries. Rust seems to be similar. I wonder what the situation is like in GPU language land...

These languages still allow you to consume and provide C compatible ABIs explicitly but this does not interfere with data optimizations for native data.

olliej · on Jan 12, 2022

Again, provide an actual example. Many “modern” languages don’t ship in compiled form so the important part is mostly just API.

Go and rust can’t be used for system libraries: you have to create C interface. This means that if you have two libraries, both written in rust then they have to communicate through a C layer.

The alternative (what rust does) is to have every application contain a complete copy of every library it uses, which is horrific for performance.

fulafel · on Jan 12, 2022

Modern applications languages: Clojure/ClojureScript, Java, F#, C#, Visual Basic, Python, Javascript, TypeScript, PHP, Swift, Dart, Go (included even if it's labeled as a systems language). Of those only Swift does stable ABIs.

Yep, not shipping (native) compiled code is usually how you end up with no ABIs. But these these languages and runtimes still don't really support optimizations of data structures very well, I think it's largely because they weren't specified and implementerd to do it from the start and now there are all kinds of ingrained things about the semantics and estsabilished implementations and user expectations that get in the way of doing big things like feedback based rewriting of data layouts.

rsecora · on Jan 10, 2022

Underrated comment

emmanueloga_ · on Jan 10, 2022

why not? This kind of thing should be documented somewhere. https://xkcd.com/1053/ , etc.

jhgb · on Jan 10, 2022

Surely this issue is already explained in introductory programming books. Or at least it used to be, from what I recall.

japhib · on Jan 10, 2022

I wonder if you also get a couple of wasted bytes between structs to achieve word alignment on an 8-byte boundary?

In which case you’d be saving HALF your memory instead of just a third.

The_rationalist · on Jan 10, 2022

How is the situation with Java?