I understand the decision to archive the upstream repo; as of when I left Meta, we (i.e. the Jemalloc team) weren’t really in a great place to respond to all the random GitHub issues people would file (my favorite was the time someone filed an issue because our test suite didn’t pass on Itanium lol). Still, it makes me sad to see. Jemalloc is still IMO the best-performing general-purpose malloc implementation that’s easily usable; TCMalloc is great, but is an absolute nightmare to use if you’re not using bazel (this has become slightly less true now that bazel 7.4.0 added cc_static_library so at least you can somewhat easily export a static library, but broadly speaking the point still stands).
I’ve been meaning to ask Qi if he’d be open to cutting a final 6.0 release on the repo before re-archiving.
At the same time it’d be nice to modernize the default settings for the final release. Disabling the (somewhat confusingly backwardly-named) “cache oblivious” setting by default so that the 16 KiB size-class isn’t bloated to 20 KiB would be a major improvement. This isn’t to disparage your (i.e. Jason’s) original choice here; IIRC when I last talked to Qi and David about this they made the point that at the time you chose this default, typical TLB associativity was much lower than it is now. On a similar note, increasing the default “page size” from 4 KiB to something larger (probably 16 KiB), which would correspondingly increase the large size-class cutoff (i.e. the point at which the allocator switches from placing multiple allocations onto a slab, to backing individual allocations with their own extent directly) from 16 KiB up to 64 KiB would be pretty impactful. One of the last things I looked at before leaving Meta was making this change internally for major services, as it was worth a several percent CPU improvement (at the cost of a minor increase in RAM usage due to increased fragmentation). There’s a few other things I’d tweak (e.g. switching the default setting of metadata_thp from “disabled” to “auto”, changing the extent-sizing for slabs from using the nearest exact multiple of the page size that fits the size-class to instead allowing ~1% guaranteed wasted space in exchange for reducing fragmentation), but the aforementioned settings are the biggest ones.
Ah, porting to HP Superdome servers. It’s like being handed a brochure describing the intricate details of the iceberg the ship you just boarded is about to hit in a few days.
Still makes me sad. I partially think a major reason for the demise was that it was simply constructed too soon. Compiler tech wasn't nearly good enough to handle the ISA.
Nowadays because of the efforts that have gone in to making SIMD effective, I'd think modern compilers would have an easier time taking advantage of that unique and strange uarch.
VLIW has a fatal flaw in how it was used in these systems. You cannot run general purpose dynamically scheduled workloads unless you combine the JIT engine and the scheduler. PRIOR ART. Which is the same exact problem of trying to run multiple compute kernels on a GPU at the same time. VLIW with an OS and runtime that uses a higher level language, Wasm or the JVM, could forseably support dynamic workloads where the main cpu was VLIW.
Now if they had been designed as GPU like devices for processing data, then Fortune 1000 would have never needed or used Hadoop.
It’s just a huge pain to build and link against. Before the bazel 7.4.0 change your options were basically:
1. Use it as a dynamically linked library. This is not great because you’re taking at a minimum the performance hit of going through the PLT for every call. The forfeited performance is even larger if you compare against statically linking with LTO (i.e. so that you can inline calls to malloc, get the benefit of FDO , etc.). Not to mention all the deployment headaches associated with shared libraries.
2. Painfully manually create a static library. I’ve done this, it’s awful; especially if you want to go the extra mile to capture as much performance as possible and at least get partial LTO (i.e. of TCMalloc independent of your application code, compiling all of TCMalloc’s compilation units together to create a single object file).
When I was at Meta I imported TCMalloc to benchmark against (to highlight areas where we could do better in Jemalloc) by pain-stakingly hand-translating its bazel BUILD files to buck2 because there was legitimately no better option.
As a consequence of being so hard to use outside of Google, TCMalloc has many more unexpected (sometimes problematic) behaviors than Jemalloc when used as a general purpose allocator in other environments (e.g. it basically assumes that you are using a certain set of Linux configuration options [1] and behaves rather poorly if you’re not)
As I observed when I was at Google: tcmalloc wasn't a dedicated team but a project driven by server performance optimization engineers aiming to improve performance of important internal servers.
Extracting it to github.com/google/tcmalloc was complex due to intricate dependencies (https://abseil.io/blog/20200212-tcmalloc ). As internal performance priorities demanded more focus, less time was available for maintaining the CMake build system.
Maintaining the repo could at best be described as a community contribution activity.
> Meta’s needs stopped aligning well with those of external uses some time ago, and they are better off doing their own thing.
I think Google's diverged from the external uses even long ago:) (For a long time google3 and gperftools's tcmalloc implementations were so different.)
Everything from Google is an absolute pain to work with unless you're in Google using their systems, FWIW. Anything from the Chromium project is deeply intangled with everything else from the Chromium project as part of one gigantic Chromium source tree with all dependencies and toolchains vendored. They do not care about ABI what so ever, to the point that a lot of Google libraries change their public ABI based on whether address sanitizer is enabled or not, meaning you can't enable ASAN for your code if you use pre-built (e.g package manager provided) versions of their code. Their libraries also tend to break if you link against them from a project with RTTI enabled, a compiler set to a slightly different compiler version, or any number of other minute differences that most other developers don't let affect their ABI.
And if you try to build their libraries from source, that involves downloading tens of gigabytes of sysroots and toolchains and vendored dependencies.
Oh and you probably don't want multiple versions of a library in your binary, so be prepared to use Google's (probably outdated) version of whatever libraries they vendor.
And they make no effort what so ever to distinguish between public header files and their source code, so if you wanna package up their libraries, be prepared to make scripts to extract the headers you need (including headers from vendored dependencies), you can't just copy all of some 'include/' folder.
And their public headers tend to do idiotic stuff like `#include "base/pc.h"`, where that `"base/pc.h"` path is not relative to the file doing the include. So you're gonna have to pollute the include namespace. Make sure not to step on their toes! There's a lot of them.
I have had the misfortune of working with Abseill, their WebRTC library, their gRPC library and their protobuf library, and it's all terrible. For personal projects where I don't have a very, very good reason to use Google code, I try to avoid it like the plague. For professional projects where I've had to use libwebrtc, the only reasonable approach is to silo off libwebrtc into its own binary which only deals with WebRTC, typically with a line-delimited JSON protocol on stdin/stdout. For things like protobuf/gRPC where that hasn't been possible, you just have to live with the suffering.
..This comment should probably have been a blog post.
I think your rant isn't long enough to include everything relevant ;)
The Blink web engine (which I sometimes compile for qtwebengine) takes a really long time to compile, several times longer than Gecko according to some info I found online. Google has a policy of not using forward declarations, including everything instead. That's a pretty big WTF for anyone who has ever optimized build time. Google probably just throws hardware and (distributed) caching at the problem, not giving a shit about anyone else building it. Oh, it also needs about 2 GB of RAM per build thread - basically nothing else does.
Even with Firefox using Rust and requiring a build of many crates, qtwebengine takes more time. It was so bad that I had to remove packaged from my system (Gentoo) that were pulling qtwebengine.
And I build all Rust crates (including rustc) with -O3, same as C/C++.
That is really nice to hear, but AFAICS it only means that it may change in the future. Because in current code, it was ~all includes last time I checked.
Well, I remember one - very biased - example where I had a look at a class that was especially expensive to compile, like 40 seconds (on a Ryzen 7950X) and maybe 2 GB of RAM. It had under 200 LOC and didn't seem to do anything that's typically expensive to compile... except for the stuff it included. Which also didn't seem to do anything fancy. But transitive includes can snowball if you don't add any "compile firewalls".
> Because in current code, it was ~all includes last time I checked.
That's another matter - just because forward-declares are allowed, doesn't mean they are mandated, but in my experience the reviewers were paying attention to that pretty well.
I picked couple random headers from the directory where I've contributed the most to blink, and from what I'm seeing, most of the classes that could be forward-declared, were. I have not looked at .cc files given that those tend to need to see the declaration (except when it's unused, but then why have a forward-decl at all?) or the compiler would complain about access into incomplete type.
> Well, I remember one - very biased - example where I had a look at a class that was especially expensive to compile, like 40 seconds (on a Ryzen 7950X) and maybe 2 GB of RAM. It had under 200 LOC and didn't seem to do anything that's typically expensive to compile... except for the stuff it included.
Maybe the stuff was actually being compiled because of some member in a class (so it was actually expensive to compile). Or maybe you stumbled upon a place where folks weren't paying attention. Hard to say without a concrete example. The "compile firewall" was added pretty recently I think, but I don't know if it's going to block anything from landing.
Edit: formatting (switched bulleted list into comma-separated because clearly I don't know how to format it).
The annotated red dots correspond to the last time Chrome developers did a big push to prune the include graph to optimize build time. It was effective, but there was push back. C++ developers just want magic, they don't want to think about dependency management, and it's hard to blame them. But, at the end of the day, builds scale with sources times dependencies, and if you aren't disciplined, you can expect superlinear build times.
Good that it's being tracked, but Jesus, these numbers!
110 CPU hours for a build. (Fortunately, it seems to be a little over half that for my CPU. "Cloud CPUs" are kinda slow.)
I picked the 5001st largest file with includes. It's zoom_view_controller.cc, 140 lines in the .cc file, size with includes: 19.5 MB.
Initially I picked the 5000th largest file with includes, but for devtools_target_ui.cc, I see a bit more legitimacy for having lots of includes. It has 384 "own" lines in he .cc file and, of course, also about 19.5 MB size with includes.
A C++20 source file including some standard library headers easily bloats to a little under 1 MB IIRC, and that's already kind of unreasonable. 20x of that is very unreasonable.
I don't think that I need to tell anyone on the Chrome team how to improve performance in software: you measure and then you grab the dumb low-hanging fruit first. From these results, it doesn't seem like anyone is working with the actual goal to improve the situation as long as the guidelines are followed on paper.
> I picked the 5001st largest file with includes. It's zoom_view_controller.cc, 140 lines in the .cc file, size with includes: 19.5 MB.
> Initially I picked the 5000th largest file with includes, but for devtools_target_ui.cc, I see a bit more legitimacy for having lots of includes. It has 384 "own" lines in he .cc file and, of course, also about 19.5 MB size with includes.
> A C++20 source file including some standard library headers easily bloats to a little under 1 MB IIRC, and that's already kind of unreasonable. 20x of that is very unreasonable.
I think you're not arguing pro-forward-declarations vs anti-forward-declarations here though - it sounds more like an argument for more granular header/source files? In .cc file, each and every include should be necessary for the file to compile (although looking at your example, bind.h seems to be unused and could be removed - looks like the file was refactored and the includes weren't cleaned up).
With that said, in the corresponding zoom_view_controller.h, the tab_interface.h include looks to be unnecessary so you did find one good example. :)
Yes, sure! I am arguing for whatever is necessary to reduce the total compilation cost. Pruning headers, rearranging source code to have fewer trivial modules and to reduce the size of very often included headers, even gasp sometimes using pointers just to reduce compile time! I understand that runtime performance is a very high priority for Blink, but it really doesn't matter sometimes if certain things are heap-allocated. Like things that are very expensive to instantiate anyway and that don't occur often. These will incidentally tend to have "heavy" headers, too.
Reading this perspective was interesting. I can appreciate that things didn't fit into your workflow very well, but my experience has been the opposite. Their projects seem to be structured from the perspective of building literally everything from source on the spot. That matches my mindset - I choose to build from scratch in a network isolated environment. As a result google repos are some of the few that I can count on to be fairly easy to get up and running. An alarming number of projects apparently haven't been tested under such conditions and I'm forced to spend hours patching up cmake scripts. (Even worse are the projects that require 'npm install' as part of the build process. Absurdity.)
> Oh and you probably don't want multiple versions of a library in your binary, so be prepared to use Google's (probably outdated) version of whatever libraries they vendor.
This is the only complaint I can relate to. Sometimes they lag on rolling dependencies forward. Not so infrequently there are minor (or not so minor) issues when I try to do so myself and I don't want to waste time patching my dependencies up so I get stuck for a while until they get around to it. That said, usually rolling forward works without issue.
> if you try to build their libraries from source, that involves downloading tens of gigabytes of sysroots and toolchains and vendored dependencies.
Out of curiosity which project did you run into this with? That said, isn't the only alternative for them moving to something like nix? Otherwise how do you tightly specify the build environment?
I don't really have the care nor time to respond as thoroughly as you deserve, but here are some thoughts:
> Out of curiosity which project did you run into this with?
Their WebRTC library for the most part, but also the gRPC C++ library. Unlike WebRTC, grpc++ is in most package managers so the need to build it myself is less, but WebRTC is a behemoth and not in any package manager.
> That said, isn't the only alternative for them moving to something like nix? Otherwise how do you tightly specify the build environment?
I don't expect my libraries to tightly specify the build environment. I expect my libraries to conform to my software's build environment, to use versions of other libraries that I provide to it, etc etc. I don't mind that Google builds their application software the way they do, Google Chrome should tightly constrain its build environment if Google wants; but their libraries should fit in to my environment.
I'm wondering, what is your relationship with Google software that you build from source? Are you building their libraries to integrate with your own applications, or do you just build Google's applications from source and use them as-is?
Yeah fair enough, controlling the build environment probably ought to be optional. Sounds like I dodged the issues you ran into due to the combination of specific library plus usecase. My experience is limited to abseil as well as the full dawn stack. In all cases I'm statically linking into my own applications, building everything except glibc & co from source, network isolated environment, using the same toolchain, compiler flags, etc.
> I choose to build from scratch in a network isolated environment. As a result google repos are some of the few that I can count on to be fairly easy to get up and running.
If you are building a single google project they are easy to get up and running. If you are building your own project on top of theirs things get difficult. those library issues will get you.
I don't know about OP, but we have our own in house package manager. If Conan was ready a couple years sooner we would have used that instead.
I’ve hit similar problems with their Ruby gRPC library.
The counter example is the language Go. The team running Go has put considerable care and attention into making this project welcoming for developers to contribute, while still adhering to Google code contribution requirements. Building for source is straightforward and iirc it’s one of the easier cross compilers to setup.
I agree to a point. grpc++ (and protobuf and boringssl and abseil and....) was the biggest pain in the ass to integrate in to a personal project I've ever seen. I ended up having to write a custom tool to convert their Bazel files to the format my projects tend to use (GN and Ninja). Many hours wasted. There were no library specfici "sysroots" or "toolchains" involved though thankfully because I'm sure that would made things even worse.
Upside is (I guess) if I ever want to use grpc in another project the work's already done and it'll just be a matter of copy/paste.
> they make no effort what so ever to distinguish between public header files and their source code
They did, in a different way. The world is used to distinguish by convention, putting them in different directory hierarchy (src/, include/). google3 depends on the build system to do so, "which header file is public" is documented in BUILD files. You are then required to use their build system to grasp the difference :(
> And their public headers tend to do idiotic stuff like `#include "base/pc.h"`, where that `"base/pc.h"` path is not relative to the file doing the include.
I have to disagree on this one. Relying on relative include paths suck. Just having one `-I/project/root` is the way to go.
> I have to disagree on this one. Relying on relative include paths suck. Just having one `-I/project/root` is the way to go.
Oh to be clear, I'm not saying that they should've used relative includes. I'm complaining that they don't put their includes in their own namespace. If public headers were in a folder called `include/webrtc` as is the typical convention, and they all contained `#include <webrtc/base/pc.h>` or `#include "webrtc/base/pc.h"` I would've had no problem. But as it is, WebRTC's headers are in include paths which it's really difficult to avoid colliding with. You'll cause collisions if your project has a source directory called `api`, or `pc`, or `net`, or `media`, or a whole host of other common names.
Thanks for the clarification. Yeah, that's pretty frustrating.
Now I'm curious why grpc, webrtc and some other Chromium repos were set up like this.
Google projects which started in google3 and later exported as an open source project don't have this defect, for example tensorflow, abseil etc. They all had a top-level directory containing all their codes so it becomes `#include "tensorflow/...`.
Feels like a weird collision of coding style and starting a project outside of their monorepo
>> `#include "base/pc.h"`, where that `"base/pc.h"` path is not relative to the file doing the include.
> I have to disagree on this one.
The double-quotes literally mean "this dependency is relative to the current file". If you want to depend on a -I, then signal that by using angle brackets.
Eh, no. The quotes mean "this is not a dependency on a system library". Quotes can include relative to the files, or they can include things relative to directories specified with -I. The only thing they can't is include things relative to directories specified with -isystem and system include directories.
I would be surprised if I read some project's code where angle brackets are used to include headers from within the same project. I'm not surprised when quotes are used to include code from within the project but relative to the project's root.
The only difference between "" and <> is that the former adds the current file's directory to the beginning of the search path.
So the only reason to use "" instead of <> is when you need that behaviour, because the dependency is relative to the current file.
If you use "" in any other situation, then you are introducing a potential error, because now someone can change the meaning of your code simply by creating a file with a name and location that happens to match your dependency.
(Yes, some compilers have -isystem and -iquote which modify that behaviour, but those options are not standard, and can't be relied upon. I'd strongly advise against their use.)
I’ve successfully used LLMs to migrate Makefiles to bazel, more or less. I’ve not tried the reverse but suspect (2) isn’t so bad these days. YMMV, of course, but food for thought
Dunno why you got downvoted, but I've also tried to let Claude translate a bunch of BUILD files to equivalent CMakeLists.txt. It worked. The resulting CMakeLists.txt looks super terrible, but so is 95% of CMakeLists.txt in this world, so why bother, it's doomed anyway.
They got downvoted because 1) comments of the form "I gave a chat bot a toy example of a task and it managed it" are tired and uninformative, and 2) because nobody was talking about anything which would make translating a Makefile into Bazel somehow relevant, nobody here has a Makefile which we wish was Bazel, we wish Google code was easier to work with
The person above was saying they did a tedious manual port of tcmalloc to buck. Since tcmalloc provides both bazel and cmake builds, it seems relevant that in these days a person could have potentially forced a robot to do the job of writing the buck file given the cmake or bazel files.
People are discussing things that are tedious work. I think the conversion to Bazel from a makefile is much more tedious and error prone than the reverse, in part because of Bazel sandboxing although that shouldn’t make much of a difference for a well-defined collection of Makefiles of a C library.
The reverse should be much easier, which was the point of the post. Pointing it out as a capability (translation of build systems) that is handled well, is, well, informative. The future isn’t evenly distributed and people aren’t always aware of capabilities, even on HN
Yep I've done something similar. This is the only way I managed to compile Google's C++ S2 library (spatial indexing) which depends on absl and OpenSSL.
(I managed to avoid infecting my project with boringSSL)
I would love to see these changes - or even some sort of blog post or extended documentation explaining rational. As is the docs are somewhat barren. I feel that there’s a lot of knowledge that folks like you have right now from all of the work that was done internally at Meta that would be best shared now before it is lost.
The Itanium ISA was an infamous failure, never seeing widespread usage, hence people often referring to it as “The Itanic” (i.e. the much-touted ship that immediately sunk). The fact that anyone would be using it today at all is sort of hilariously niche, and is illustrative of how wide-ranging and obscure the issues filed to the GitHub repo could be. On a similar token I recall seeing an issue (or maybe it was a PR?) to fix our build on GNU Herd.
> we (i.e. the Jemalloc team) weren’t really in a great place to respond to all the random GitHub issues people would file
Why not? I mean this is complete drive-by comment, so please correct me, but there was a fully staffed team at Meta that maintained it, but was not in the best place to manage the issues?
Because you need to build it to use it, and you likely already have significant build related infrastructure, and you are going to need to integrate any new dependencies into that. I'm increasingly convinced that the various build systems are elaborate and wildly successful ploys intended only to sap developer time and energy.
Because you have to build it. If they don't use the same build system as you, you either want to invoke their system, or import it into yours. The former is unappealing if it's 'heavy' or doesn't play well as a subprocess; the latter can take a lot of time if the build process you're replicating is complex.
I've done both before, and seen libraries at various levels of complexity; there is definitely a point where you just want to give up and not use the thing when it's very complex.
This. When step one is "install our weird build system," I'll immediately look for something else that meets my needs. All build systems suck, so everyone thinks they can write a better one, and too many people try. Pretty soon you end up having to learn a majority of this (https://en.wikipedia.org/wiki/List_of_build_automation_softw...) to get your code to compile.
If TCMalloc uses bazel, then you build it with Bazel. It just needs to install itself where you tell it to, and then either it has given you a pkg-config file, or otherwise, your own build system needs some library-finding logic for it ("find module" in CMake terms). Or - are you saying the problem is that you need to install Bazel?
Building complete, optimised binaries can be much more complicated than just linking in a SO/A file. Things like cross-language LTO and PGO can be massive for performance and require integration throughout the build system.
> Or - are you saying the problem is that you need to install Bazel?
That. Then there's Facebook's build system, a few Python and JavaScript build systems, and pretty soon I have installed and have to deal with a half-dozen things that slightly improve upon Make. It's a maintenance burden if I ever have to touch any of these build systems.
I’ve been meaning to ask Qi if he’d be open to cutting a final 6.0 release on the repo before re-archiving.
At the same time it’d be nice to modernize the default settings for the final release. Disabling the (somewhat confusingly backwardly-named) “cache oblivious” setting by default so that the 16 KiB size-class isn’t bloated to 20 KiB would be a major improvement. This isn’t to disparage your (i.e. Jason’s) original choice here; IIRC when I last talked to Qi and David about this they made the point that at the time you chose this default, typical TLB associativity was much lower than it is now. On a similar note, increasing the default “page size” from 4 KiB to something larger (probably 16 KiB), which would correspondingly increase the large size-class cutoff (i.e. the point at which the allocator switches from placing multiple allocations onto a slab, to backing individual allocations with their own extent directly) from 16 KiB up to 64 KiB would be pretty impactful. One of the last things I looked at before leaving Meta was making this change internally for major services, as it was worth a several percent CPU improvement (at the cost of a minor increase in RAM usage due to increased fragmentation). There’s a few other things I’d tweak (e.g. switching the default setting of metadata_thp from “disabled” to “auto”, changing the extent-sizing for slabs from using the nearest exact multiple of the page size that fits the size-class to instead allowing ~1% guaranteed wasted space in exchange for reducing fragmentation), but the aforementioned settings are the biggest ones.