
One of the worst offenders is probably the binary-trees benchmark.

1. It features an atypically high allocation rate compared to real-world programs.

2. As a synthetic microbenchmark that only ever allocates objects of one size, it makes pool allocators look disproportionately good. But in real-world applications with varying object sizes, overuse of pool allocators creates a serious risk of fragmentation.

3. The rules allow programs that use manual memory management to use pretty much any pool allocator library that they can get their hands on, but forbid garbage-collected languages from adjusting their GC parameters (last I checked, you could easily improve the performance of OCaml and Dart for that benchmark by a factor of 2-3 simply by adjusting GC settings, especially the size of the minor heap).



> It features an atypically high allocation rate compared to real-world programs.

Do you have any supporting evidence for this? Real-world programs can be pretty allocation heavy, and if anything high allocation rates would "unfairly" benefit GC languages since native programs are banned from using arena allocators.

> As a synthetic micro benchmark that only ever allocates objects of one size, it makes pool allocators look disproportionately good. But in real-world applications with varying object size, overuse of pool allocators creates a serious risk of fragmentation.

Huh? It's so incredibly common in the real world for a structure to be of a single type that pretty much every language has first-class support for that via templates.

And since they are single-size there's no risk of fragmentation at all. Indeed, object pools are typically how you avoid fragmentation; they are not things that cause it. So I'm not sure what you're trying to get at with that comment.
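
The point above can be sketched concretely. Here's a minimal fixed-size pool allocator in C (illustrative only, not from any benchmark submission): because every slot is the same size, any freed slot can satisfy the next allocation, so the pool itself cannot fragment.

```c
#include <stddef.h>
#include <stdlib.h>
#include <assert.h>

/* Each free slot doubles as a free-list node. */
typedef struct Node { struct Node *next; } Node;

typedef struct {
    unsigned char *storage;
    Node *free_list;
    size_t slot_size;
} Pool;

void pool_init(Pool *p, size_t slot_size, size_t nslots) {
    assert(slot_size >= sizeof(Node));
    p->slot_size = slot_size;
    p->storage = malloc(slot_size * nslots);
    p->free_list = NULL;
    for (size_t i = 0; i < nslots; i++) {   /* thread every slot onto the list */
        Node *n = (Node *)(p->storage + i * slot_size);
        n->next = p->free_list;
        p->free_list = n;
    }
}

void *pool_alloc(Pool *p) {            /* O(1): pop the free list */
    Node *n = p->free_list;
    if (n) p->free_list = n->next;
    return n;
}

void pool_free(Pool *p, void *ptr) {   /* O(1): push back onto the list */
    Node *n = ptr;
    n->next = p->free_list;
    p->free_list = n;
}
```

A freed slot is handed out again on the very next allocation, which is exactly why a single-size pool can't fragment.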

> The rules allow programs that use manual memory management to use pretty much any pool allocator library that they can get their hand on, but forbid garbage-collected languages from adjusting their GC parameters (last I checked, you could easily improve the performance of OCaml and Dart for that benchmark by a factor of 2-3 simply by adjusting GC settings, especially the size of the nursery).

Last I checked, if I write a Dart library I can't adjust the GC settings in my library to make my library run better, either, but if I ship a C++ library I can absolutely use a pool allocator to make my library run better (and indeed this is exactly what real libraries do).

So I'd call this a good rule; allowing tweaking of GC parameters would be more about over-tuning the runtime to the benchmark than about illustrating what is feasible to achieve in the language as a library or as part of a larger system.


> Do you have any supporting evidence for this?

For starters, by virtue of the construction of the benchmark alone, which is mostly about stress-testing allocation.

There are also quite a few papers that benchmark allocators, such as [1], where you can see that such allocation rates aren't exactly ordinary.

> Real-world programs can be pretty allocation heavy,

I said "atypical", not "impossible". Obviously, I can construct real-world programs with pretty much arbitrary allocation rates, but especially performance-sensitive code will avoid that, if only because it hurts memory locality.

> and if anything high allocation rates would "unfairly" benefit GC languages since native programs are banned from using arena allocators.

My point is that this is apples vs. oranges, no matter how you cut it. I don't want one side to "win", I want results that are good science.

> Huh? It's so incredibly common in the real world for a structure to be of a single type that pretty much every language has first-class support for that via templates.

That sentence doesn't make sense, unless you live in a world where C++ is the only programming language (because templates are pretty much C++-specific).

> And since they are single-size there's no risk of fragmentation at all.

Simple example: A program alternates between allocating objects of size N (and then freeing most, but not all of them) and allocating objects of size M (and then freeing most, but not all of them). Worst case means that one object per block is enough to keep an entire block alive.

The concern here is long-running programs: if you have high allocation rates, you also have frequent deallocations (or the program eventually stops allocating).

> Last I checked if I write a Dart library I can't adjust the GC settings in my library to make my library run better, either, but if I ship a C++ library I can absolutely use a pool allocator to make my library run better (and indeed this is exactly what real libraries do)

And instead you could tune the final application; allocation tuning is often a non-local concern. Apples and oranges again.

Importantly, you seem to be mistaking this for a tribal argument (the GC tribe vs. the manual memory management tribe), whereas my point is simply that the benchmark is bad and tries to compare incomparable things.

[1] http://dl.acm.org/citation.cfm?id=3030211


> … my point is simply that the benchmark … tries to compare incomparable things.

That trump card wasn't actually mentioned in your original comment.

The difficulty is that programming languages (and programming language implementations) are more different than apples and oranges, but the question is still asked - "Will my program be faster if I write it in language X?" - and there's still a wish for a simpler answer than - It depends how you write it!

http://benchmarksgame.alioth.debian.org/dont-jump-to-conclus...


> Obviously, I can construct real-world programs with pretty much arbitrary allocation rates, but especially performance-sensitive code will avoid that, if only because it hurts memory locality.

That's not actually true at all. Games, for example, can have high allocation rates. But they use custom allocators to make those allocations extremely cheap using things like arena allocators for the allocations for a frame.
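
The per-frame pattern mentioned above can be sketched as a bump (arena) allocator in C; this is an illustrative sketch, not code from any actual game engine. Allocation is a pointer increment, and the entire frame's memory is reclaimed at once by resetting an offset.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t offset;
} Arena;

void arena_init(Arena *a, size_t capacity) {
    a->base = malloc(capacity);
    a->capacity = capacity;
    a->offset = 0;
}

void *arena_alloc(Arena *a, size_t size) {
    size = (size + 15) & ~(size_t)15;            /* round up to 16-byte alignment */
    if (a->offset + size > a->capacity) return NULL;
    void *p = a->base + a->offset;
    a->offset += size;                           /* just bump the pointer */
    return p;
}

void arena_reset(Arena *a) { a->offset = 0; }    /* free the whole frame at once */
```

A game would call `arena_reset` at the end of every frame, so thousands of allocations per frame cost a pointer bump each and a single reset to free.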

> My point is that this is apples vs. oranges, no matter how you cut it.

It's not apples vs. oranges at all, though. It's "how can you solve this problem in the optimal way in any given language".

The set of rules does not appear to unfairly punish any particular language design. You can do object pools in GC'd languages, too, for example.

Does the problem itself have bias? Probably, but real problems in the real world have inherent language biases, too. That's a problem with reality, not a problem with the benchmark.

> That sentence doesn't make sense, unless you live in a world where C++ is the only programming language (because templates are pretty much C++-specific).

generics? Not C++ specific.

> Simple example: A program alternates between allocating objects of size N (and then freeing most, but not all of them) and allocating objects of size M (and then freeing most, but not all of them). Worst case means that one object per block is enough to keep an entire block alive.

That example results in terrible fragmentation for every allocator other than a compacting GC if M > N. It's not made particularly worse by an object pool.

> Importantly, you seem to be mistaking this for a tribal argument (the GC tribe vs. the manual memory management tribe), whereas my point is simply that the benchmark is bad and tries to compare incomparable things.

I'm not mistaking it for that at all. I'm saying your arguments for why it's bad are bad. You seem to be upset that benchmarks for problems exist that do not represent your priorities of what should be benchmarked.


> The set of rules does not appear to unfairly punish any particular language design. You can do object pools in GC'd languages, too, for example.

Actually the rules for the Computer Language Benchmarks Game say about Binary Tree: 'As a practical matter, the myriad ways to custom allocate memory will not be accepted. Please don't implement your own custom "arena" or "memory pool" or "free list" - they will not be accepted.'


> It's not apples vs. oranges at all, though. It's "how can you solve this problem in the optimal way on any given language"

And for any synthetic microbenchmark, that's unlikely to reflect real-world workloads, as they tend to narrowly test just one aspect of the language (or rather, its implementation).

Remember, we're talking about a peer-reviewed paper here, where the burden to show relevance is upon the authors. Section 4 ("Threats to Validity") of the paper does not really address that adequately.

As the Computer Language Benchmark Game site itself quotes (and one reason why it calls itself a "game" and disavows usefulness for actual programming language comparisons):

  Attempts at running programs that are much simpler than a real
  application have led to performance pitfalls. Examples include:

  ...

  * Toy programs, which are 100-line programs from beginning
    programming assignments ...
  * Synthetic benchmarks, which are small, fake programs invented
    to try to match the profile and behavior of real applications
    ...

  All three are discredited today, usually because the compiler
  writer and architect can conspire to make the computer appear
  faster on these stand-in programs than on real applications.

You can construct pretty much arbitrary rules that will arbitrarily favor certain implementations over others. For example, the rules could also say to only use the language's built-in allocation mechanism for this benchmark. Or you could construct a benchmark that would heavily favor languages with built-in JIT techniques.

> You can do object pools in GC'd languages, too, for example.

No. The rules do not allow for that. I did mention how arbitrary they are, right? The regex-redux benchmark, for example, comes down to what the best external library is that you can link to under the rules. The gcc version wins in large part (being massively better than the g++ version, which relies on Boost regexes) because it uses the JIT version of libpcre. It's borderline absurd.

> generics? Not C++ specific.

You said templates, not generics. Generics alone are not necessarily sufficient to implement pool allocators. Plus, C++ is the only widely adopted language that has both some form of generics and manual memory management.

> That example results in terrible fragmentation for every allocator other than a compacting GC if M > N. It's not made particularly worse by an object pool.

This is false. Even a simple first-fit allocator can fare better, for example. Obviously, a compacting GC can avoid external fragmentation (almost) entirely, no matter the cause.


> … and one reason why it calls itself a "game"…

No: http://benchmarksgame.alioth.debian.org/sometimes-people-jus...

> … and disavows usefulness for actual programming language comparisons…

Not a definitive conclusion but a starting point.


> The rules allow programs that use manual memory management to…

And still there's outrage that programs with allocators custom-written for binary-trees are rejected :-)



