
> in 30 years I have not once heard of a case where this theoretical benefit has manifested as a clear advantage in any real world application

Today you can make a very direct empirical comparison to see this, using the Graal compiler. It lets you compile exactly the same Java code either ahead-of-time or just-in-time, with the same compiler logic, except that the just-in-time compiler has runtime information available. The just-in-time code is (ignoring startup and warmup time) in my experience always faster, due to the extra runtime information.
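To make the comparison concrete, here is a minimal sketch (class name, workload, and exact commands are mine, not from the comment above):

    // Peak.java - tiny workload to compare the same code JIT vs AOT.
    // JIT, with Graal as the JVM's compiler:
    //   javac Peak.java && java Peak
    // AOT, with GraalVM Native Image (same compiler logic, no runtime profile):
    //   native-image Peak && ./peak
    public class Peak {
        static long sum(int[] xs) {
            long s = 0;
            for (int x : xs) s += x;  // the JIT can specialize this loop using profile data
            return s;
        }
        public static void main(String[] args) {
            int[] xs = new int[10_000_000];
            java.util.Arrays.fill(xs, 1);
            long t = System.nanoTime();
            long s = 0;
            for (int i = 0; i < 100; i++) s += sum(xs);  // in practice, discard warmup iterations
            System.out.println(s + " in " + (System.nanoTime() - t) / 1e6 + " ms");
        }
    }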



Isn't this because Java is designed for JIT compilation, or at least not designed (with the appropriate tweaking knobs) for AOT compilation?

Languages built with AOT compilation in mind (e.g. Rust or Nim) usually give you lots of ways to make choices at compile time and to give hints to the AOT compiler, things the JIT compiler in Java would instead try to infer at runtime.

But by inferring these things at runtime instead, maybe the JIT approach makes it easier to get fast code in those cases where you (as an application developer) don't want to put a lot of effort into optimization?
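As a concrete example of the kind of hint involved, here is a minimal Java sketch (all names are illustrative):

    // A virtual call that a JIT can devirtualize from runtime profiles.
    interface Shape { double area(); }
    final class Circle implements Shape {  // 'final' is the AOT-style compile-time hint
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }
    class Totals {
        static double total(Shape[] shapes) {
            double t = 0;
            // If the runtime profile shows every element is a Circle, the JIT
            // inlines Circle.area() behind a cheap type check. An AOT compiler
            // has to prove this statically or be told (final, sealed, etc.).
            for (Shape s : shapes) t += s.area();
            return t;
        }
    }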


Java has had commercial AOT compilers since the early 2000s.

Most compilers for embedded systems have always offered that option, and as for enterprise JVMs, their JIT compilers have long been able to cache JIT code and PGO data between runs.

Both options have now come to OpenJDK, OpenJ9 and Graal.
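For OpenJDK specifically, the workflow looks roughly like this sketch (my reading of JEP 483's AOT cache; flag spellings are as I recall them from the JEP, file names are made up):

    // Train, create, then reuse an AOT cache across runs.
    //   Training run:   java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App
    //   Create cache:   java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar
    //   Production run: java -XX:AOTCache=app.aot -cp app.jar App
    public class App {
        public static void main(String[] args) {
            System.out.println("started");  // the cached run should reach here noticeably faster
        }
    }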

Android also learned the hard way that moving to pure AOT did not achieve the performance improvements they expected, while compiling on-device meant C++-like compile times whenever every app had to be updated; hence the multi-tier interpreter/JIT/AOT-with-PGO pipeline introduced in Android 7.

The main problems with AOT compilation plus PGO are that you first need a good training dataset, so that the optimizations line up with actual production behaviour; that it still doesn't work across dynamic libraries, so optimizations like devirtualization are not possible there; and that most of the time the tooling is quite cumbersome to use.


Pretty sure Asymetrix was doing this in the late 90s, 1998 or so. They started out doing educational software and pivoted to ahead-of-time Java compilation for some weird reason. Here is a link to when it failed in 1999: https://www.cbronline.com/news/sepercede_exits_java_sells_to...


Yes... but then we're not comparing JIT/AOT anymore - we're comparing different language designs.


But from a higher level: you could just reimplement the thing in C++ with GCC and the whole thing would probably perform better.

Basically what I'm saying is that I've never seen a substantial rewrite of decent C++ code into Java perform better. Even if JITing has benefits at a small scale, they're not substantial enough to overcome the other overheads of managed languages.


C++ and Java have other language-level differences, though, that go beyond just "JITs are slower".


True, but part of that difference comes from the "we don't need to design the language for perf, the JIT will close the gap automagically" mindset.


I don't really think so for Java; most of it was designed before high-performance JIT-based VMs were a thing. In fact I think a lot of the advancements in JITs actually came out of the work that went into making HotSpot fast.


Sure, but one of the vocal pitches for Java in the beginning, and even Smalltalk before that, was: don't worry about the price paid for automatic memory management, bytecode, and dynamism; the top men are working on JIT, GC research, and other optimizations that would not only close but handily overtake the gap against the state-of-the-art statically compiled languages of the time. (Hence exactly why HotSpot became a thing once Java was clearly becoming widespread.)

In reality it was a bit of a mixed bag, and to some of us who remember the hype from 30 years ago it comes across as overpromising and underdelivering.

That isn't to say the technology isn't incredible; I don't mean to dump on it. But overpromising is sort of the status quo for tech.


>> Interestingly in 30 years I have not once heard of a case where this theoretical benefit has manifested as a clear advantage in any real world application when looking at the system as a whole.

> The just-in-time code is (ignoring startup and warmup time) in my experience always faster, due to the extra runtime information.

I don't think that result is surprising. The issue is that in the real world you can't ignore startup and warm up times.

I've never heard the claim that JIT-compiled code is slower than statically compiled code. The issue is that the benefits of JIT don't outweigh its extra costs.


Sure they do, otherwise I wouldn't be replacing C++ based applications with Java and .NET ones since around 2006.

Because JIT or AOT is just one little piece of the overall puzzle.


There are plenty of real world situations where startup and warm up times don't matter.


Have you tried graal's PGO?


No, unfortunately - I believe it's a closed-source Enterprise feature. But the current differential is pretty large, and nobody I've heard is shouting that they can fix it using PGO. And what can PGO determine that a JIT can't also do?


A number of things actually. For one, most JIT implementations only optimize once, instead of continuously. The result is machine code that is optimized for the sorts of things done at startup, as opposed to steady state operation. For example, I have a Play app that takes a minute to start up. The JIT does a great job of optimizing the code that is called during the setup process, but the API code itself doesn't get much optimization.

With PGO, I can get a more representative profiling dataset, allowing the JIT to see actual production loads instead of startup loads.

And for CLI apps, PGO code starts up fast and never slows down to profile or optimize.
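For reference, the Native Image PGO loop being described looks roughly like this (Oracle GraalVM; flag names as I know them from its docs, file names illustrative):

    // 1. Instrumented build:
    //      native-image --pgo-instrument -cp app.jar App
    // 2. Run the instrumented binary under a representative production load;
    //    on exit it writes a profile file (default.iprof):
    //      ./app
    // 3. Optimized build driven by that profile:
    //      native-image --pgo=default.iprof -cp app.jar App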


> For one, most JIT implementations only optimize once, instead of continuously. The result is machine code that is optimized for the sorts of things done at startup

Funny you should mention that - the author of this blog post has another post on fixing that problem for one specific (but very practical) case where we want to disregard some profiling information from the startup phase because it pollutes the genuine profiling information.

https://engineering.shopify.com/blogs/engineering/optimizing...


Most JIT compilers will go after any code that shows up as hot, regardless of when it executes. If your API code is that, even a minute after startup, it should really be getting optimized…


Think of a use case where you have an API whose most common behavior is to do nothing, waiting for a call, but which regularly gets calls that are very compute-intensive for short bursts of time.

With the most common type of JIT, which profiles once and compiles once, I'm going to get code that is optimized for startup and initialization.

If I have an "advanced" JIT, which is constantly deoptimizing and reoptimizing for whatever it sees as the hottest path in some arbitrarily chosen snapshot of time, I'm going to see my compute-intensive code slowed down so the JIT can profile and recompile it every time that endpoint is called, then deoptimized again while it sits around waiting for something to do, ensuring I go through the same process the next time it is called. There are actually a lot of situations where this regime can be even worse than a naive startup-based single optimization, which is why it is not that common outside of dynamically typed languages.

With PGO, I can select a profile snapshot during a stress test, and get heavily optimized code specifically for the things that actually stress the server. And it will stay optimized for that use case.


The most commonly used JIT in the world is almost certainly the one in the OpenJDK, which is of the advanced kind. And it does not suffer from the problem you are talking about, because it only ever looks at code that is getting executed.

Basically, it will do something like: interpret a function for the first few thousand times it is executed, collecting execution metrics. After a threshold is reached, the next time that function is called, start compiling it, optimizing based on the execution metrics collected earlier, and leave behind a few hooks for future changes. Keep collecting metrics from function execution. If the profile of execution changes significantly, re-compile the function according to the new profile (possibly doing things like un-inlining).

This aligns perfectly with a startup-then-steady-production workflow. Problems can only appear if you have a continually changing pattern of execution through the same function.
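You can watch this happen, if you're curious. A rough sketch (class and method names are mine); run with java -XX:+PrintCompilation Deopt and you should see shape() compiled during the first loop, then marked "made not entrant" and recompiled once the second receiver type invalidates the monomorphic assumption:

    public class Deopt {
        interface S { long f(long x); }
        static final class A implements S { public long f(long x) { return x + 1; } }
        static final class B implements S { public long f(long x) { return x * 2; } }
        static long shape(S s, long x) { return s.f(x); }  // the profiled call site
        public static void main(String[] args) {
            long acc = 0;
            S a = new A(), b = new B();
            for (int i = 0; i < 1_000_000; i++) acc += shape(a, i);  // monomorphic: only A seen
            for (int i = 0; i < 1_000_000; i++) acc += shape(b, i);  // B appears -> deoptimize, recompile
            System.out.println(acc);  // print so the loops aren't dead code
        }
    }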


HotSpot certainly does better than the vast majority of JITs out there, but it is definitely not perfect. If your JIT code cache is too small (really easy to do if you're not aware of the relevant settings or what they do), or your code does most of its heavy lifting in functions that are not called very often (because call frequency is the only heuristic it uses to prioritize optimization), you can easily trap yourself into horribly optimized code that the JIT doesn't know its way out of.
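If you want to check whether that is happening, a few HotSpot flags exist for exactly this (the size value is an arbitrary example of mine):

    // java -XX:+PrintCodeCache ...             -> print code cache usage at VM exit
    // java -XX:ReservedCodeCacheSize=256m ...  -> raise the code cache cap
    // java -XX:+PrintCompilation ...           -> log what actually gets compiled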


> The most commonly used JIT in the world is almost certainly the one in the OpenJDK

I don't know, there are a lot of people who browse the web using Chrome…


Oops, you're right, JS is certainly more commonly used, I forgot about that.


And since Java 11, OpenJDK has inherited the PGO and JIT-cache capabilities of JRockit, but people still use Java 8, so they have no idea what modern Java JITs are actually capable of.

OpenJ9 and ART also have similar capabilities.



