Your argument basically boils down to "If you write fast C++ it will be fast", which is true. But a significant fraction of the code out there is not C++ carefully written by experts to be fast.
This is different from the "Java will be faster than C++ because of HotSpot" arguments, because there Java is competing with C++. Here it's not a competition between JS and native C++; it's a competition between JS and WASM.
A for loop is not a very interesting application -- it is not what Java optimizes for, and chances are you didn't benchmark it correctly.
To optimize your program in a low-level language you basically have to plan the whole architecture of your program beforehand, and every major change to that plan will break your optimizations. Also, don't forget about non-standard object life cycles, which are really common. Complex C++ programs basically end up employing their own GCs, which will be inferior to any of the ones included in the JVM.
Of course low-level programs have their place (plenty of them: audio processing, embedded, a million others), but the average business/CRUD app will be faster* both to execute and to produce in Java, as well as more maintainable.
* With enough time a competent team could of course write a faster version of it in C++, but it's not a good use of their time, and you would be surprised how hard it is, especially with ever-changing requirements.
A language either cares about low-level details or it doesn't; you can't have it both ways. And C++ is absolutely a low-level language.
> I don't know any complex C++ program that employs its own GC when C++ has RAII, which is superior to GC.
RAII is not at all a replacement for GC. It is only suitable for a subset of object lifetimes. There are plenty of cases where you can't really pinpoint a scope exit at which a given object should be reclaimed.
A GC is a necessity in many concurrent algorithms that simply could not be written without one.
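To make that concrete, here is a minimal sketch (hypothetical TreiberStack class, not from any library) of a lock-free stack in Java. It is safe precisely because the GC keeps a popped node alive for as long as some racing thread might still be reading it; a C++ port would need hazard pointers or epoch-based reclamation on top to avoid use-after-free:

    import java.util.concurrent.atomic.AtomicReference;

    // Minimal lock-free (Treiber) stack. Safe memory reclamation is
    // delegated entirely to the GC: a node popped by one thread may
    // still be inspected by another thread's CAS loop, and the GC
    // guarantees it stays valid until the last reference is dropped.
    final class TreiberStack<T> {
        private static final class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> head = new AtomicReference<>();

        public void push(T value) {
            Node<T> node = new Node<>(value);
            do {
                node.next = head.get();
            } while (!head.compareAndSet(node.next, node));
        }

        public T pop() {
            Node<T> current;
            do {
                current = head.get();
                if (current == null) return null;   // empty stack
            } while (!head.compareAndSet(current, current.next));
            return current.value;
        }
    }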
> Just give C++11/14/17 a try
I have and I like it. There are domains where I would not even start writing Java, and vice versa with C++.
Your CRUD app may have been a breeze, but what if a requirement changes and now touches the core of your program? You have to refactor, and in a low-level language that will be really expensive compared to a high-level one. Every memory allocation/deallocation has to be thought out again and tested (and while Rust can warn you about mistakes, you still have to write a major refactor, as it is another low-level language).
Being multi-paradigm is a different axis altogether. Low-level (which is, by the way, not a well-defined concept -- C is technically also high-level and only assembly is low-level, but that usage is not very useful) means that low-level details leak into your high-level description of the code, coupling the two. You can't make them invisible.
Also, as an example, think of Qt. A widget's lifetime is absolutely not scope-based, nor does it live for the whole duration of the program. You have to destroy it explicitly somewhere. And there are plenty of other examples.
And as I said, I'm familiar with RAII; it's really great when the given object's lifetime is scope-based, but it can't do anything otherwise.
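Java's closest analogue, try-with-resources, shows the same limitation from the other side: scope-based cleanup is perfect when the lifetime is a lexical scope and has nothing to attach to otherwise. A small sketch (hypothetical names):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    class ScopedLifetime {
        // Scope-based cleanup shines when the lifetime IS the scope:
        static String firstLine(String path) throws IOException {
            try (BufferedReader r = new BufferedReader(new FileReader(path))) {
                return r.readLine();   // r.close() runs on scope exit, RAII-style
            }
        }
        // But a Qt-like widget, a cache entry, or a node shared between
        // threads outlives every scope that touches it; there is no single
        // scope exit at which it can be reclaimed, so neither RAII nor
        // try-with-resources can manage it.
    }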
> C++ is an OOP language just like Java. You do it the same way as you do in Java. Use inheritance.
And if the new subclass has some non-standard object life cycle, you HAVE to handle that case somewhere else, modifying another aspect of the code. It is not invisible, unless you want memory leaks or memory corruption.
The main problem with Java isn't that it's JITted; it's that the language is not expressive enough. It doesn't have SIMD (yet) or value types (yet…?).
I wouldn't expect a JIT to find a lot of magic optimization opportunities, though maybe there are some, and it'd actually be annoying if it could. The most important thing in a tool like that is predictability, because you can't make development decisions based on magic.
That may be part of it, but I imagine the JVM's safety obligations are also a significant factor. If the JIT can't elide array bounds checks, they must be performed at runtime. Runtime type checks might be needed, and runtime arithmetic checks might also be needed. The JVM is also more constraining than the C/C++ memory model regarding concurrency gone awry. [0] More broadly, the JVM's lack of undefined behaviour constrains the optimiser in ways the C/C++ approach does not (although I'm open to the idea that the performance win owed to C and C++ having many kinds of undefined behaviour is overstated).
And of course there's the GC and Java's high object-churn, even where lifetimes are known statically. To my knowledge, escape analysis (the relevant family of JIT optimisations) still hasn't really addressed this.
The JIT can elide array bounds checks quite often, and most "low-hanging" optimizations are handled quite cleverly (it's way beyond my knowledge, but I remember reading that null checks are elided by trapping segfaults? Does that make sense?).
There are no over/underflow checks, so I don't know what you mean by arithmetic checks -- in pure number crunching the JVM is insanely fast.
And you are right that many Java libs/programs are quite happy to create garbage, though with generational GCs it is really cheap. Escape analysis is great, but primitive classes in Project Valhalla will solve this last problem of object locality.
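As a sketch of the allocation pattern in question (hypothetical Point class): whether HotSpot scalar-replaces these allocations via escape analysis depends on inlining succeeding first, which is exactly the guarantee Valhalla's primitive classes are meant to provide at the language level:

    class Churn {
        // A tiny value-like class; today every instance is a heap allocation.
        static final class Point {
            final double x, y;
            Point(double x, double y) { this.x = x; this.y = y; }
            Point plus(Point o) { return new Point(x + o.x, y + o.y); }
        }

        // Hot loop churning short-lived Points. HotSpot's escape analysis
        // *may* scalar-replace the allocations after inlining, but that is
        // not guaranteed; Valhalla would make the flat, allocation-free
        // representation a property of the type itself.
        static double sum(Point[] pts) {
            Point acc = new Point(0, 0);
            for (Point p : pts) acc = acc.plus(p);   // possibly one allocation per step
            return acc.x + acc.y;
        }
    }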
Sounds right. No need to generate instructions to perform the check if you can rely on a hardware trap, by means of signal-handling cleverness.
> There are no over/underflow checks, so I don't know what you mean by arithmetic checks -- in pure number crunching the JVM is insanely fast.
Integer multiplication, addition, and subtraction are all defined in Java to have wrapping behaviour, and are easily implemented: whatever the input values, there's no way those operations can fail. (Incidentally, this is a terrible way of handling overflow. This turned up recently in discussion. [0]) Division is trickier. In Java, integer division by zero results in an exception being thrown. Apparently JVMs can implement this with signal-handling cleverness similar to dereferencing null references. [1] Two's complement integer division has another edge case, INT_MIN / -1, which is undefined behaviour in C/C++; in Java it is defined to wrap, yielding INT_MIN again. Since the x86 division instruction traps on that input, I believe the JIT still has to emit instructions to check for it rather than rely on signal handling alone.
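A quick Java illustration of those defined behaviours (the printed results are what the JLS mandates, division by zero being the one integer operation that throws):

    public class Arithmetic {
        public static void main(String[] args) {
            // Signed overflow wraps: no UB, no exception.
            System.out.println(Integer.MAX_VALUE + 1);    // -2147483648
            System.out.println(Integer.MIN_VALUE / -1);   // -2147483648 (wraps, no throw)

            // Division by zero throws ArithmeticException.
            try {
                System.out.println(1 / args.length);      // throws when run with no args
            } catch (ArithmeticException e) {
                System.out.println("caught: " + e.getMessage());   // "/ by zero"
            }

            // Since Java 8, checked arithmetic is available as an opt-in.
            try {
                Math.addExact(Integer.MAX_VALUE, 1);
            } catch (ArithmeticException e) {
                System.out.println("caught: " + e.getMessage());   // "integer overflow"
            }
        }
    }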
I don't know how well modern Java performs in floating-point arithmetic. Here's an old tirade about it [2] and discussion. [3]
> with generational GCs it is really cheap.
At the risk of going off topic: doesn't Java tend to perform at somewhere around 60% of the speed of C/C++, while using considerably more memory? Perhaps the GC isn't to blame, but clearly the blame belongs somewhere. It's like the way advocates of Electron insist that modern HTML rendering engines are fast and efficient, the DOM is fast and efficient, and JavaScript is fast and efficient... and yet here we are, with Electron-based applications reliably taking several times the computational resources of competing solutions built on conventional toolkits.
> primitive classes in Project Valhalla will solve this last problem of object locality
Interesting, sounds like the kind of ambitious initiative that will require deep changes to the JVM.
> At the risk of going off topic: doesn't Java tend to perform at somewhere around 60% of the speed of C/C++, while using considerably more memory?
It is hard to benchmark this properly in general: for small programs it is “at most” within 2-3x, but I believe more complex applications close the gap quite well (many things can be “dynamically” inlined, even between classes far from each other). Not sure how it fares against C++ built with PGO.
And yeah, it does use more memory: the runtime/JIT/GC and each individual object all carry considerable overhead. But I don't think comparing it to Electron is apt. Electron is slow because it adds additional steps to the picture, not because of the JS engine itself. V8 is similarly an engineering gem, and it can be stupidly fast from time to time.
As for the GC:
The GC itself is required for some programs to work correctly. C/C++ codebases often create their own GC, and that will surely be slower than any of the multiple GCs found in the JVM. But for short-lived programs the GC doesn't even run (similar to how some short-lived C programs leave cleanup to the OS), so it is rather the runtime and per-object overhead that is responsible for the bigger memory usage.
All in all, where ultimate control over memory/execution is not required (that is, where you don't need a low-level language), Java is fast enough, especially combined with it being productive, easy (and safe) to refactor, and equipped with top-notch profiling tools (with overhead so low that they can be run in production as well).
Optimizations like "these two function arguments are always int31" in V8 or SpiderMonkey are 100% predictable at this point and result in all your type checks and boxing being eliminated. With the known types it also becomes much cheaper/faster to create object instances (since now, if you store those values into properties of an object, that object's shape is fully known). Properties like this can extend out into larger parts of your JS application.
There's still a lot of magic you can't rely on, but you'd be surprised how much you CAN rely on. Asm.js was built on this observation: if you write your JS following some basic rules, it's actually pretty easy to land on predictable, well-optimized paths. Of course, one of WASM's advantages is that by design you're almost always on those paths and don't have to worry.
> The most important thing in a tool like that is predictability, because you can't make development decisions based on magic.
Fortunately you've got excellent profiling tools available, so you don't have to guess. You also get to see the relative importance of the function you're trying to optimize, i.e. whether it actually is the bottleneck (people often guess wrong about where the bottleneck is).
Java has had AVX support for several releases, albeit via autovectorization, and explicit SIMD has been available as an incubator API since Java 16.
Autovectorization is the kind of magic you can't rely on. It sort of works on a single platform, but you will always run into cases it doesn't handle, even if you have your own team of autovectorization engineers who tell you it's perfect.
On the other hand, the explicit Vector API will use the correct "flavor" of SIMD instructions for the platform and will gracefully fall back to a non-SIMD version where SIMD is not supported. And as far as I know, the SIMD story is quite bad with C.
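For reference, a minimal sketch of what the incubating API looks like (jdk.incubator.vector as shipped in JDK 16; run with --add-modules jdk.incubator.vector). The preferred species picks the widest SIMD shape the CPU offers, and the same code degrades to scalar execution where there is none:

    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorSpecies;

    public class VectorAdd {
        // Widest shape the hardware supports, e.g. 8 floats with AVX2.
        static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

        static void add(float[] a, float[] b, float[] c) {
            int i = 0;
            int bound = SPECIES.loopBound(a.length);   // largest lane-count multiple
            for (; i < bound; i += SPECIES.length()) {
                FloatVector va = FloatVector.fromArray(SPECIES, a, i);
                FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
                va.add(vb).intoArray(c, i);
            }
            for (; i < a.length; i++) {                // scalar tail
                c[i] = a[i] + b[i];
            }
        }
    }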
It's pretty good in C with assembly, inline or not. SIMD usually involves a lot of aliasing violations, and intrinsics have weird, hard-to-read names, so I find assembly easier to deal with than C here.