Because a couple decades ago Java convinced Enterprise Land that they can't hire millions of C++ jockeys and expect them to work effectively in huge projects that plan to evolve into the next decades' (aka: the present's) legacy mudball. Instead, they decided it would be easier to hire millions of Java jockeys and have them build enormous kiln-fired mudballs using the same architectural strategy as the Egyptian pyramids. They convinced academia to raise an entire generation of Java jockeys, hired them all right out of school, and set them immediately to piling up enormous mud bricks forever.
So, now they have a few million Java jockeys churning away and a few million person-decades of work put into their mud piles. When starting any new project, there isn't much question about how to build it: More Mud!
As an embedded developer, where every cycle counts, I've arrived at the same question as the poster above: why bother with such languages? If a switch can process packets at line rate by using ASICs, why not see similar development in the world of big data?
You assume that the JVM is slow, yes? That's not always the case. Interestingly, there are cases where JVM applications run just as fast as, if not faster than, native code. This blows my mind as a C++ programmer myself.
Once compiled to native code, which it will be for big data because the same classes are reused over and over, I would assume it would be in the same ballpark as C/C++ code.
There's still a pretty big speed penalty for Java because the object model encourages a lot of pointer-chasing, which will blow your data locality. In C++, it's common for contained structs to be flat in memory, so accessing a data member in them is just an offset from a base address. In Java, all Object types are really pointers, which you need to dereference to get the contained object. HotSpot can't really optimize this beyond putting really frequently used objects in registers.
A lot of big-data work involves pulling out struct fields from a deeply nested composite record, and then performing some manipulation on them.
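For a concrete illustration of the layout difference (the Point class, the array size, and the loops are made up for this sketch, not taken from any particular codebase): summing a field over an array of Java objects dereferences one heap pointer per element, while a flat primitive array is read sequentially from contiguous memory.

    import java.util.Random;

    public class LocalityDemo {
        static final int N = 1_000_000;

        // In Java, points[i] is a reference; each Point object lives elsewhere on the heap.
        static class Point {
            double x, y;
            Point(double x, double y) { this.x = x; this.y = y; }
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);
            Point[] points = new Point[N];
            double[] xs = new double[N];          // flat, contiguous primitive array
            for (int i = 0; i < N; i++) {
                double v = rnd.nextDouble();
                points[i] = new Point(v, -v);     // one heap allocation per element
                xs[i] = v;
            }

            // Object version: every iteration chases the points[i] reference before it can read .x.
            double sumObjects = 0;
            for (Point p : points) sumObjects += p.x;

            // Primitive version: sequential reads of a contiguous double[], friendly to cache and prefetcher.
            double sumFlat = 0;
            for (double x : xs) sumFlat += x;

            System.out.println(sumObjects + " " + sumFlat);
        }
    }

The C++ equivalent with a vector of plain structs gets the flat layout automatically; in Java you have to restructure your data (structure-of-arrays, primitives only) to get it.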
Memory indirection is the biggest issue indeed. However, I'd also add that Java has a terrible performance model as a language. Unless you stick to primitives only, the abstraction costs start to add up (beyond pointer chasing). It shoves the entire optimization burden onto the JVM, which by the time it runs has lost a bunch of semantic and type information in some cases. There are also codegen deficiencies in the current HotSpot C2 compiler (i.e., the generated code is subpar compared to roughly equivalent gcc output).
I think this trend may stop soon. There are already OSS big data projects written in more performant languages (e.g. C++) coming around (e.g. ScyllaDB, Cloudera's Kudu).
Rust is exciting, no doubt, and I have high hopes for its adoption, but I've personally not seen/heard of any visible OSS big data style projects using it. I see Frank McSherry's stuff has been mentioned, but I think that's still his pet project (hopefully not putting words in his mouth).
But really, I was using C++ as an example of something more fit for these types of projects than Java; it doesn't have to be C++ specifically, of course.
Rust has Frank McSherry (formerly working on Naiad for Microsoft Research) and his work on timely dataflow and differential dataflow: https://github.com/frankmcsherry/blog
Most JVM-based query engines use bytecode generation, and once the JIT compiler decides that a code block is hot enough and generates native code for the generated bytecode, the output is identical to C and C++.
The author actually indicates that every CPU cycle is important for a code block that will be executed for each row. So once you optimize the hot code blocks, you're good to go.
Data access patterns are much more important than hot-code optimization. Sadly, Java offers few options on this front (until maybe Java 9, when value types might become a thing).
Modern CPUs have DRAM fetch times in the hundreds of cycles. Any cache-friendly algorithm is going to run circles around something that plays pointer pinball instead.
This is why bytecode generation is used by query engines. The generated code isn't meant to create ArrayLists or HashMaps; generally, these engines work with buffers instead of objects to avoid the issues you mentioned and to reduce garbage collection pressure.
Let's say we want to compile the predicate expression "bigintColumn > 4 and varcharColumn = 'str'". A generic interpreter would suffer from the issues mentioned above, but if you generate bytecode for the Java source "return longPrimitive > 4 && readAndCompare(buffer, 3, "str".getBytes(UTF8))", then you don't create even a single Java object, and the output is usually identical to C and C++.
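As a rough sketch of what that generated code boils down to (the row layout, field offsets, and helper below are assumptions for illustration, not any particular engine's format): the hot path reads primitives and raw bytes straight out of a reusable buffer, with no per-row object allocation.

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    public class GeneratedPredicateSketch {
        private static final byte[] STR = "str".getBytes(StandardCharsets.UTF_8);

        // Roughly what codegen would emit for "bigintColumn > 4 and varcharColumn = 'str'",
        // assuming the bigint sits at offset 0 and a length-prefixed UTF-8 varchar at offset 8.
        static boolean evaluate(ByteBuffer row) {
            long bigintColumn = row.getLong(0);
            return bigintColumn > 4 && bytesEqual(row, 8, STR);
        }

        // Compare a length-prefixed field in the row buffer against a constant, byte by byte.
        static boolean bytesEqual(ByteBuffer row, int offset, byte[] expected) {
            int len = row.getInt(offset);
            if (len != expected.length) return false;
            for (int i = 0; i < len; i++) {
                if (row.get(offset + 4 + i) != expected[i]) return false;
            }
            return true;
        }

        public static void main(String[] args) {
            ByteBuffer row = ByteBuffer.allocate(64);
            row.putLong(0, 5L);                         // bigintColumn = 5
            row.putInt(8, STR.length);                  // varchar length prefix
            for (int i = 0; i < STR.length; i++) row.put(12 + i, STR[i]);
            System.out.println(evaluate(row));          // true: 5 > 4 and the field equals "str"
        }
    }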
Wouldn't you still pay the bounds-checking penalty on those buffers, though? Also, anything that uses floating point will probably be trashed by int->float conversion (unless JVM bytecode has a load-to-float-from-address instruction, although I freely admit I know less about bytecode than plain Java).
Either way, the average Java dev isn't going to be writing bytecode, so I feel like C/C++ still has the advantage in performance-critical cases.
If you use ByteBuffer, then yes, the application may suffer from unnecessary checks. However, the performance of ByteBuffer is usually not good enough anyway; that's why people use off-heap buffers via sun.misc.Unsafe, whose native calls allocate memory outside the heap.
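For illustration, a minimal off-heap sketch using sun.misc.Unsafe (the reflection hack to obtain the instance is the usual workaround; this is unsupported, JDK-internal API, so treat the details as assumptions about your particular JDK):

    import sun.misc.Unsafe;
    import java.lang.reflect.Field;

    public class OffHeapSketch {
        public static void main(String[] args) throws Exception {
            // Unsafe has no public constructor; grab the singleton via reflection.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);

            long address = unsafe.allocateMemory(16);   // raw off-heap memory: no bounds checks, no GC
            try {
                unsafe.putLong(address, 42L);           // write a long at the raw address
                long value = unsafe.getLong(address);   // read it back, no index or bounds checking
                System.out.println(value);
            } finally {
                unsafe.freeMemory(address);             // manual lifetime management, malloc/free style
            }
        }
    }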
Also, bytecode has dedicated instructions for all the primitive types; otherwise there would be no point in having primitive types in the Java language at all, since it all gets compiled down to bytecode instructions anyway.
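As a small example of that (the method is made up; the listing is roughly what javap -c prints for it): arithmetic on double primitives compiles to the dedicated double-typed instructions, with no boxing or object allocation anywhere.

    public class PrimitiveBytecode {
        // Pure-primitive arithmetic: stays on the operand stack, never touches the heap.
        static double scaleAndShift(double a, double b) {
            return a * b + 1.0;
        }

        public static void main(String[] args) {
            System.out.println(scaleAndShift(3.0, 4.0));  // 13.0
        }
    }

    // Roughly what `javap -c PrimitiveBytecode` shows for scaleAndShift:
    //   dload_0   dload_2   dmul   dconst_1   dadd   dreturn
    // i.e. double-specific instructions end to end, much like what a C compiler emits for plain doubles.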
There are solutions to all the issues raised, but they take much more work to implement in Java than in C++. However, once you solve this specific problem (I admit it's not a small one), there are lots of benefits to using Java over C++.