
Databricks recently rebuilt Spark in C++ in a project called "Delta Engine" to overcome the JVM limitations you're pointing out. You're right, Rust is a great way to sidestep the dreaded JVM GC pauses.



Our experience with Delta Engine has been that it's way more resource hungry than the JVM code it replaced. It doesn't handle resource exhaustion well at all; lots of crashing and deadlock when nearing full resource utilization.

I would love to have something more resource efficient than Spark on JVM, but Delta Engine isn't there yet.


At the same time, the JVM is getting better memory tracking analysis and incremental pauseless collectors (C4, ZGC, Shenandoah, G1 improvements).

https://blogs.oracle.com/javamagazine/understanding-the-jdks...


These new GCs are amazing technology, but they primarily target pause time, whereas in data processing the primary concern is the “headroom” of extra space in your heap to allow the GC to work efficiently.


For those cases, large off-heap structures of arrays can make hundreds of GB of data invisible to the GC.

One can do both.
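To make the off-heap idea concrete, here is a minimal sketch, assuming a two-column table (a long id and a double value per row) kept as a structure of arrays in direct buffers. The class name `OffHeapColumns` and its methods are hypothetical, made up for illustration. `ByteBuffer.allocateDirect` is the standard JDK call; the memory it reserves lives outside the Java heap, so the collector only has to track the small buffer objects, not the data in them. A single direct buffer is capped at about 2 GB, so a store holding hundreds of GB would need many such chunks or a different allocation API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;
import java.nio.LongBuffer;

// Hypothetical illustration: one off-heap buffer per column (structure of arrays)
// instead of one heap object per row (array of structures).
public final class OffHeapColumns {
    private final LongBuffer ids;       // column 0, stored off-heap
    private final DoubleBuffer values;  // column 1, stored off-heap
    private final int capacity;
    private int size = 0;

    public OffHeapColumns(int capacity) {
        this.capacity = capacity;
        // multiplyExact guards against int overflow for large capacities.
        this.ids = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, Long.BYTES))
                .order(ByteOrder.nativeOrder())
                .asLongBuffer();
        this.values = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, Double.BYTES))
                .order(ByteOrder.nativeOrder())
                .asDoubleBuffer();
    }

    // Appends one row by writing each field into its own column buffer.
    public void append(long id, double value) {
        if (size == capacity) {
            throw new IllegalStateException("column store is full");
        }
        ids.put(size, id);
        values.put(size, value);
        size++;
    }

    public long idAt(int row) {
        checkRow(row);
        return ids.get(row);
    }

    public double valueAt(int row) {
        checkRow(row);
        return values.get(row);
    }

    // Scans a single column sequentially; no per-row objects are created.
    public double sumValues() {
        double total = 0.0;
        for (int i = 0; i < size; i++) {
            total += values.get(i);
        }
        return total;
    }

    public int size() {
        return size;
    }

    private void checkRow(int row) {
        if (row < 0 || row >= size) {
            throw new IndexOutOfBoundsException("row " + row + " out of range");
        }
    }
}
```

However many rows are appended, the heap holds only a handful of small objects (the wrapper and its buffer views), which is what keeps the data itself out of the collector's way. The trade-off is that bounds checks, layout, and lifetime are now the programmer's problem.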



