
> Unless the programmer is programming at a very low level, the listed events occurring are out of his control

> CPU caching is mostly transparent in the ISA

Nonsense. Even in a relatively high-level language like Java, you can use primitive arrays like int[] to ensure that certain elements sit next to each other in memory. As such, you can have good memory access patterns even in a high-level language like Java or C#.
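As a rough illustration (a minimal sketch, not a proper benchmark; exact timings depend on JIT warm-up and hardware), the same contiguous int[] can be fast or slow purely depending on the order you walk it:

    // Sketch: traversal order over one contiguous allocation changes
    // performance because of cache-line reuse, not asymptotic cost.
    public class AccessPatterns {
        static final int N = 4096;

        public static void main(String[] args) {
            int[] data = new int[N * N]; // one flat, contiguous block

            long t0 = System.nanoTime();
            long sum = 0;
            // Sequential walk: consecutive ints share cache lines.
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    sum += data[i * N + j];
            long sequentialMs = (System.nanoTime() - t0) / 1_000_000;

            t0 = System.nanoTime();
            // Strided walk: each read lands N * 4 bytes past the last,
            // so almost every access misses the cache.
            for (int j = 0; j < N; j++)
                for (int i = 0; i < N; i++)
                    sum += data[i * N + j];
            long stridedMs = (System.nanoTime() - t0) / 1_000_000;

            System.out.println("sequential=" + sequentialMs + "ms"
                    + " strided=" + stridedMs + "ms sum=" + sum);
        }
    }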

I'm fairly certain this stuff is important when choosing data structures: vector vs. linked list, for instance. Linked lists are far less cache-friendly than vectors because their nodes are scattered across the heap, and this chart helps explain why two traversals with identical big-O cost can have dramatically different real-world performance.
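A hedged sketch in Java, with ArrayList playing the role of a vector (Integer boxing blunts the gap somewhat, since even the backing array holds references, but LinkedList still pays an extra node hop and worse locality per element):

    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;

    // Sketch: two traversals with the same big-O cost, very different
    // memory behaviour.
    public class TraversalDemo {
        public static void main(String[] args) {
            List<Integer> arrayBacked = new ArrayList<>();
            List<Integer> nodeBacked = new LinkedList<>();
            for (int i = 0; i < 5_000_000; i++) {
                arrayBacked.add(i);
                nodeBacked.add(i);
            }
            System.out.println("ArrayList:  " + timeSum(arrayBacked) + " ms");
            System.out.println("LinkedList: " + timeSum(nodeBacked) + " ms");
        }

        static long timeSum(List<Integer> list) {
            long t0 = System.nanoTime();
            long sum = 0;
            for (int x : list) sum += x; // same asymptotic cost either way
            long ms = (System.nanoTime() - t0) / 1_000_000;
            // Use the result so the JIT cannot discard the loop.
            if (sum < 0) System.out.println(sum);
            return ms;
        }
    }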

> disk seeking is scheduled by the kernel or in storage controllers, file buffers are cached in the kernel

But you can read a file from beginning to end. Even in a very high-level language like SQL, you can often ensure a high-speed sequential table scan if you write your joins properly. And knowledge of sequential scans can help you decide which indexes to set up for your tables.

Knowing whether you have SSDs or spinning disks can also be helpful when architecting your SQL databases.



> As such, you can have good memory access patterns even in a high-level language like Java or C#.

CPU- and data-intensive heavy lifting is rarely done in such programs; it is delegated either to specialised libraries or to some middleware in the form of an RDBMS. Most of these programs spend most of their time waiting for some IO event, so the few microseconds gained from a vector with a few hundred elements are negligible.

> But you can read a file from beginning to end.

That's what most programs actually do most of the time, because files are essentially a stream abstraction. Programs that jump around a file typically map it into memory, and the CPU and the kernel then do their best to cache the hot regions, even when accesses to those regions are temporally or spatially scattered.


> CPU- and data-intensive heavy lifting is rarely done in such programs

That's absurd. They aren't as high-performance as C or C++, but Java and C# both have screamingly fast JIT compilers, and plenty of high-performance code is written in them. We're not talking about Prolog here. And memory access patterns absolutely make a huge difference in performance in these languages.

Sure, you CAN ignore that kind of stuff if you want to, but good programmers don't.


> CPU- and data-intensive heavy lifting is rarely done in such programs

How could you make such a blanket statement? Is bashing Java the new hipster thing to do in the programming world nowadays?


Other posters have discussed the first part of your post.

But the second...

> That's what most programs actually do most of the time, because files are essentially a stream abstraction. Programs that jump around a file typically map it into memory, and the CPU and the kernel then do their best to cache the hot regions, even when accesses to those regions are temporally or spatially scattered.

Just an FYI: you should always use mmap (Linux / POSIX) or file-backed section objects (Windows). Streams don't have any performance benefit that I know of; their advantages are backwards compatibility and maybe clarity in some cases.

mmap and the Windows equivalent allow the kernel to share the physical pages backing that file across processes. So if the user opens a second or third instance of your program, more of it will already be "hot" in RAM.

And since mmap and section objects only consume virtual address space (of which we have 48 bits' worth on Intel / AMD x64 platforms), we are at virtually no risk of exhausting the ~256 TB of virtual address space available to us.
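In Java, the closest equivalent is FileChannel.map, which sits on top of mmap / CreateFileMapping and hands you a MappedByteBuffer backed by the OS page cache. A minimal sketch (the path "data.bin" is a stand-in, and this simple form is limited to 2 GB per mapping because ByteBuffer is int-indexed):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Sketch: memory-map a file instead of streaming it. Pages are
    // faulted in on demand and live in the shared OS page cache, so a
    // second process mapping the same file reuses the warm pages.
    public class MappedRead {
        public static void main(String[] args) throws IOException {
            try (FileChannel ch = FileChannel.open(Path.of("data.bin"),
                    StandardOpenOption.READ)) {
                MappedByteBuffer buf =
                        ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                long sum = 0;
                // Jump around the file; the kernel keeps hot regions
                // resident even though this access pattern isn't linear.
                for (int pos = 0; pos < buf.limit(); pos += 4096) {
                    sum += buf.get(pos);
                }
                System.out.println("checksum: " + sum);
            }
        }
    }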


Quite a bit of HFT software is written in Java.



