
Unless the programmer is programming at a very low level, the listed events are out of his control. CPU caching is mostly transparent in the ISA, disk seeking is scheduled by the kernel or in storage controllers, file buffers are cached in the kernel, and application frameworks provide further layers of caching.

For the majority of programmers who want to get their shit done with straightforward code, few dependencies and acceptable performance, this is "interesting to know" but not "should know".



> Unless the programmer is programming at a very low level, the listed events are out of his control

> CPU caching is mostly transparent in the ISA

Nonsense. Even in a relatively high-level language like Java, you can use primitive arrays like int[] to ensure that certain elements are contiguous in memory. As such, you can have good memory access patterns even in a high-level language like Java or C#.

I'm fairly certain this stuff is important when choosing data structures: vector vs. linked list, for instance. Linked lists are harder to cache than vectors, and this chart helps explain why two traversals with the same big-O cost can have dramatically different performance characteristics.
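As a rough illustration, here is a minimal Java sketch of that gap (timings are only indicative; a real measurement would need a harness like JMH):

    import java.util.LinkedList;
    import java.util.List;

    // Both loops are O(n), but the int[] is one contiguous block that the
    // prefetcher can stream through, while the LinkedList chases pointers
    // to boxed Integers scattered across the heap.
    public class TraversalSketch {
        public static void main(String[] args) {
            int n = 1_000_000;
            int[] array = new int[n];
            List<Integer> list = new LinkedList<>();
            for (int i = 0; i < n; i++) {
                array[i] = i;
                list.add(i);
            }

            long t0 = System.nanoTime();
            long arraySum = 0;
            for (int x : array) arraySum += x;   // sequential, cache-friendly
            long t1 = System.nanoTime();

            long listSum = 0;
            for (int x : list) listSum += x;     // pointer-chasing, cache-hostile
            long t2 = System.nanoTime();

            System.out.printf("array: %d us, list: %d us (sums %d / %d)%n",
                    (t1 - t0) / 1_000, (t2 - t1) / 1_000, arraySum, listSum);
        }
    }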

> disk seeking is scheduled by the kernel or in storage controllers, file buffers are cached in the kernel

But you can read a file from beginning to end. Even in a very, very high-level language like SQL, you can often ensure a high-speed sequential table scan if you write your joins properly. And knowledge of sequential scans can help you decide which indexes to set up for your tables.

Knowing whether you have SSDs or spinning disks can also inform the design of your SQL architecture.


> As such, you can have good memory access patterns even in a high-level language like Java or C#.

CPU and data intensive heavy lifting is rarely done in such programs; it is delegated either to specialised libraries or to some middleware in the form of an RDBMS. Most of these programs spend most of their time waiting for some IO event, so the few microseconds gained from a vector of a few hundred elements are negligible.

> But you can read a file from beginning to end.

That's what most programs actually do most of the time because files are essentially a stream abstraction. Programs that jump around a file would map it into memory, then the CPU and the kernel would do their best to cache the hot regions, even if the access to these regions is temporally or spatially distant.


> CPU and data intensive heavy lifting is rarely done in such programs

That's absurd. They aren't as high-performance as C or C++, but Java and C# both have screamingly fast JIT compilers, and plenty of high-performance code is written in them. We're not talking about Prolog here. And memory access patterns absolutely make a huge difference in performance in these languages.

Sure, you CAN ignore that kind of stuff if you want to, but good programmers don't.


> CPU and data intensive heavy lifting is rarely done in such programs

How could you make such a blanket statement? Is bashing Java the new hipster thing to do in the programming world nowadays?


Other posters have discussed the first part of your post.

But the second...

> That's what most programs actually do most of the time because files are essentially a stream abstraction. Programs that jump around a file would map it into memory, then the CPU and the kernel would do their best to cache the hot regions, even if the access to these regions is temporally or spatially distant.

Just an FYI: you should always use mmap (Linux / POSIX) or file-backed section objects (Windows). I don't think streams have any advantage aside from backwards compatibility, and maybe clarity in some cases.

mmap and the Windows equivalent allow the kernel to share the physical pages backing that file across processes. So if the user opens a 2nd or 3rd instance of your program, more of it will already be "hot" in RAM.

Since mmap and section objects only consume virtual address space (of which we have 48 bits' worth on Intel / AMD x64 platforms), we are at virtually no risk of running out of the ~256 TB available to our programs.
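For what it's worth, this is reachable even from Java: FileChannel.map is backed by mmap / section objects under the hood. A minimal sketch:

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // The mapping consumes only virtual address space and is backed by the
    // kernel page cache, so processes mapping the same file share physical
    // pages. (This API caps a single mapping at 2 GB.)
    public class MmapSketch {
        public static void main(String[] args) throws IOException {
            Path path = Path.of(args[0]); // any existing file
            try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                long sum = 0;
                while (buf.hasRemaining()) {
                    sum += buf.get() & 0xFF; // page faults pull hot regions into RAM
                }
                System.out.println("byte sum: " + sum);
            }
        }
    }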


Quite a bit of HFT software is written in Java.


> Unless the programmer is programming at a very low level

If you write unixy tools that do just one job, then you often have to deal with this. For example, rsync and tar advise the kernel of sequential access and perform readahead, or drop the page cache behind their writes.

And it's not just that kind of tool. At $JOB I did a fairly simple optimization to significantly reduce load times (from NFS) in a render farm by importing a 3rd-party library which provided the necessary libc bindings for readaheads. It's only a dozen lines of code but reduces user-perceived latency from minutes to seconds. The PO was quite happy about not having to pay for hundreds of NVMe SSDs.
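The parent doesn't say which library was used, but such a binding might look roughly like this sketch, which assumes JNA and 64-bit Linux (posix_fadvise(2) being the usual readahead hint):

    import com.sun.jna.Library;
    import com.sun.jna.Native;

    // Hypothetical sketch of a libc binding for readahead hints; the actual
    // library from the comment above is not named. Assumes JNA on the
    // classpath and a 64-bit Linux libc (so off_t maps to Java long).
    public class ReadaheadSketch {
        interface LibC extends Library {
            LibC INSTANCE = Native.load("c", LibC.class);

            int open(String pathname, int flags);
            int close(int fd);
            int posix_fadvise(int fd, long offset, long len, int advice);
        }

        static final int O_RDONLY = 0;              // Linux value
        static final int POSIX_FADV_SEQUENTIAL = 2; // widen the readahead window
        static final int POSIX_FADV_WILLNEED = 3;   // start async readahead now

        public static void main(String[] args) {
            int fd = LibC.INSTANCE.open(args[0], O_RDONLY);
            if (fd < 0) throw new RuntimeException("open failed");
            // len == 0 means "from offset to the end of the file"
            LibC.INSTANCE.posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
            LibC.INSTANCE.posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
            // ... read the file with ordinary I/O; the kernel streams ahead ...
            LibC.INSTANCE.close(fd);
        }
    }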


In general, in small-scale web development the only things that matter are: 1. Are too many queries fired at the database (the N+1 problem)? 2. Do the queries perform well?

Once your application uses SQL or any external service, whatever else you do often does not matter very much. As long as you keep big-O in the back of your mind, you need not optimize down to cache lines etc.
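To make the N+1 point concrete, here is a sketch with a hypothetical orders/customers schema and plain JDBC:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // The table and column names are made up for illustration. The point is
    // the round-trip count: each query pays network latency that dwarfs any
    // in-process micro-optimization.
    public class NPlusOneSketch {

        // Anti-pattern: 1 query for the orders, then N more for the customers.
        static void nPlusOne(Connection conn) throws SQLException {
            try (PreparedStatement orders =
                     conn.prepareStatement("SELECT id, customer_id FROM orders");
                 ResultSet rs = orders.executeQuery()) {
                while (rs.next()) {
                    try (PreparedStatement customer = conn.prepareStatement(
                            "SELECT name FROM customers WHERE id = ?")) {
                        customer.setLong(1, rs.getLong("customer_id"));
                        try (ResultSet c = customer.executeQuery()) {
                            // ... one network round trip per order ...
                        }
                    }
                }
            }
        }

        // Better: one round trip, letting the database do the join.
        static void singleJoin(Connection conn) throws SQLException {
            try (PreparedStatement joined = conn.prepareStatement(
                     "SELECT o.id, c.name FROM orders o " +
                     "JOIN customers c ON c.id = o.customer_id");
                 ResultSet rs = joined.executeQuery()) {
                while (rs.next()) {
                    // ... all rows in one result set ...
                }
            }
        }
    }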


Not sure what counts as very low level, but if you are writing anything with low-latency requirements you need to be cognizant of these things. As someone mentioned, using an array over a list gives the CPU predictable memory strides. Cache misses are expensive.


You often can control your data access patterns. Array-like access, in order, is often the most cache-friendly.


The visual could help programmers think about the kinds of latencies they have to work through, maybe by highlighting them in an infographic for their teams/partners.



