Do they use those cores? Sure. But when I'm playing a game (even one that feels like it should be saturating every thread, like Factorio) I usually see one core pegged at 100% while the others are chilling around 40-60% (I have an overclocked 8700K). In my experience, single-core performance is still the bottleneck (as far as CPUs are concerned) for gaming.
The underlying simulation in Factorio is pretty difficult to parallelize, since there is so much interdependent state. Some systems that don't interact directly (e.g., trains and belts) can be parallelized almost trivially, but subdividing those update jobs any further is very tricky.
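Roughly the easy case, as a toy sketch (not Factorio's actual code; TrainSystem, BeltSystem, and tick are all invented for illustration):

    #include <thread>

    // Hypothetical stand-ins for two systems that never touch each other's
    // state during their own update step.
    struct TrainSystem { void update() { /* move trains along rails */ } };
    struct BeltSystem  { void update() { /* advance items on belts  */ } };

    // One simulation tick: the two independent updates overlap; anything
    // that reads from both systems (inserters, etc.) must wait for join().
    void tick(TrainSystem& trains, BeltSystem& belts) {
        std::thread t([&] { trains.update(); });  // trains on a worker thread
        belts.update();                           // belts on the current thread
        t.join();                                 // barrier before cross-system work
    }

    int main() {
        TrainSystem trains;
        BeltSystem belts;
        tick(trains, belts);
    }

The hard part is exactly what this sketch dodges: the moment systems share state (an inserter grabbing items off a belt onto a train), you need ordering or locking, and the free parallelism evaporates.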
Yep, games like Factorio and Dwarf Fortress are generally more memory-bandwidth limited than directly CPU limited, and it can be very hard to split those workloads up.
It is exceptionally unlikely that a game is both limited by memory bandwidth and not able to be parallelized further.
You might mean memory-latency bound, from serial simulations that have to finish calculating one piece before they can move on to the next, like an emulator or a scripting language.
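A contrived C++ sketch of the difference (the names are made up; what matters is the dependency structure):

    #include <cstddef>
    #include <numeric>
    #include <vector>

    struct Node { Node* next; long value; };

    // Latency-bound: each load depends on the previous one, so the CPU pays
    // close to a full memory round-trip per node while bandwidth sits idle.
    long sum_list(const Node* head) {
        long total = 0;
        for (const Node* n = head; n; n = n->next) total += n->value;
        return total;
    }

    // Bandwidth-bound: addresses are predictable, the prefetcher streams the
    // data in, and the limit becomes how fast memory can deliver bytes.
    long sum_array(const std::vector<long>& v) {
        return std::accumulate(v.begin(), v.end(), 0L);
    }

    int main() {
        std::vector<Node> pool(1000);
        for (std::size_t i = 0; i + 1 < pool.size(); ++i)
            pool[i] = { &pool[i + 1], 1 };
        pool.back() = { nullptr, 1 };
        const std::vector<long> v(1000, 1);
        return static_cast<int>(sum_list(pool.data()) - sum_array(v)); // 0
    }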
Factorio is an indie game; it might not be easy for indie developers to make full use of multithreading, whereas AAA studios have entire teams dedicated to engine development and multithreading.
Anyway, traditional graphics APIs like OpenGL are practically single-threaded. Modern APIs like DX12 and Vulkan were designed with multithreading in mind and scale much better with the number of cores (at the cost of having to do synchronization manually).
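The recording model they enable looks roughly like this (the API is mocked out as a plain vector, since real Vulkan setup would bury the point; in actual Vulkan you'd record into one VkCommandBuffer allocated from a per-thread VkCommandPool):

    #include <algorithm>
    #include <cstdint>
    #include <functional>
    #include <thread>
    #include <vector>

    using CommandList = std::vector<std::uint32_t>;  // mock command buffer

    void record_chunk(CommandList& cl, std::uint32_t first, std::uint32_t count) {
        for (std::uint32_t i = 0; i < count; ++i)
            cl.push_back(first + i);       // pretend each entry is a draw call
    }

    int main() {
        const unsigned n = std::max(1u, std::thread::hardware_concurrency());
        std::vector<CommandList> lists(n); // one list per thread: no locking
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back(record_chunk, std::ref(lists[i]), i * 100u, 100u);
        for (auto& w : workers) w.join();
        // Submission stays serial and in a fixed order; only the recording
        // (the expensive part) was spread across cores.
        for (const auto& cl : lists) { (void)cl; /* queue submit goes here */ }
    }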
> AAA game developers have entire teams dedicated to engine development and multithreading
Unity has been introducing a lot of features (Job System, ECS) that make excellent use of parallelism. Additionally, a lot of the engine internals are being rewritten with that as a base (and some of the old features get patched with APIs that allow access from multithreaded jobs). It's a lot of fun when the code you write by default runs on all of the cores, with (almost) none of the usual parallel programming headaches.
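The Job System API itself is C#, but the idea underneath is an ordinary parallel-for over disjoint slices. A rough C++ equivalent (update_positions and the slicing are invented for illustration, with std::async standing in for a real job scheduler):

    #include <algorithm>
    #include <cstddef>
    #include <future>
    #include <thread>
    #include <vector>

    // Each job gets a disjoint slice of the arrays, so no locks are needed
    // and every core can participate in a single update.
    void update_positions(std::vector<float>& pos,
                          const std::vector<float>& vel, float dt) {
        const std::size_t jobs =
            std::max(1u, std::thread::hardware_concurrency());
        const std::size_t chunk = (pos.size() + jobs - 1) / jobs;
        std::vector<std::future<void>> pending;
        for (std::size_t j = 0; j < jobs; ++j) {
            const std::size_t begin = j * chunk;
            const std::size_t end = std::min(pos.size(), begin + chunk);
            pending.push_back(std::async(std::launch::async, [&, begin, end] {
                for (std::size_t i = begin; i < end; ++i)
                    pos[i] += vel[i] * dt;
            }));
        }
        for (auto& f : pending) f.get();   // wait for every job to finish
    }

    int main() {
        std::vector<float> pos(1 << 20, 0.0f), vel(1 << 20, 1.0f);
        update_positions(pos, vel, 1.0f / 60.0f);
    }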
Pretty soon you should start seeing all kinds of indie titles making use of those features.
An 8700K has 12 threads. If the game were only capable of using 4 threads' worth of work effectively, I would expect to see something like 1 thread at 100%, and the other 11 threads at 300/11 ≈ 27%.
If all your threads are sitting at 40-60%, you're using more than 4 threads' worth of execution (if all 12 threads were at 50%, that would be 6 threads' worth).
Usually the operating system rotates threads between cores, so while you might see 60% on every core on average, in any given snapshot there will be 1-2 cores running at 100% while the rest idle. Remember that a core can't really run at 60%: at any instant it's either executing or sleeping.
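You can watch this happen (Linux/glibc only, since sched_getcpu() is a glibc extension): a single busy thread, printing which core it's on about once a second. Left unpinned, the scheduler is free to migrate it, which is how one hot thread gets smeared across several cores in a utilization graph.

    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE            // for sched_getcpu() on glibc
    #endif
    #include <sched.h>
    #include <cstdio>
    #include <ctime>

    int main() {
        std::time_t last = std::time(nullptr);
        volatile unsigned long sink = 0;  // busy work the optimizer can't drop
        for (;;) {
            sink = sink + 1;
            const std::time_t now = std::time(nullptr);
            if (now != last) {            // roughly once per second
                last = now;
                std::printf("running on core %d\n", sched_getcpu());
            }
        }
    }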
The PS4 is described as having 8 GB of unified main memory. Unlike PCs and older consoles, there isn't even a distinction between what is "system RAM" and what is "video RAM". Can you elaborate on how that is NUMA?
That's a very simplistic view of the architecture. Consoles in general have multiple CPU complexes, caches that don't straddle the complex boundary, and multiple memory buses.
Are you referring to the multiple CCXs as different "NUMA" nodes? Because they aren't: they still share the same memory controller. There's no shared cache between them, true, and that does have performance implications, but it doesn't make the system NUMA.
If you can point to anything that supports the NUMA claim, that'd be highly interesting, but all the block diagrams & teardowns I can find show it's clearly UMA.
Not utilizing all cores fully means that the CPU is more powerful than the game needs, not that the game is unable to exploit many cores.
For example, at 3 GHz and 30 FPS there are 100 million cycles per core per frame. If the game requires 250 million cycles per frame spread between a few threads, it never needs more than two and a half cores' worth of work in aggregate, plus waiting; spread over 10 cores that's about 25% load each, which leaves far more headroom before the frame deadline than 3 cores at over 80% load.
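Worked through with the same numbers:

    3e9 cycles/s ÷ 30 frames/s  = 100e6 cycles per core per frame
    250e6 cycles of work ÷ 100e6 = 2.5 cores' worth per frame
    over 10 cores: ~25% average load each → lots of slack per frame
    over  3 cores: ~83% average load each → very little slack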