The A100 whitepaper "spoiled" a lot of these factoids already. (https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Cent...)
The new bit seems to be the doubling of FP32 "CUDA cores" (I really hate that term: when Intel or AMD double their CPU pipelines it doesn't mean they're selling more cores, it means their cores got wider... anyway). A100 didn't have this feature (I assume A100 was 16 floating-point + 16 integer "CUDA cores" per SM partition like Turing. Correct me if I'm wrong.)
You don't need to read the whitepaper to understand that NVidia has really improved performance/cost here. The 3rd party benchmarks are out and the improved performance is well documented at this point.
The FP32 doubling is one of the most important bits here. But fortunately for programmers, this doesn't really change how you write your code. The compiler / PTX assembler will schedule your code at compile time to take best advantage of it.
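For example (a throwaway kernel of my own, nothing Ampere-specific in the source): the usual mix of integer index math and FP32 FMAs compiles exactly as before, and it's ptxas and the warp schedulers that decide which datapath each instruction lands on.

```cuda
#include <cuda_runtime.h>

// Ordinary saxpy: integer index math plus one FP32 FMA per element.
// The same source compiles unchanged on Ampere; the toolchain decides
// whether an instruction issues on the FP32-only datapath or the
// shared FP32/INT32 one.
__global__ void saxpy(int n, float a, const float* __restrict__ x,
                      float* __restrict__ y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // INT32 work
    if (i < n)
        y[i] = fmaf(a, x[i], y[i]);                  // FP32 work
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    // Data left uninitialized; this is just a compile/launch sketch.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```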
The other bit, the larger combined L1 / shared memory of 128 KB per SM, does affect programmers. GPU programmers have tight control over shared memory, and it is very useful for optimization purposes.
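A minimal sketch of what that looks like in practice. The exact per-block ceiling is architecture-dependent (I believe roughly 100 KB of the 128 KB combined L1/shared on GA10x), so the code queries it instead of hard-coding a number:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Per-block sum that stages data in dynamically sized shared memory.
// Assumes blockDim.x is a power of two.
__global__ void blockSum(const float* in, float* out, int n)
{
    extern __shared__ float tile[];          // sized at launch time
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    for (int s = blockDim.x / 2; s > 0; s >>= 1) {   // tree reduction
        if (threadIdx.x < s) tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
}

int main()
{
    // Kernels wanting more than the default 48 KB of dynamic shared
    // memory per block must opt in explicitly.
    int maxOptin = 0;
    cudaDeviceGetAttribute(&maxOptin,
                           cudaDevAttrMaxSharedMemoryPerBlockOptin, 0);
    printf("max opt-in dynamic shared memory per block: %d bytes\n", maxOptin);

    cudaFuncSetAttribute(blockSum,
                         cudaFuncAttributeMaxDynamicSharedMemorySize,
                         maxOptin);

    // ... allocate d_in / d_out, then launch with the third <<< >>>
    // argument (dynamic shared memory bytes) set as large as needed:
    // blockSum<<<grid, block, smemBytes>>>(d_in, d_out, n);
    return 0;
}
```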
----------
GDDR6X's improved memory bandwidth is also big. "Feeding the beast" with faster RAM is always a laudable goal, and sending two bits per pin per symbol via PAM4 signaling is a nifty trick.
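If the PAM4 part sounds abstract: with four voltage levels, one symbol on a pin encodes two bits instead of one, so a byte moves in four symbols rather than eight. A toy illustration of the concept only, not GDDR6X's actual line code (the real thing layers its own encoding on top):

```cuda
#include <cstdio>

int main()
{
    // Assumed mapping of the four levels to bit pairs, for illustration.
    const char* bits[4] = { "00", "01", "10", "11" };
    unsigned char byte = 0xB4;               // 1011 0100
    for (int sym = 3; sym >= 0; --sym) {     // 4 symbols carry one byte
        int level = (byte >> (2 * sym)) & 0x3;
        printf("symbol %d: level %d carries bits %s\n",
               3 - sym, level, bits[level]);
    }
    return 0;
}
```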
Sparse Tensor Cores were already implemented in A100, and don't seem to be new. If you haven't heard of the tech before, it's cool: basically hardware-accelerated sparse-matrix computations. A 4x4 FP16 matrix uses 32 bytes under normal conditions, but can be "compressed" to roughly half that if at least two out of every four consecutive values are zero (2:4 structured sparsity), keeping only the non-zero values plus small index metadata. NVidia Ampere supports hardware-accelerated matrix multiplications of these compressed "virtual" 4x4 FP16 matrices.
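A rough sketch of the compression idea. The layout below is my own illustration, not NVIDIA's on-chip metadata format: per group of four values, at most two may be non-zero, and you keep the surviving values plus a 2-bit position index for each.

```cuda
#include <cstdio>

struct CompressedGroup {
    float         value[2];   // non-zero values (FP16 on the real hardware)
    unsigned char index[2];   // positions 0..3 of those values in the group
};

// Compress one group of 4; assumes the 2:4 constraint already holds.
CompressedGroup compress4(const float in[4])
{
    CompressedGroup g = {};
    int kept = 0;
    for (unsigned char i = 0; i < 4 && kept < 2; ++i) {
        if (in[i] != 0.0f) {
            g.value[kept] = in[i];
            g.index[kept] = i;
            ++kept;
        }
    }
    return g;
}

int main()
{
    float group[4] = { 0.0f, 1.5f, 0.0f, -2.0f };
    CompressedGroup g = compress4(group);
    printf("kept %.1f at slot %d and %.1f at slot %d\n",
           g.value[0], g.index[0], g.value[1], g.index[1]);
    return 0;
}
```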
I swear that RTX I/O existed before in some other form. This isn't the first time I've heard of offloading PCIe storage transfers to the GPU. It's niche and I don't expect video games to use it (are M.2 SSDs popular enough to be assumed on the PC / laptop market yet?). But CUDA coders probably can control their hardware more carefully and benefit from such a feature.
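For what it's worth, the earlier form I'm thinking of is probably GPUDirect Storage on the datacenter/CUDA side, which reads a file straight into GPU memory without a bounce through a host buffer. A rough sketch with its cuFile API (file name and sizes made up, error handling omitted):

```cuda
#include <cuda_runtime.h>
#include <cufile.h>      // GPUDirect Storage (libcufile)
#include <fcntl.h>
#include <unistd.h>

int main()
{
    const size_t bytes = 64 << 20;    // 64 MiB, arbitrary
    void* devBuf = nullptr;
    cudaMalloc(&devBuf, bytes);

    cuFileDriverOpen();

    // GDS wants O_DIRECT so the read bypasses the page cache.
    int fd = open("data.bin", O_RDONLY | O_DIRECT);   // file name made up

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);
    cuFileBufRegister(devBuf, bytes, 0);

    // DMA from the SSD into GPU memory, no staging copy in system RAM.
    ssize_t got = cuFileRead(fh, devBuf, bytes, /*file_offset=*/0,
                             /*devPtr_offset=*/0);
    (void)got;   // error handling omitted for brevity

    cuFileBufDeregister(devBuf);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    cudaFree(devBuf);
    return 0;
}
```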
RTX I/O is going to be a big feature, and games are likely some of the first consumer-facing software that will use it, because it is a standard feature for the next console generation. And AAA devs already support multiple performance profiles, feature-support fallbacks, etc. There's no reason they couldn't have the engine take advantage of RTX I/O when it exists, but otherwise fall back on an emulation layer of sorts.
In addition, I suspect the slice of the video game market that has a GPU with RTX I/O capability will also have a NVME SSD. Now, this is niche, but with that slice of the market also being the top-end performance tier, they're still going to be catered to by AAA devs.
Even without an NVMe drive you're better off with this, just by skipping system RAM altogether. But you're not going to be able to use it to stream game content back and forth at the snap of a finger (well, maybe that's a bit hyperbolic) the way the console makers have been saying they will.
> The FP32 doubling is one of the most important bits here. But fortunately for programmers, this doesn't really change how you write your code.
Early benchmarks are showing games under-performing quite a bit in the worst cases. The crux of the issue is that it's not /exactly/ a no-compromise doubling of FP32. Each SM partition has two datapaths: one is FP32-only, the other does either FP32 or INT32, so per clock you get either 2x FP32 or 1x FP32 + 1x INT32. So if your game or application has any significant number of INT32 operations scheduled, all of a sudden you're back near the FP32 throughput you had last generation, though you do get the benefit of parallel INT32 execution.
It's not uncommon for GPU workloads in games to run around 20% INT32 instructions, but alas that's enough to drop the FP32 performance quite a bit. I suspect Nvidia will probably separate out the INT32 and 2x FP32 units next time, and gradually move towards a ratio of hardware that better suits the usual workload split.
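Back-of-the-envelope, assuming ideal scheduling and an 80:20 FP32:INT32 instruction mix per SM partition: the 20 INT32 instructions tie up the shared datapath for 20 cycles (during which the FP32-only path still issues 20 FP32), and the remaining 60 FP32 issue two per clock over 30 more cycles. That's 80 FP32 in 50 cycles, i.e. 1.6 FP32/clock against the 2.0 peak. Still about 1.6x a Turing-style 1x FP32 + 1x INT32 partition, but short of the headline 2x.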
Given the lower amount of INT32 in game workloads, as you stated, I don't think separating the INT32 and FP32 hardware makes a lot of sense, because you can share a substantial amount of the hardware between the two, which overall leads to die-space savings.
On the contrary, "dark silicon" suggests that separating the FP32 and INT32 (now, in GA102/104, FP32 and INT32/FP32) datapaths at the cost of more die area currently makes excellent sense. (See also: tensor cores, ray-tracing cores.) Jensen Huang very briefly alluded to this when, during the GA102/104 announcement, he mentioned the end of Dennard scaling.
But GA102/GA104 doesn't have separate execution units for INT32 and FP32, because the INT32 units also do FP32. So I don't see how that shows that separating FP32 and INT32 hardware makes sense.
I've been thinking that's why we are seeing the true doubling in fully ray-traced titles like Quake and Minecraft, but not in more traditional rendering engines.
From my understanding, INT is often used for lookups, and I'd presume a lot of that is some sort of environment mapping, which adds some contention since INT issue is more limited and "steals" slots from the doubled FP path.
I think "parallel execution of fp32/int32" is kind of vaguely defined by them... Do they mean fp32/int32 instructions from the same thread (aka warp/wavefront) or from different threads? If it's the latter I'm pretty sure AMD GPUs have been doing it too.