Though it may be that expensive, high-end VR becomes the luxury home theater of tomorrow, where you put 4 GPUs in a box to get seamless 8K VR at 90 FPS. Faster PCIe could be nice to have in that case.
Multi-GPU is kind of a meme for anything besides HPC. Graphical applications currently don't support it very well, if at all. And it means you're basically duplicating the scene across multiple GPUs, which is a lot of money burnt on VRAM.
New PCIe specs will likely only matter for raw data transfer, like storage applications, at least for the time being.
Multi-GPU doesn't exist for anything other than ML or HPC.
Vulkan supports it, while Metal and DX12 don't.
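For reference, the Vulkan mechanism is device groups (core in 1.1): the loader reports sets of physical GPUs the driver is willing to expose as one logical device, and the application explicitly decides what runs where. A minimal sketch of just the enumeration step, assuming a Vulkan 1.1 loader and headers are installed:

```c
// Sketch: list Vulkan device groups (multi-GPU sets the driver will expose
// as one logical device). Assumes a Vulkan 1.1 loader; error handling is minimal.
// Build (Linux): cc devgroups.c -lvulkan
#include <stdio.h>
#include <vulkan/vulkan.h>

int main(void) {
    VkApplicationInfo app = {
        .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
        .apiVersion = VK_API_VERSION_1_1,
    };
    VkInstanceCreateInfo ici = {
        .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
        .pApplicationInfo = &app,
    };
    VkInstance instance;
    if (vkCreateInstance(&ici, NULL, &instance) != VK_SUCCESS) {
        fprintf(stderr, "no Vulkan 1.1 instance available\n");
        return 1;
    }

    uint32_t count = 0;
    vkEnumeratePhysicalDeviceGroups(instance, &count, NULL);
    if (count > 16) count = 16;   // fixed-size buffer keeps the sketch short

    VkPhysicalDeviceGroupProperties groups[16];
    for (uint32_t i = 0; i < count; i++)
        groups[i] = (VkPhysicalDeviceGroupProperties){
            .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_GROUP_PROPERTIES,
        };
    vkEnumeratePhysicalDeviceGroups(instance, &count, groups);

    for (uint32_t i = 0; i < count; i++)
        printf("group %u: %u physical device(s)\n",
               (unsigned)i, (unsigned)groups[i].physicalDeviceCount);
    // A group with physicalDeviceCount > 1 can be turned into a single
    // VkDevice via VkDeviceGroupDeviceCreateInfo, which is what lets an
    // engine split work (e.g. one eye per GPU) explicitly.

    vkDestroyInstance(instance, NULL);
    return 0;
}
```

Everything past printing the group sizes (actually splitting work, peer memory transfers, synchronization) is where the nightmare part below kicks in.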
The real issue for real-time AR/VR/144FPS+ is knowing what you can and can't offload, what the transfer latency is, etc. This changes based on cards, generations, CPUs, library versions, driver versions, and vendors.
It is a nightmare.
Even with SLI/CrossFire, where you know the cards and drivers are identical, you still see only a ~10-20% perf gain for 50% more resources.
I've wondered whether it would be possible to design a dual-GPU setup to handle VR applications more efficiently. VR tends to be kind of hard on current GPUs because the scene has to be rendered from two viewpoints. With a dual-GPU setup you could effectively dedicate one GPU per eyeball.
That's actually not true. I believe with the latest generation of Nvidia cards (10 series) they made rendering multiple similar viewpoints dirt cheap by tweaking the hardware and the rendering pipeline.
From the Ars Technica review of the 1060: "GPU Boost 3.0, Fast Sync, HDR, VR Works Audio, Ansel, and preemption make a return too, as well as the ability to render multiple viewpoints in a single render-pass."
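For anyone curious what "multiple viewpoints in a single render-pass" looks like outside Nvidia's own VRWorks SDK, the vendor-neutral version of the idea is multiview rendering (VK_KHR_multiview, core in Vulkan 1.1): you put a bit mask of views on the render pass and every draw call is broadcast to both eyes. A sketch of just that plumbing, assuming the device, attachments (with 2 array layers), and subpass are set up elsewhere:

```c
// Sketch: create a stereo render pass with VK_KHR_multiview (core in 1.1).
// `device`, `attachments`/`attachmentCount`, and `subpass` are assumed to be
// created elsewhere; error handling is elided to keep the multiview part visible.
#include <vulkan/vulkan.h>

VkRenderPass create_stereo_render_pass(VkDevice device,
                                       const VkAttachmentDescription *attachments,
                                       uint32_t attachmentCount,
                                       const VkSubpassDescription *subpass) {
    // Bit i of the view mask enables view i; 0x3 = left and right eye.
    const uint32_t viewMask = 0x3;
    const uint32_t correlationMask = 0x3;  // hint: the two views are spatially similar

    VkRenderPassMultiviewCreateInfo multiview = {
        .sType = VK_STRUCTURE_TYPE_RENDER_PASS_MULTIVIEW_CREATE_INFO,
        .subpassCount = 1,
        .pViewMasks = &viewMask,
        .correlationMaskCount = 1,
        .pCorrelationMasks = &correlationMask,
    };
    VkRenderPassCreateInfo rpci = {
        .sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO,
        .pNext = &multiview,        // every draw in this pass is replayed per view
        .attachmentCount = attachmentCount,
        .pAttachments = attachments,
        .subpassCount = 1,
        .pSubpasses = subpass,
    };
    VkRenderPass renderPass = VK_NULL_HANDLE;
    vkCreateRenderPass(device, &rpci, NULL, &renderPass);
    // In the vertex shader, gl_ViewIndex (GL_EXT_multiview) selects the
    // per-eye view/projection matrix, so the scene is submitted only once.
    return renderPass;
}
```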
From my limited understanding, VR is difficult because of the tolerances required. For regular gaming, slight frame drops were annoying but didn't break the experience, so it was reasonable to ship a game that hit 60 fps 99% of the time and just write off the remaining 1%. For VR, not only do we need to hit at least 75 fps, but the tolerance for frame drops is also much, much lower (a stutter while you're watching a monitor is annoying; the same stutter in VR could make you lose your balance). Aiming for 75 fps and guaranteeing that you'll hit it 99.9%, 99.99%, or 99.999% of the time is where the difficulty lies. I'm sure most of the HN audience has experience with just how difficult it is to tack on an additional 9.
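The arithmetic behind that last point is worth spelling out; the refresh rates below are just illustrative:

```c
// Back-of-the-envelope frame-budget math for VR frame pacing.
#include <stdio.h>

int main(void) {
    const double rates[] = { 60.0, 75.0, 90.0 };               // Hz (illustrative)
    const double nines[] = { 0.99, 0.999, 0.9999, 0.99999 };   // fraction of frames on time

    for (int r = 0; r < 3; r++) {
        double budget_ms = 1000.0 / rates[r];
        double frames_per_hour = rates[r] * 3600.0;
        printf("%.0f Hz: %.2f ms per frame\n", rates[r], budget_ms);
        for (int n = 0; n < 4; n++)
            printf("  %g%% on time -> up to %.0f missed frames/hour\n",
                   nines[n] * 100.0, frames_per_hour * (1.0 - nines[n]));
    }
    return 0;
}
// 90 Hz leaves ~11 ms per frame; at 99% you can miss ~3,240 frames an hour,
// at 99.999% only ~3 -- which is why the last few 9s are the hard part.
```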
Linus Tech Tips has a rant about how Intel is trying to segment the market by only offering more PCI-E lanes on the most-expensive CPUs despite mobo support with their new X299 platform (they've done this in the past as well): https://www.youtube.com/watch?v=TWFzWRoVNnE
Basically, if you want the full 44 lanes, you have to get the top-spec processor at $1k for the CPU alone. The i9s above the 7900X don't even have details yet, since Intel is working on adapting their Xeons to the new HEDT platform to keep up with AMD. Here's the breakdown: https://www.cinema5d.com/wp-content/uploads/2017/05/Intel-co...

I've also heard speculation that they're crippling the lanes on cheaper chips because they're so worried about cannibalizing their server market. Don't expect ECC support on these either. Honestly, if any Xeon BIOSes and CPUs supported unlocked multipliers, they'd be a better deal, but I get that overclocking and max stability don't really mix. OR they could totally wow everyone and come out with some 5.0GHz (all-core) 16-core chip that isn't overclockable any further but can run ECC. Sell THAT for $2k to workstation users and rich gamers. Make it 2P capable as well in case you need 32 cores.

Maybe AMD will do it with Threadripper, which BTW has 64 PCI-E 3.0 lanes (https://www.pcper.com/news/Processors/Computex-2017-AMD-Thre...).
Video cards for games, maybe, but if you look at modern network interfaces, like this board (http://www.mellanox.com/related-docs/prod_silicon/PB_Connect...), the bottlenecks in gen3 are pretty apparent. They are also pretty apparent in storage, partly because no one wants to create an x16 M.2 form-factor interface to keep up with XPoint. Each time a new flash interface is created, someone releases a product capable of saturating it within a short period of time. But beyond that, consider the bandwidth of current flash arrays.
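Rough numbers for that kind of board (a dual-port 100GbE card is assumed; the per-lane figures are the nominal PCIe rates after encoding overhead):

```c
// Rough PCIe vs. NIC bandwidth check. Per-lane figures are nominal effective
// rates after encoding (gen3: 8 GT/s at 128b/130b ~= 0.985 GB/s per lane).
#include <stdio.h>

int main(void) {
    const double gen3_lane_GBps = 8.0  * 128.0 / 130.0 / 8.0;  // ~0.985 GB/s
    const double gen4_lane_GBps = 16.0 * 128.0 / 130.0 / 8.0;  // ~1.969 GB/s
    const double nic_GBps = 2 * 100.0 / 8.0;   // two 100GbE ports, one direction

    printf("gen3 x16: %.1f GB/s, gen4 x16: %.1f GB/s, 2x100GbE: %.1f GB/s\n",
           16 * gen3_lane_GBps, 16 * gen4_lane_GBps, nic_GBps);
    // gen3 x16 (~15.8 GB/s) < 25 GB/s needed to feed both ports at line rate,
    // so the card is slot-limited until gen4 (~31.5 GB/s) shows up.
    return 0;
}
```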
But returning to video cards: in GPGPU configurations, transferring data between the card and system memory can quickly become a bottleneck.
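As a made-up but representative example, here's what shuffling a few gigabytes of working set per step costs on the bus:

```c
// How long a host<->GPU copy ties up the bus at gen3 vs. gen4 x16 speeds.
// The 4 GB working-set size is just an illustrative number.
#include <stdio.h>

int main(void) {
    const double working_set_GB = 4.0;
    const double gen3_x16_GBps = 15.75, gen4_x16_GBps = 31.5;

    printf("gen3 x16: %.0f ms, gen4 x16: %.0f ms per 4 GB transfer\n",
           1000.0 * working_set_GB / gen3_x16_GBps,
           1000.0 * working_set_GB / gen4_x16_GBps);
    // ~254 ms vs. ~127 ms -- an eternity next to kernel launch times, which is
    // why GPGPU codes go to such lengths to overlap or avoid these copies.
    return 0;
}
```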
Bottom line is that gen4 is about 3 years late. PCIe specs (or more generally, x86 I/O interfaces) seem to consistently lag their requirements. That's why we lived with AGP (or VLB, for that matter).
Last time I checked, even very high-end cards are far from being bottlenecked by a PCIe 3.0 x16 bus.