No, we don't. The latest GPU architectures, including Vega (and Pascal, obviously), support processing the scene geometry once and then projecting it to two viewports, thereby generating both eye views without having to render the entire scene twice.
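For the curious, the idea is roughly that the vertex/geometry work is shared and only the final per-eye projection is duplicated. A toy numpy sketch of the concept (the matrices and the 64 mm IPD are just illustrative assumptions, not what the hardware actually does):

```python
import numpy as np

def perspective(fov_y, aspect, near, far):
    """Standard OpenGL-style perspective projection matrix."""
    f = 1.0 / np.tan(fov_y / 2.0)
    return np.array([
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0, 0, -1, 0],
    ])

def eye_view(eye_offset_x):
    """View matrix for one eye: a pure horizontal translation by half the IPD."""
    v = np.eye(4)
    v[0, 3] = -eye_offset_x
    return v

# World-space vertices: transformed/animated/culled ONCE for both eyes.
verts_world = np.array([[0.0, 0.0, -5.0, 1.0],
                        [1.0, 0.5, -7.0, 1.0]]).T   # shape (4, N)

proj = perspective(np.deg2rad(90), 1.0, 0.1, 100.0)
ipd = 0.064  # assumed 64 mm interpupillary distance

# Only this cheap per-eye projection is duplicated, not the whole
# vertex/geometry pipeline.
for name, offset in (("left", -ipd / 2), ("right", +ipd / 2)):
    clip = proj @ eye_view(offset) @ verts_world
    ndc = clip[:3] / clip[3]          # perspective divide
    print(name, ndc.T)
```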
Surely some of the work should be possible to re-use? I mean, for most pixels beyond a certain depth the incident eye vector direction will be identical for all practical purposes, so if one could just fudge it, use the same calculated pixel color for both eyes and offset it slightly, it should be usable without having to be calculated twice. No one would notice if the reflections or specular lobe for the right eye were calculated with the incident camera of the left eye.
Once you have calculated the pixels for the left eye, it should be possible to re-use them for the right eye with some mapping. Certain pixels that are only visible to the right eye will have to be computed from scratch. I'm not sure if it's possible or if it even has a chance to be a performance gain (or indeed if this is actually how it already works). But doing the full job of 2x4K pixels for two eyes seems wasteful when a) the views are almost identical for objects beyond a certain distance and b) quality is almost irrelevant for most pixels, where the user isn't looking.
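That mapping is basically stereo reprojection: use the left eye's depth buffer to shift each shaded pixel by its disparity and note where holes appear. A toy numpy sketch, with all values invented for illustration:

```python
import numpy as np

def reproject_left_to_right(color_l, depth_l, focal_px, ipd):
    """Warp shaded left-eye pixels into the right eye's view using depth.

    For a purely horizontal stereo pair, a pixel at depth z shifts by the
    disparity d = focal_px * ipd / z (in pixels) between the two eyes.
    Pixels with no source (disocclusions) are left as holes that would
    still have to be shaded properly for the right eye.
    """
    h, w = depth_l.shape
    color_r = np.zeros_like(color_l)
    hole_mask = np.ones((h, w), dtype=bool)

    disparity = focal_px * ipd / depth_l        # per-pixel shift in pixels
    xs = np.arange(w)
    for y in range(h):
        target_x = np.round(xs - disparity[y]).astype(int)
        valid = (target_x >= 0) & (target_x < w)
        # Naive scatter; a real version would resolve depth conflicts.
        color_r[y, target_x[valid]] = color_l[y, xs[valid]]
        hole_mask[y, target_x[valid]] = False
    return color_r, hole_mask

# Tiny fake frame: distant pixels barely move, the close object moves a lot.
depth = np.full((4, 16), 50.0)
depth[:, 6:10] = 1.0                            # a close object
color = np.tile(np.arange(16, dtype=float), (4, 1))

right, holes = reproject_left_to_right(color, depth, focal_px=500, ipd=0.064)
print("pixels to re-shade for the right eye:", holes.sum(), "of", holes.size)
```

The hole pixels (disocclusions) are exactly the ones that would still need to be shaded from scratch for the right eye.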
With foveated rendering and some shortcuts it should be possible to go faster with a 2x4K VR setup than with a regular 4K screen, where you need to render every pixel perfectly because you don't know what's important/where the user is looking. Obviously one needs working eye tracking etc. first too...
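Some napkin math on the pixel budget; the foveal region sizes and falloff factors below are pure guesses on my part:

```python
# Back-of-envelope pixel budget; every number here is a rough assumption.
full_4k = 3840 * 2160                    # one flat 4K screen, every pixel shaded
naive_vr = 2 * 3840 * 2160               # naive 2x4K stereo

# Hypothetical foveated budget per eye: a small full-resolution region at the
# gaze point, a mid-detail ring, and a heavily downsampled periphery.
fovea     = 600 * 600                            # full detail
mid_ring  = (1600 * 1600 - 600 * 600) // 4       # 1/4 the sample density
periphery = (3840 * 2160 - 1600 * 1600) // 16    # 1/16 the sample density
foveated_both_eyes = 2 * (fovea + mid_ring + periphery)

print("flat 4K:            ", full_4k)            # ~8.3M
print("naive 2x4K stereo:  ", naive_vr)           # ~16.6M
print("foveated, both eyes:", foveated_both_eyes) # ~2.5M
```

Under those made-up numbers, both foveated eye views together come to roughly 2.5M shaded pixels versus 8.3M for a single flat 4K frame, which is the gist of the argument.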
I agree with you. There's no reason to run every pixel shader twice in full.
It seems logical that each surface/polygon could be rendered once, for the eye that can see the most of it (a left-facing surface for the left eye, a right-facing surface for the right eye), then squashed to fit the correct view for the other eye. Then fill in all the blanks. Of course, the real algorithm would be more complicated than this, but it seems like at least some rendering could be saved this way.
Technically the lighting won't be right, but you don't have to use it for every polygon, and real-time 3D rendering is already all about making it 'good enough' to trick the human visual system, not about being mathematically accurate. If technical accuracy were what we insisted on, games would be 100x100 pixels at 15 FPS because we'd insist on using photon mapping.
If we do eye tracking we can probably lower that to 1024x768-equivalent rendering, by using high resolution where the eye is looking and tapering off to just a blurry mess further away. You can even completely leave out the pixels at the optic nerve's blind spot. The person wearing the headset won't be able to tell they aren't getting full 4K or even higher resolution. And we can run better effects, more anti-aliasing, maybe even ray tracing in real time.
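Something like this falloff, as a rough numpy sketch; the radii, rates and blind-spot position are invented, and a real implementation would work in degrees of visual angle rather than pixels:

```python
import numpy as np

def shading_rate_map(width, height, gaze_x, gaze_y, blind_spot=None):
    """Fraction of full shading work to spend on each pixel, based on its
    distance from the gaze point. All radii/rates here are invented."""
    ys, xs = np.mgrid[0:height, 0:width]
    r = np.hypot(xs - gaze_x, ys - gaze_y)

    rate = np.full((height, width), 1 / 16)   # blurry periphery
    rate[r < 800] = 1 / 4                     # mid region
    rate[r < 250] = 1.0                       # fovea: full quality
    if blind_spot is not None:                # skip the optic-nerve blind spot
        bx, by, br = blind_spot
        rate[np.hypot(xs - bx, ys - by) < br] = 0.0
    return rate

rate = shading_rate_map(3840, 2160, gaze_x=2000, gaze_y=1100,
                        blind_spot=(2600, 1100, 60))
print("average shading work vs. full-res:", rate.mean())   # well under 1.0
```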
If this is the NVIDIA/SMI research you are referring to, it seems nice, but without details, specifically on dynamic performance, there is reason to be sceptical of how good it is.
The field of view of current consumer HMDs is too narrow for there to be a big saving compared to the downside. As you move to larger-FOV displays the brain will start doing more saccades (rapid step changes in gaze direction[1]), and the combined response time of the image generator and eye tracker is too slow to put more pixels at the right spot. It's much more effective to just render the whole thing at the maximum possible resolution. There has been promising research on rendering at a reduced update rate or with reduced geometry in low-interest areas of the scene[2].