Another approach to space efficiency would be to do away with the need for dual video streams entirely and just average the stereo images together to form a single monocular image. Then, send a disparity map along with the monocular video. Decode the mono video and use the disparity map to interpolate the view of either eye. You’ll have all the information you need for reconstruction and the disparity map can be efficiently compressed via normalization and perhaps even by sending just the vectors of the contours.
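A minimal sketch of what that reconstruction step could look like, assuming a per-pixel horizontal disparity map in pixel units (nearest-neighbour row shifts, clamped at the borders, no hole filling, so occlusions are simply ignored):

```python
import numpy as np

def synthesize_views(mono: np.ndarray, disparity: np.ndarray):
    """Warp a mono frame into approximate left/right eye views.

    mono: HxWx3 uint8 frame; disparity: HxW float array of pixel offsets
    between the two eyes (assumed layout, not any particular standard).
    """
    h, w, _ = mono.shape
    xs = np.arange(w)
    left = np.empty_like(mono)
    right = np.empty_like(mono)
    for y in range(h):
        # Each eye samples the mono row shifted by half the disparity in
        # opposite directions; nearest-neighbour, clamped at the image edge.
        half = disparity[y] / 2.0
        left[y] = mono[y, np.clip(np.round(xs + half).astype(int), 0, w - 1)]
        right[y] = mono[y, np.clip(np.round(xs - half).astype(int), 0, w - 1)]
    return left, right
```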
Another idea is to take advantage of the fact that head motions are really just a translation of the camera. There's no need to resend pixels that have merely shifted position unless they have also changed over time.
If I were designing such a system, I'd take advantage of the fact that not much in the scene fundamentally changes when you move your head: maintain some sort of state and only request the chunks of pixels that are actually needed. You wouldn't even have to use a traditional video codec, as preserving state would be far more efficient than thinking in terms of flat pixels and video.
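Roughly like this, for the stateful "only request what changed" idea; the tile size and change threshold here are arbitrary illustration values, not anything a real system mandates:

```python
import numpy as np

TILE = 32          # tile edge length in pixels (assumption)
THRESHOLD = 2.0    # mean absolute difference that counts as "changed"

def changed_tiles(state: np.ndarray, frame: np.ndarray):
    """Yield (y, x, pixels) for tiles that differ from the cached state."""
    h, w, _ = frame.shape
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            old = state[y:y+TILE, x:x+TILE].astype(np.int16)
            new = frame[y:y+TILE, x:x+TILE].astype(np.int16)
            if np.abs(new - old).mean() > THRESHOLD:
                yield y, x, frame[y:y+TILE, x:x+TILE]

def apply_tiles(state: np.ndarray, tiles):
    """Patch the cached state with the received tiles."""
    for y, x, pixels in tiles:
        state[y:y+TILE, x:x+TILE] = pixels
    return state
```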
> Decode the mono video and use the disparity map to interpolate the view of either eye.
By "disparity map" are you thinking something like a heightmap applied to the scene facing the viewer and then you use that to skew things for each eye?
If so, how would that handle parts of the scene that are occluded/revealed to one eye but not the other?
How does video encoding like H.264 handle parts of a scene that are occluded in one frame, but not occluded in the next frame?
A three inch difference between two cameras producing simultaneous frames is similar to a three inch sideways step of one camera in time between two frames.
True, occlusions would be a problem, but we're talking about fake autostereoscopic 3D here, where most of the stereo rigs used for capture have only a modest baseline. Almost all of the depth perception comes from disparity. Regions occluded in one view would still be visible with the averaging method I described, and would land at the depth plane of the occluder, which is probably a good guess anyway. It's not as if your other eye would receive a correspondence for an occluded point in the real world.
FYI, there's online software[1] to recreate 3D/stereoscopic 3D imagery from the depth-enabled photos taken e.g. by Moto G5S (which has a dual-camera setup that computes the depth map, but no API to extract/store the image taken by the other camera).
My personal opinion is that true stereoscopic images feel better when there's enough detail; those occlusions do matter. For some imagery it doesn't matter as much though.
First of all, the process of converting a stereo pair into a flat image and a disparity map would be lossy and introduce artifacts. Even assuming you could accurately capture pixels that were occluded from one viewpoint and not the other, the approach is inherently unable to handle effects such as partially-transparent or glossy surfaces.
Secondly, the limiting factor described in the post is not space efficiency, it's decoding performance. It doesn't do much good to halve the amount of data required to represent a frame if it takes twice as long to reconstruct the raw pixels for display.
The difference between two images can be lossless, if you like.
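For instance, keeping one image plus a signed residual round-trips bit for bit (whether that residual then compresses well is a separate question):

```python
import numpy as np

left = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
right = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)

residual = right.astype(np.int16) - left.astype(np.int16)    # signed difference
restored = (left.astype(np.int16) + residual).astype(np.uint8)

assert np.array_equal(restored, right)   # exact reconstruction
```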
> partially-transparent or glossy surfaces.
It's all just RGB values; there is no gloss or transparency in an image. (Image layers can have transparency for compositing, but that's obviously something else.)
If audio encoding can have "joint stereo", why not video encoding?
Many areas of a stereo image are nearly identical, like the distant background.
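A rough image analogue of mid/side joint stereo, just to make the idea concrete (this is a sketch, not a feature of any existing video codec): store the sum and difference of the two eyes instead of the eyes themselves. Wherever the views agree, like that distant background, the side channel is all zeros, which is exactly what an entropy coder likes, and reconstruction is exact.

```python
import numpy as np

def joint_encode(left: np.ndarray, right: np.ndarray):
    l, r = left.astype(np.int16), right.astype(np.int16)
    return l + r, l - r               # "mid" (sum) and "side" (difference)

def joint_decode(mid: np.ndarray, side: np.ndarray):
    left = (mid + side) // 2          # (l+r) + (l-r) == 2l
    right = (mid - side) // 2         # (l+r) - (l-r) == 2r
    return left.astype(np.uint8), right.astype(np.uint8)
```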
Yeah, there's a lot you could do, but unfortunately the only thing that makes decoding high-resolution video feasible on these devices is fixed-function video decoding hardware, which can't support new ideas like this. You'd have to lobby standards bodies to add VR-specific features to codecs and then wait many years for hardware to implement them.
What you are suggesting is essentially a new codec. It sounds like a good idea; however, that's a thing for the future.
The "disparity map" you suggest seems to exist in 3D Blu-rays (Multiview Video Coding), but there may be some technical limitations that make it unsuited for the Oculus Go.
Sure, those ideas might hold up for objects far away from the eyes, but for nearby objects there can be a pretty big difference between what each eye sees. I think the human brain would quickly call BS on an image processed through that kind of compression, and it would not be very immersive or realistic.
There is a difference between what the successive frames of a video depict. Yet, video compression heavily relies on encoding just the differences between successive frames, which is very effective.
I think this more-or-less corresponds to the first approach you described. It's part of the Multiview Video Coding amendment to H.264: https://en.wikipedia.org/wiki/2D_plus_Delta