Just take a look at the historical photos results! That's one of the coolest things I've seen in well over a year. I've come across most of these photos before in textbooks and on the web, but this demo makes all of those historical figures and moments in time feel as real as if they were happening in the world today. It's much more pronounced than colorizing black and white photos. I feel connected.
I'm convinced that the folks that present at SIGGRAPH are capable of nothing short of black magic wizardry. I can understand the technology being used, but my visual cortex only sees the impossible becoming real.
This is truly deserving of the word awesome, because I am in awe.
Problem is, this paper won't be able to produce those results from historical photos. I find them quite misleading, to be honest.
You'll need to either use a separate 3D depth estimation AI, or (more likely) have someone do a manual stereoscopic 3D conversion of your historical image. Only then (when you have depth data) can the algorithm presented in this paper start its work.
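To make that pipeline concrete, here's a rough sketch (not from the paper) of using MiDaS, one such single-image depth estimator mentioned just below, via torch.hub to produce the depth map. The filename is a placeholder and the exact hub entry points depend on the MiDaS release:

```python
import cv2
import torch

# Assumed torch.hub entry points from the MiDaS README; names vary by release.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS").to(device).eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").default_transform

# "historical_photo.jpg" is a placeholder input image.
img = cv2.cvtColor(cv2.imread("historical_photo.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    pred = midas(transform(img).to(device))          # relative (inverse) depth
    depth = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().cpu().numpy()

# `img` + `depth` together form the RGB-D input that this paper's method
# then turns into a layered, inpainted 3D photo.
```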
In the meantime, I tried out MiDaS on other images and it failed most of the time. It really only works for those stock-ish pictures with a clear foreground and background, and I believe it mainly just detects bokeh blur as opposed to actually understanding the scene.
Yes, you are right. Existing single-image depth estimation models are far from perfect. We hope to see future development in that direction to further improve the visual quality of 3D photos.
Very cool! Not quite the same because they use 20-50 input images and this uses 1. But it looks like it might be really cool for photogrammetry. The traditional photogrammetry software sucks.
Very very impressive 3D reconstruction.
Apparently it can even reconstruct ropes and other fine details very well. That's amazing!
The best reconstruction system I have seen this year...
Bravo
I wonder if this can be tweaked to generate the 45-perspective quilt files required to feed this into a Looking Glass. https://lookingglassfactory.com/
After reviewing the looking glass docs, yes. The paper presented here can produce depth layers. If you would then flatten those depth layers with 45 different parallax angles and concatenate the results, you'd have something similar to their quilt files.
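Purely as an illustration of that idea (my own sketch, not Looking Glass's actual pipeline; quilt view order, grid layout and parallax handling all vary by device), here's a crude numpy version that shifts each depth layer by a per-view parallax and tiles the 45 results into a 9x5 grid:

```python
import numpy as np

def make_quilt(layers, depths, cols=9, rows=5, max_shift=20):
    """Composite RGBA depth layers into a quilt-like grid of 45 views.

    layers: list of HxWx4 uint8 arrays, ordered far -> near
    depths: per-layer depth in [0, 1] (0 = far, 1 = near)
    """
    h, w = layers[0].shape[:2]
    n_views = cols * rows
    quilt = np.zeros((rows * h, cols * w, 3), dtype=np.uint8)
    for v in range(n_views):
        t = 2.0 * v / (n_views - 1) - 1.0            # parallax factor: -1 .. +1
        view = np.zeros((h, w, 3), dtype=float)
        for layer, d in zip(layers, depths):
            # crude parallax: nearer layers shift more; np.roll wraps around,
            # which a real reprojection would not do
            shifted = np.roll(layer, int(round(t * max_shift * d)), axis=1)
            alpha = shifted[..., 3:4] / 255.0
            view = alpha * shifted[..., :3] + (1.0 - alpha) * view
        r, c = divmod(v, cols)
        quilt[r * h:(r + 1) * h, c * w:(c + 1) * w] = view.astype(np.uint8)
    return quilt
```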
Except that there are 45 distinct planes and it interfaces directly with Unity, Unreal and ThreeJS in minutes.
Sorry if I'm reading too deeply into your note. There are just so many haters. Meanwhile, I backed these guys on Kickstarter, have had a unit on my desk for 18 months and think it's one of the most incredible things I've ever had to experiment with... and it cost me under $500.
They probably don't know how difficult it is to generate good lenticular printing data.
I'm the author of a lenticular V-Ray plugin, and people love the technology for product design previews. Plus, the fact that it's being mass-produced as children's toys makes it very affordable.
It's a bit sad that their active resolution is so low, with 2560x1600 being divided into 9x5 quilt frames. That works out to roughly 284x320 (2560/9 by 1600/5) effective resolution per view for the 8.9" display.
Outstanding work. I've been really impressed with the quality of recent colorization, perspective, and texture reconstruction tools and think they do a wonderful job of bringing historical or degraded images 'back to life.' I wonder if we are headed for a future where many of these tools reside client-side and image/video transmission and storage can be reduced to a lightweight stream of vector data.
Take a moment and take a deep breath before you get excited about full 3D reconstruction from a single image.
That isn't what this is.
Watch the video in the bottom right corner, entitled "Comparison with State of the art".
Now go and rewatch the examples and actually look at the edges of the objects as they move the camera 'a bit'. You'll see a tonne of artifacting. Less, clearly, than the existing state of the art, so I tip my hat to the efforts here.
...but all this is doing is exposing an 'empty gap' created by the shift in perspective and then basically using the equivalent of Photoshop's content-aware fill to fill that gap with plausible pixels.
Since humans don't really pay much attention to edge details, it's quite plausible.
To quote the paper:
> In this work we present a new learning-based method that generates a 3D photo from an RGB-D input.
I.e. this work takes a depth image as the input and works on that to generate a 3D photo, rebuilding the full content of each 2D layer in the image.
I.e. the output is not a 3D model; it is a series of 2D images at depth intervals, where the occluded content in each layer is in-painted (i.e. generated artificially).
(NB. The 'from a single source image' work used here is not novel; they're just using existing approaches to estimate a depth image)
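For what it's worth, a toy illustration (my own simplification, not the paper's actual data structures) of what "a series of 2D images at depth intervals" amounts to:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DepthLayer:
    rgba: np.ndarray   # HxWx4: colour plus alpha, with occluded regions in-painted
    depth: float       # representative depth of this layer

# A "3D photo" in this sense is just an ordered stack of such layers
# (far to near). Novel views come from slightly reprojecting each layer and
# compositing back to front -- there is no full 3D model of the scene.
ThreeDPhoto = list[DepthLayer]
```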
It might be "all" that it's doing, but it does it quite well and in a way that is quite believable, which is significantly better than what came before, which makes it almost realistic. That's what you said, but the way you said it felt like it lessened the achievement.
> which is significantly better than what came before
It's a bit better. That's the point I'm making; it's just an incremental improvement on an existing process. Read the actual paper, e.g. under 'Quantitative comparison'.
If you think I'm belittling the effort, I'm sorry; that's not my intention. But, for example, the other comments talking about using it to generate a full 3D model to display on a Looking Glass surface, or in VR, display a total lack of understanding of what has been achieved here.
Their example images are still 2D, but this is potentially very interesting for making 360-degree videos much more immersive in virtual reality. (There may be some extra steps needed to ensure consistent rendering among adjacent frames?)
Would be awesome if this could be integrated into gallery software as a 3D Ken Burns effect. The artificial camera would not have to move that much so that the inevitable artifacts would be much less visible.
The main issue that I see with this work is that they require RGBD data, meaning you have to do a lidar scan or something similar to measure the depth map. Alternatively, you could pay someone to draw it by hand, but that takes a long time.
So basically, if you have impossible to get input data, this network can do its magic.
What it then does is hallucinatory inpainting, something like Photoshop's content-aware fill. If there's a tree in your photo, it will make up a fake background behind it, so that you could move or remove the tree without things looking weird.
> So basically, if you have impossible to get input data, this network can do its magic.
Except they evidently got the input data for the examples in the paper, so it can't be impossible to get.
They cite at least two different methods for adding depth information to a single image to generate the necessary RGBD data, different views of which can then be rendered with their inpainting applied.
Yes, you can always create that data by hand, but it's too expensive and, hence, impossible to scale.
As for using other AIs, they tend to not work too well on more complex images.
But in any case, getting the data that you need to be able to use the paper here is very challenging.
Edit: I should probably say that I have hands-on experience with MegaDepth and MiDaS and that it was underwhelming. Both of them assume a depth gradient running from near at the bottom to far at the top, and both assume that the optical variation will be in the foreground. A photo of a dining table from the side is already enough to confuse both of them.
True, but their Colab demo uses a separate network to infer depth information from a plain RGB picture (it's a classic task nowadays), so it is not a problem in practice.
Sadly not, it's still an unsolved problem in the general case. Yes, it works tolerably well for some images, but converting RGB to RGBD is anything but easy. That's why the 3D-ification of cinema movies still requires thousands of people and millions in budget.
Kind of an interesting tangent, but the founder of Lytro (Ren Ng) is the advisor on a really cool paper that came out recently and reminds me a lot of this: http://www.matthewtancik.com/nerf
Basically, it's similar to this but also re-lights reflective parts at the cost of needing more than one source image.
You could port this to a non-CUDA GPU, such as a mobile GPU, but even that would require a bit of effort to condense it enough that it wasn't a total lag fest. Executing this on CPU seems damn near impossible from a usability standpoint, given the large matrix multiplications involved. You really need the parallel capabilities of a GPU.
It relies on torch and OpenCV:
- I have never tried running OpenCV explicitly on CPU, but I believe it is doable.
- It is trivial to run torch on CPU instead of GPU (just comment out the line that sends everything to the GPU).
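As a minimal sketch of that second point (with a stand-in model, not the repo's actual network): select the device at runtime instead of hard-coding .cuda(), and the same code runs on a CPU-only machine, just slowly:

```python
import torch
import torch.nn as nn

# Pick the device at runtime rather than hard-coding .cuda();
# without CUDA this falls back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Conv2d(3, 3, kernel_size=3, padding=1).to(device)  # stand-in for the real network
x = torch.randn(1, 3, 256, 256, device=device)

with torch.no_grad():
    y = model(x)

print(y.shape, y.device)
```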
I can't speak for all photos, but I'm fairly certain those 3D-looking photos on FB must be taken with a stereoscopic camera (2+ lenses). They can calculate depth when they know the distance between the lenses (and the differences in focal lengths, etc.).
That isn't to say they couldn't do it retroactively for single-lens photos, but I'm guessing they're not doing that right now.
I don't see the problem with it. I think their point is that they are important historical photos, and seeing them this way puts some new life in them and can enhance their emotional resonance.
It's not like they come off as "let's remember the good ol' days with the separate water fountains for the colored folks." Not to me anyway.
The ensemble was effective in delivering an emotional impact. It's not often that a tech demo makes me feel such a range of emotions, moving me through profound sadness and hope. It left an impact far greater than typical marketing videos, giving me the subliminal impression (of which I am consciously skeptical) that this technology was meant for legit art.
If they didn’t do stuff like this then the only purpose of these efforts would be to make your cat pics 0.1% more monetizable for Mark Zuckerberg. It’s the application to historical photos like this that is perhaps the most important result of this work.
These are some of humanity’s most important photos, because they show segregation, war, and famine. Bringing them to life arguably increases their impact.
Are you trying to paraphrase the parent? I disagree with the person you are responding to, but I feel you severely misrepresent what they actually said.