From the folks at Magic Leap. It looks remarkably good to me.
The video at https://www.youtube.com/watch?v=9NOPcOGV6nU&feature=youtu.be is worth watching, especially the parts showing how the model gradually constructs and improves a labeled 3D mesh of a live room as it is fed more visual data by walking around the room.
--
On a related note, Magic Leap has been trying to find a buyer for the business for several months now:
https://www.roadtovr.com/report-magic-leap-buyer-sale/
https://www.bloomberg.com/news/articles/2020-03-11/augmented...
I have no experience in this field at all, and they note in the video that the sequence shown was not real-time, but I wonder how far we are from having something like this run in real time, or how "real-time" it could get given fancy hardware out in the wild?
On a tangential thought, it's interesting to me that a company (Magic Leap) that has raised several billion dollars generates so little value compared to other companies its size that this is their most notable output in a year; I thought it was a PhD project until I looked at the project owner. Anyway, it's a very interesting project, and thanks for sharing.
Yeah, I have to agree. If this were a PhD thesis it would certainly deserve some praise, but when the most exciting thing to come out of Magic Leap in years just barely puts them on par with the SOTA... well, I would be pretty pissed if I were an investor in them.
It is easy to have a billion dollar company if you have borrowed two billion. And haven't spent all the loot on salaries.
The company has a cool name and the product area is divisive. Some say it is vapourware and nobody wants Oculus Rift style VR. Others are gung-ho. It's like Bitcoin all over again.
Although this tech is being done with AI, it was being done with non-AI approaches two decades ago for movies/TV. But it wasn't as if people ported that tech to their smartphones from the SGI desktop monsters of yesteryear.
Here's a challenge question for folks reading this who know the tools of the trade (my apologies in advance for somewhat hijacking the thread): consider this video of an endoscopy: https://www.youtube.com/watch?v=DUVDKoKSEkU -- say, from 3:00 to 5:00. Say I have a bunch of such movies (i.e., series of images!) and I want to do a 3D reconstruction from them.
It seems super, super difficult... there are free-flowing liquids, and since this is the esophagus/upper lining of the stomach, the surface changes form quite drastically and quite often. How would you guys approach this problem?
Even more hijacking, I remember thinking medical applications were going to be the killer apps for VR. I was blown away by these demos almost half a decade ago https://youtu.be/MWGBRsV9omw?t=251
I wonder how long it's going to be before we're able to run a significant portion of YouTube video (tourist videos, etc.) through something like this and generate a huge 3D mesh of the world. Combined with Street View data, you'd really have a ton of spaces covered.
I haven't heard of any equivalent of EXIF for video. That goes a long way when trying to make sense of random video, both for camera settings and for GPS location, if you're trying to correlate multiple videos.
GoPro has a proprietary format that stores live telemetry in the videos, if I recall; I believe it's called GPMF (the GPS track can be exported to GPX). About 6 months ago I extracted GPS coordinates from a video using an open source tool.
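Not sure which tool the parent used, but here's a minimal sketch of one way to pull the embedded GPS samples out of a GoPro MP4, assuming exiftool is installed and the telemetry shows up under "DocN:GPS*" tags with -G3 -a (exact tag names vary by camera and firmware):

    # Hedged sketch: extract embedded GPS samples via exiftool's -ee
    # (extract embedded) mode; tag layout is an assumption, not a spec.
    import json
    import subprocess

    def gopro_gps_track(path):
        """Return (lat, lon) pairs found in the embedded telemetry, in order."""
        out = subprocess.run(
            ["exiftool", "-ee", "-n", "-json", "-G3", "-a", path],
            capture_output=True, check=True, text=True,
        ).stdout
        record = json.loads(out)[0]  # exiftool emits one JSON object per input file

        lats, lons = {}, {}
        for key, val in record.items():
            group, _, tag = key.partition(":")
            if not group.startswith("Doc"):
                continue  # skip container-level tags ("Main:...")
            if tag == "GPSLatitude":
                lats[group] = val
            elif tag == "GPSLongitude":
                lons[group] = val

        # "Doc12" -> 12, so samples come back in capture order
        order = sorted(lats, key=lambda g: int(g[3:]))
        return [(lats[g], lons[g]) for g in order if g in lons]

    # print(gopro_gps_track("GOPR0001.MP4")[:5])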
Google will do this, and then sell the data to security institutions. We will be told about it later, or consent to it during a Terms&Conditions update.
Looks awesome. Given it takes position data along with images, how accurate must the position data be? Could it handle something like sensor drift in the position data over time?
EDIT: @bitl: Tremendous, thanks for the reply. It would be amazing to build these scenes just by walking around a room with your mobile phone while it records video, then processing the frames into scenes (especially on mobile platforms with a depth sensor to enrich the collected data).
By default NeRF does not produce a mesh (though one could run marching cubes, as Atlas does; see the sketch below), and it requires training a neural network for each scene, whereas Atlas (as far as I understand it) uses a pretrained network to process new scenes.
NeRF would probably produce a much better final result, but the Atlas approach (no need to train anything from scratch) is the only one that can hope to run in real time, which is vital for some applications.
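Rough sketch of the marching-cubes step mentioned above: turning a volumetric prediction (e.g. the TSDF that Atlas regresses, or a density grid sampled from a trained NeRF) into a triangle mesh. The volume here is a synthetic sphere, and tsdf_volume / voxel_size are just placeholder names, not anything from the Atlas or NeRF code:

    import numpy as np
    from skimage import measure
    import trimesh  # only used to write the result; any mesh writer would do

    # Stand-in for a predicted TSDF of shape (D, H, W): signed distance to a sphere.
    grid = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
    tsdf_volume = np.linalg.norm(grid, axis=0) - 0.5

    voxel_size = 0.04  # metres per voxel (assumed value)
    verts, faces, normals, _ = measure.marching_cubes(tsdf_volume, level=0.0)
    verts *= voxel_size  # voxel indices -> metric coordinates

    mesh = trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals)
    mesh.export("scene_mesh.ply")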
NeRF has the potential to make all those classical methods obsolete, though it requires many input images, and I am not sure how it handles rolling shutter and other distortions.
In theory this should work. I've been doing photogrammetry with spherical video, and existing software packages often want to "dewarp" the image onto a plane, which works fine for a narrow field of view but fails on spherical video. It would be interesting to see if Atlas supports spherical input. Also, 360 cameras have pretty low visual acuity: my 5.6K GoPro Fusion has to divide those pixels across the whole field of view, so images are less detailed. Still, I think 360 video can be useful in photogrammetry with the right algorithms.
Worst case, you can sample the 360 frames to get images with a smaller field of view. However, the app takes in camera intrinsics and positional data, so it seems like it would work out of the box.
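Something like this is what I mean by sampling narrower views out of the 360 frames; a rough sketch assuming equirectangular input, where the yaw/pitch/FOV values and output size are arbitrary examples (and the vertical axis convention may need flipping for a given camera):

    import numpy as np
    import cv2

    def equirect_to_pinhole(pano, yaw_deg, pitch_deg, fov_deg=90, out_size=(640, 640)):
        """Render a rectilinear view looking (yaw, pitch) into the panorama."""
        h_out, w_out = out_size
        h_pano, w_pano = pano.shape[:2]

        # One ray per output pixel of the virtual pinhole camera.
        f = 0.5 * w_out / np.tan(np.radians(fov_deg) / 2)
        xs, ys = np.meshgrid(np.arange(w_out) - w_out / 2,
                             np.arange(h_out) - h_out / 2)
        dirs = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)
        dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

        # Rotate the rays by yaw (around y) then pitch (around x).
        yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
        R_yaw = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                          [0, 1, 0],
                          [-np.sin(yaw), 0, np.cos(yaw)]])
        R_pitch = np.array([[1, 0, 0],
                            [0, np.cos(pitch), -np.sin(pitch)],
                            [0, np.sin(pitch), np.cos(pitch)]])
        dirs = dirs @ (R_yaw @ R_pitch).T

        # Ray direction -> longitude/latitude -> panorama pixel coordinates.
        lon = np.arctan2(dirs[..., 0], dirs[..., 2])   # [-pi, pi]
        lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))  # [-pi/2, pi/2]
        map_x = ((lon / np.pi + 1) * 0.5 * w_pano % w_pano).astype(np.float32)
        map_y = ((lat / (np.pi / 2) + 1) * 0.5 * h_pano).astype(np.float32)

        return cv2.remap(pano, map_x, map_y, cv2.INTER_LINEAR)

    # Example: six views around the horizon from one extracted frame.
    # pano = cv2.imread("frame_000123.jpg")
    # views = [equirect_to_pinhole(pano, yaw, 0) for yaw in range(0, 360, 60)]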
Well, on a camera with dual fisheye lenses for 360 vision there's some blurring at the seam where the images are merged together. But each camera separately just has normal fisheye distortion, and if both images are used without blending them together you'd have minimal artifacts. The biggest issue is low visual acuity, imo.