Atlas: End-to-End 3D Scene Reconstruction from Posed Images (github.com/magicleap)
113 points by taylored on Aug 12, 2020 | 25 comments


From the folks at Magic Leap. It looks remarkably good to me.

The video at https://www.youtube.com/watch?v=9NOPcOGV6nU&feature=youtu.be is worth watching, especially the parts showing how the model gradually constructs and improves a labeled 3D mesh of a live room as it is fed more visual data by walking around the room.

--

On a related note, Magic Leap has been trying to find a buyer for the business for several months now:

https://www.roadtovr.com/report-magic-leap-buyer-sale/

https://www.bloomberg.com/news/articles/2020-03-11/augmented...


I have no experience in this field at all, and they note in the video that the sequence shown was not realtime, but I wonder how far we are from having something like this run in realtime, or how "realtime" it could be with fancy hardware out in the wild?


Without much optimization, it can run at ~14fps on an Nvidia Titan RTX.


On a tangential thought, it's interesting to me that a company (Magic Leap) that has raised several billion dollars generates so little value compared to other companies of its size that this is their most notable output in a year; I thought it was a PhD project until I looked at the project owner. Anyway, it's a very interesting project, and thanks for sharing.


Yeah, I have to agree. If this were a PhD thesis it would certainly deserve some praise, but when the most exciting thing to come out of Magic Leap in years just barely puts them on par with SOTA... well, I would be pretty pissed if I were an investor in them.


It is easy to have a billion-dollar company if you have borrowed two billion and haven't spent all the loot on salaries.

The company has a cool name and the product area is divisive. Some say it is vapourware and nobody wants Oculus Rift style VR. Others are gung-ho. It's like Bitcoin all over again.

Although this tech is being done with AI, it was being done with non-AI approaches two decades ago for movies/TV. But it wasn't as if people ported that tech to their smartphones from the SGI desktop monsters of yesteryear.


Here's a challenge question for folks reading this who know the tools of the trade (my apologies in advance for somewhat hijacking the thread): consider this video of an endoscopy: https://www.youtube.com/watch?v=DUVDKoKSEkU -- say, from 3:00 to 5:00. Say I have a bunch of movies (i.e., a series of images!) and I want to do a 3D reconstruction of this.

It seems super, super difficult... there are free-flowing liquids, and since this is an esophagus/upper lining of the stomach, the surface is changing form quite drastically and quite often. How would you guys approach this problem?


Even more hijacking, I remember thinking medical applications were going to be the killer apps for VR. I was blown away by these demos almost half a decade ago https://youtu.be/MWGBRsV9omw?t=251

Did they ever make it into real life practice?


Thanks for linking to Doc Ok's youtube channel!

https://www.youtube.com/c/okreylos/videos

5 years ago he was active in the Vive VR world

http://doc-ok.org/


You're not the first to come up with that challenge ;) https://endovis.grand-challenge.org/


I wonder how long it's going to be before we're able to run a significant portion of Youtube video (tourist videos, etc) through something like this, and generate a huge 3d mesh of the world. Combined with Street View data, you'd really have a ton of spaces covered.


I believe random videos are of too low a quality. Most of the stuff I've seen, like this, uses constrained videos.

I have seen random still images used for this kind of thing: https://nerf-w.github.io/

I haven't heard of any equivalent of EXIF for video. That goes a long way when trying to make sense of random video, both for camera settings and for GPS location if you're trying to correlate multiple videos.


GoPro has a proprietary format that stores live metadata in the videos if I recall. Maybe it’s called GPX? About 6 months ago I extracted GPS coordinates from a video using an open source tool.
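
For reference, here is roughly what that extraction can look like. This is only a sketch, assuming exiftool is installed and the clip carries GoPro's embedded telemetry track; the file name and the exact JSON key grouping are assumptions on my part, and exiftool's -ee flag is what pulls out the embedded per-sample records:

    import json
    import subprocess

    def extract_gps(path):
        """Dump embedded GPS samples from a GoPro MP4 via exiftool.

        -ee    extract embedded per-sample telemetry records
        -G3    prefix each tag with its document number (Doc1, Doc2, ...)
        -n     numeric lat/lon instead of degree/minute strings
        -json  machine-readable output
        """
        out = subprocess.run(
            ["exiftool", "-ee", "-G3", "-n", "-json", path],
            capture_output=True, text=True, check=True,
        ).stdout
        tags = json.loads(out)[0]  # one JSON object per input file
        fixes = {}
        for key, value in tags.items():
            group, _, name = key.partition(":")
            if name in ("GPSLatitude", "GPSLongitude"):
                fixes.setdefault(group, {})[name] = value
        return [(f["GPSLatitude"], f["GPSLongitude"])
                for f in fixes.values() if len(f) == 2]

    # coords = extract_gps("GOPR0001.MP4")  # hypothetical file name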


Google will do this, and then sell the data to security institutions. We will be told about it later, or consent to it during a Terms&Conditions update.


Cool idea, but how would you keep it maintained? It's tricky enough to keep maps up to date. A 3D Mesh would be even more complex to maintain.


Looks awesome. Given it takes position data along with images, how accurate must the position data be? Could it handle something like sensor drift in the position data over time?


For anyone with domain knowledge, how applicable is Google's NeRF work here in comparison? Is there any overlap?

https://nerf-w.github.io/

https://news.ycombinator.com/item?id=24071787

EDIT: @bitl: Tremendous, thanks for the reply. It would be amazing to be able to build these scenes just by walking around a room with your mobile phone while it records video, then processing the frames into scenes (especially on mobile platforms with a depth sensor to enrich the collected data).


By default NeRF does not produce a mesh (though one could extract one with marching cubes, as Atlas does), and it requires training a neural network for each scene, whereas Atlas (as far as I understand it) uses a pretrained network to process new scenes.

NeRF would probably produce a much better final result, but the Atlas approach (no need to train anything from scratch) is the only one that can hope to run in real time, which is vital for some applications.
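
To make the marching-cubes step concrete, here is a minimal sketch of pulling a triangle mesh out of a predicted TSDF volume with scikit-image. The voxel size, truncation distance, and toy sphere volume below are placeholder assumptions, not values from either paper; Atlas regresses such a volume directly, while with NeRF you would first have to sample its density field onto a grid:

    import numpy as np
    from skimage import measure

    def tsdf_to_mesh(tsdf, voxel_size=0.04, origin=(0.0, 0.0, 0.0)):
        """Extract a triangle mesh from a signed-distance volume.

        tsdf: (X, Y, Z) array of truncated signed distances, surface at 0.
        voxel_size / origin: placeholder grid parameters, not the paper's values.
        """
        verts, faces, normals, _ = measure.marching_cubes(tsdf, level=0.0)
        verts = verts * voxel_size + np.asarray(origin)  # voxel indices -> metres
        return verts, faces, normals

    # Toy example: a sphere of radius 0.5 m centred in a 64^3, 4 cm-voxel grid.
    coords = np.mgrid[0:64, 0:64, 0:64].astype(np.float32) * 0.04
    sdf = np.linalg.norm(coords - 1.28, axis=0) - 0.5
    verts, faces, _ = tsdf_to_mesh(np.clip(sdf, -0.12, 0.12))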


NeRF has the potential to make all those classical methods obsolete, though it requires many input images, and I am not sure how it handles rolling shutter and other distortions.


Is there anything that would prevent this approach working on 360 video?


In theory this should work. I've been doing photogrammetry with spherical video, and existing software packages often want to "dewarp" the image onto a plane, which works fine for a narrow field of view but fails on spherical video. It would be interesting to see if Atlas supports spherical input. Also, 360 cameras have pretty low visual acuity: my 5.6K GoPro Fusion has to divide those pixels across the whole field of view, so images are less detailed. Still, I think 360 video can be useful in photogrammetry with the right algorithms.


Worst case, you can sample the 360 frames to get images with a smaller field of view. However, the app takes in camera intrinsics and positional data so it seems like it would work out of the box.
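
To flesh out the "sample the 360 frames" idea, here is a rough sketch of rendering a virtual pinhole view out of an equirectangular frame with NumPy/OpenCV. The field of view, output size, and yaw/pitch values are arbitrary assumptions, and each rendered view would still need its own synthetic intrinsics and pose handed to the reconstruction:

    import cv2
    import numpy as np

    def equirect_to_pinhole(equi, fov_deg=90.0, out_size=512, yaw=0.0, pitch=0.0):
        """Render a virtual pinhole view from an equirectangular frame.

        equi: H x W x 3 equirectangular image covering 360 x 180 degrees.
        fov_deg / out_size / yaw / pitch: arbitrary virtual-camera choices.
        """
        h, w = equi.shape[:2]
        f = 0.5 * out_size / np.tan(np.radians(fov_deg) / 2)  # pinhole focal length

        # Rays through each output pixel, in the virtual camera's frame (y down).
        u, v = np.meshgrid(np.arange(out_size), np.arange(out_size))
        x = (u - out_size / 2) / f
        y = (v - out_size / 2) / f
        z = np.ones_like(x)
        rays = np.stack([x, y, z], axis=-1)
        rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

        # Rotate rays by the requested yaw (around y) and pitch (around x).
        cy, sy = np.cos(yaw), np.sin(yaw)
        cp, sp = np.cos(pitch), np.sin(pitch)
        R = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]) @ \
            np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
        rays = rays @ R.T

        # Ray direction -> spherical angles -> equirectangular pixel coords.
        lon = np.arctan2(rays[..., 0], rays[..., 2])       # -pi .. pi
        lat = np.arcsin(np.clip(rays[..., 1], -1, 1))      # -pi/2 .. pi/2
        map_x = ((lon / np.pi + 1) / 2 * w).astype(np.float32)
        map_y = ((lat / (np.pi / 2) + 1) / 2 * h).astype(np.float32)
        return cv2.remap(equi, map_x, map_y, cv2.INTER_LINEAR,
                         borderMode=cv2.BORDER_WRAP)

    # e.g. eight views around the horizon from one frame:
    # views = [equirect_to_pinhole(frame, yaw=a)
    #          for a in np.linspace(0, 2 * np.pi, 8, endpoint=False)]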


I imagine a lot of unfortunate artefacts come out of stitching together the camera views that form a 360 or "spherical" image.


Well, on a camera with dual fisheye lenses for 360 vision there's some blurring at the edge where the images are merged together. But each camera separately just has normal fisheye effects, and if both images are used without blending them together you'd have minimal artifacts. The biggest issue is low visual acuity, imo.


Ladies and gentlemen, you are looking at the pinnacle of mankind's technological achievements. The proof?

We can now make tiny virtual cars do stunts off objects in the real world: https://www.youtube.com/watch?v=9NOPcOGV6nU&feature=youtu.be



