Have a look at the Google Drive files if you click on the data link; it's a lot more than 2 photos.

E.g. that dinosaur skeleton is derived from 60 photos, and the drum kit from ~100.

...so it's not magic; it's very close to what you get from standard photogrammetry. The big difference is that it isn't representing the scene as a block of voxels like some other approaches.

> The biggest practical tradeoffs between these methods are time versus space.

> LLFF produces a large 3D voxel grid for every input image, resulting in enormous storage requirements (over 15GB for one “Realistic Synthetic” scene).

> Our method requires only 5 MB for the network weights (a relative compression of 3000× compared to LLFF), which is even less memory than the input images alone for a single scene from any of our datasets.
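To make that concrete, here's a rough back-of-the-envelope sketch in Python. The MLP shape (8 hidden layers of 256 units) follows the NeRF paper; the 512^3 dense voxel grid is just a hypothetical comparison point, not LLFF's actual layout, and positional encoding and the view-direction branch are ignored:

    BYTES_PER_FLOAT32 = 4

    # Hypothetical dense RGBA voxel grid at 512^3 resolution.
    voxel_grid_bytes = 512**3 * 4 * BYTES_PER_FLOAT32
    print(f"voxel grid : {voxel_grid_bytes / 1e9:.1f} GB")   # ~2.1 GB for one grid

    # Coordinate MLP: a 3D point in, (RGB, density) out, 8 hidden layers of 256 units.
    layers = [3] + [256] * 8 + [4]
    mlp_params = sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))
    print(f"MLP weights: {mlp_params * BYTES_PER_FLOAT32 / 1e6:.1f} MB")  # ~1.9 MB

The real network is a bit bigger once the positional-encoding and view-direction inputs are included, which is roughly where the 5 MB figure comes from, but the gap to a dense grid stays enormous.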

Anyway, so... if you could do the same sort of thing, with similar accuracy, for non-image data in a 'neural representation database', that'd be pretty neat.



Also, I do think the 3D voxel reconstruction approach and the NeRF approach serve different goals. I didn't read the original NeRF paper thoroughly, but AFAIK the network learns to interpolate between the photos in a beautiful, smooth way, whereas a voxel representation would allow a lot of other reconstructions.
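For what it's worth, the "interpolation" is really volume rendering of a learned continuous field: march a ray through the scene, query the network for colour and density at sample points, and alpha-composite the results. A minimal sketch, assuming a field(points, direction) callable standing in for the trained MLP (no stratified or hierarchical sampling, unlike the paper):

    import numpy as np

    def render_ray(field, origin, direction, near=2.0, far=6.0, n_samples=64):
        # field(points, direction) -> (rgb, sigma) stands in for the trained MLP.
        t = np.linspace(near, far, n_samples)          # sample depths along the ray
        points = origin + t[:, None] * direction       # (n_samples, 3) query points
        rgb, sigma = field(points, direction)          # per-sample colour and density

        delta = np.append(np.diff(t), 1e10)            # spacing between samples
        alpha = 1.0 - np.exp(-sigma * delta)           # per-sample opacity
        trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
        weights = trans * alpha                        # how much each sample contributes
        return (weights[:, None] * rgb).sum(axis=0)    # composited RGB for this ray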


If the photos span the camera's full position-orientation space, I don't see why you couldn't put the camera anywhere in the scene.
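In practice "putting the camera anywhere" just means generating one ray per pixel from an arbitrary pose and rendering each ray against the trained network. A minimal sketch, where position is a 3-vector, rotation is a 3x3 camera-to-world matrix, and the resolution and focal length are placeholder values:

    import numpy as np

    def camera_rays(position, rotation, width=400, height=400, focal=500.0):
        i, j = np.meshgrid(np.arange(width), np.arange(height), indexing="xy")
        dirs_cam = np.stack([(i - width / 2) / focal,        # x: right
                             -(j - height / 2) / focal,      # y: up (rows increase downwards)
                             -np.ones_like(i, dtype=float)], # camera looks down -z
                            axis=-1)
        dirs_world = dirs_cam @ rotation.T               # rotate ray directions into world space
        origins = np.broadcast_to(position, dirs_world.shape)  # all rays start at the camera
        return origins, dirs_world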



