Abstract: Parsing complex 3D scenes into compact, low-dimensional representations has been a long-standing goal in Computer Vision, one that could tremendously benefit downstream applications such as scene understanding and reconstruction. Based on the form of the final representation, existing methods can be categorized into explicit and implicit approaches. Implicit approaches in particular have recently gained popularity due to their simple yet efficient parameterization. The primary goal of these works (such as OccNet, SRN, and Neural Volumes) is to build implicit representations by mapping 3D points to properties of the pertinent scene. Although these techniques produce promising results, they struggle to represent complex scenes. Neural Radiance Fields (NeRF) was the breakthrough in this direction: mapping a spatial 3D location to scene geometry and appearance has since proven to be the simplest and most effective approach. Not only did it surpass previous methods in fidelity and accuracy, but it was also able to encode even complex scenes.
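Concretely, NeRF parameterizes a scene as an MLP $F_\Theta$ that maps a 3D location $\mathbf{x}$ and a viewing direction $\mathbf{d}$ to an emitted color $\mathbf{c}$ and a volume density $\sigma$, and renders a pixel by numerically integrating samples along the corresponding ray. The following is a brief sketch of that formulation, with notation following Mildenhall et al.:

\[
F_\Theta : (\mathbf{x}, \mathbf{d}) \mapsto (\mathbf{c}, \sigma),
\qquad
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \big(1 - e^{-\sigma_i \delta_i}\big)\, \mathbf{c}_i,
\quad
T_i = \exp\Big(-\sum_{j<i} \sigma_j \delta_j\Big),
\]

where $\hat{C}(\mathbf{r})$ is the color rendered along ray $\mathbf{r}$ from $N$ samples with spacing $\delta_i$, and $T_i$ is the accumulated transmittance up to sample $i$.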
Neural Radiance Fields, as originally proposed by Mildenhall et al., are limited to overfitting a single scene and are costly in terms of training and rendering time. Since the mapping is learned by training an MLP specifically for each scene, many subsequent methods were proposed to address this problem. GRF and PixelNeRF, for example, exploit image features in order to learn scene priors that allow multi-scene training. Moreover, despite producing state-of-the-art results on the novel view synthesis task, NeRF lacked accuracy in terms of 3D reconstruction. Recent works such as UNISURF show that accurate surface reconstructions can be produced by combining surface and volumetric rendering. In our work, we combine PixelNeRF with UNISURF, applying accurate surface extraction methods to multi-scene 3D implicit representations.
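As a sketch of the two ingredients being combined (notation adapted from the respective papers): PixelNeRF conditions the radiance field on image features $\mathbf{W}(\pi(\mathbf{x}))$ sampled at the projection $\pi(\mathbf{x})$ of each 3D point onto the input view, which is what enables training across scenes, while UNISURF replaces the density $\sigma$ with an occupancy $o(\mathbf{x}) \in [0, 1]$ that is alpha-composited along the ray:

\[
f\big(\gamma(\mathbf{x}), \mathbf{d};\, \mathbf{W}(\pi(\mathbf{x}))\big) \mapsto (\mathbf{c}, \sigma),
\qquad
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} o(\mathbf{x}_i) \prod_{j<i} \big(1 - o(\mathbf{x}_j)\big)\, \mathbf{c}_i,
\]

where $\gamma$ denotes a positional encoding. The occupancy formulation lets the surface be extracted directly as the level set $\{\mathbf{x} : o(\mathbf{x}) = 0.5\}$, which is the property our combination relies on for accurate surface extraction.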