CVRecon: Rethinking 3D Geometric Feature Learning For Neural Reconstruction

ICCV 2023

Ziyue Feng¹, Leon Yang², Pengsheng Guo³, Bing Li¹
¹Clemson University ²Microsoft ³Carnegie Mellon University


Novel 3D geometric feature learning paradigm.

Abstract

Neural reconstruction from posed image sequences has made remarkable progress in recent years. However, due to the lack of depth information, existing volumetric techniques simply duplicate the 2D image features of the object surface along the entire camera ray. We contend that this duplication introduces noise in empty and occluded spaces, making it difficult to produce high-quality 3D geometry. Drawing inspiration from traditional multi-view stereo methods, we propose CVRecon, an end-to-end 3D neural reconstruction framework designed to exploit the rich geometric embedding in cost volumes to facilitate 3D geometric feature learning. Furthermore, we present the Ray-contextual Compensated Cost Volume (RCCV), a novel 3D geometric feature representation that encodes view-dependent information with improved integrity and robustness. Through comprehensive experiments, we demonstrate that our approach significantly improves reconstruction quality across various metrics and clearly recovers fine details of the 3D geometry. Our extensive ablation studies provide insights into the development of effective 3D geometric feature learning schemes. The code will be made publicly available.
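The contrast drawn in the abstract can be illustrated with a toy example along a single camera ray. This is only a minimal sketch, not the CVRecon implementation: the function names, the hand-written feature vectors, and the use of a plain dot product as the matching score are all assumptions made for illustration. It also assumes the reference-view features have already been sampled at the projection of each depth hypothesis.

```python
def backproject_duplicate(key_feature, num_depths):
    """Back-projection baseline: copy the same 2D feature to every
    depth sample along the ray, regardless of where the surface is."""
    return [list(key_feature) for _ in range(num_depths)]


def cost_along_ray(key_feature, ref_features_per_depth):
    """Toy cost volume along one ray: dot-product similarity between the
    keyframe feature and the reference-view feature at each depth
    hypothesis (dot product is an assumed matching score)."""
    return [
        sum(k * r for k, r in zip(key_feature, ref_feature))
        for ref_feature in ref_features_per_depth
    ]


# Hypothetical keyframe pixel feature.
key_feature = [1.0, 0.0, 0.5]

# Hypothetical reference-view features, one per depth hypothesis.
# Only the hypothesis at index 2 lands on the same surface point,
# so only there does the reference feature agree with the keyframe.
ref_features_per_depth = [
    [0.0, 1.0, 0.0],
    [0.2, 0.8, 0.1],
    [1.0, 0.0, 0.5],
    [0.1, 0.9, 0.0],
]

duplicated = backproject_duplicate(key_feature, len(ref_features_per_depth))
costs = cost_along_ray(key_feature, ref_features_per_depth)
best_depth_index = max(range(len(costs)), key=costs.__getitem__)
```

In this sketch the duplicated features are identical at every depth, so they carry no information about where the surface lies, whereas the matching scores vary along the ray and peak at the hypothesis that agrees across views.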

Reconstruction of Our CVRecon

Reconstruction of VoRTX (Current SOTA)

The only difference between CVRecon and VoRTX is that we use our RCCV as the 3D geometric feature representation, which yields significantly clearer geometric details.

Pipeline overview

CVRecon architecture: We first build a standard cost volume for each keyframe from its reference frames. Novel RCCVs are then generated with our proposed Ray Compensation and Contextual In-painting. We sample the RCCVs with trilinear grid sampling and fuse them into a global feature volume, from which the TSDF reconstruction is inferred with 3D convolutions.
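The sampling-and-fusion step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' code: volumes hold a single scalar per cell instead of a feature vector, the continuous coordinates of the global voxel inside each view's volume are assumed to be given (the camera projection is omitted), and plain averaging stands in for whatever learned fusion the actual network uses. The names `trilinear_sample` and `fuse_views` are hypothetical.

```python
import math


def trilinear_sample(volume, x, y, z):
    """Trilinearly interpolate volume[z][y][x] at continuous index
    coordinates (x, y, z). Coordinates are clamped to the volume border."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    z = min(max(z, 0.0), depth - 1.0)

    x0, y0, z0 = int(math.floor(x)), int(math.floor(y)), int(math.floor(z))
    x1, y1, z1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1), min(z0 + 1, depth - 1)
    tx, ty, tz = x - x0, y - y0, z - z0

    # Interpolate along x, then y, then z.
    c00 = volume[z0][y0][x0] * (1 - tx) + volume[z0][y0][x1] * tx
    c01 = volume[z0][y1][x0] * (1 - tx) + volume[z0][y1][x1] * tx
    c10 = volume[z1][y0][x0] * (1 - tx) + volume[z1][y0][x1] * tx
    c11 = volume[z1][y1][x0] * (1 - tx) + volume[z1][y1][x1] * tx
    c0 = c00 * (1 - ty) + c01 * ty
    c1 = c10 * (1 - ty) + c11 * ty
    return c0 * (1 - tz) + c1 * tz


def fuse_views(volumes, coords_per_view):
    """Sample each per-view volume at its own coordinates for one global
    voxel and average the samples (averaging is an assumed stand-in)."""
    samples = [trilinear_sample(v, *c) for v, c in zip(volumes, coords_per_view)]
    return sum(samples) / len(samples)


# Two hypothetical 2x2x2 per-view volumes: one linear ramp, one constant.
volume_a = [[[x + 2 * y + 4 * z for x in range(2)] for y in range(2)] for z in range(2)]
volume_b = [[[1.0 for _ in range(2)] for _ in range(2)] for _ in range(2)]

# Assumed continuous coordinates of the same global voxel in each view.
fused = fuse_views([volume_a, volume_b], [(0.5, 0.5, 0.5), (0.25, 0.75, 0.5)])
```

Repeating `fuse_views` for every voxel of the global grid would give the fused feature volume that the 3D convolutions consume; in practice a batched tensor operation would replace the Python loops.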