DuLa-Net: A Dual-Projection Network for Estimating Room Layouts from a Single RGB Panorama
National Tsing Hua University
National Tsing Hua University
KAUST
KAUST
National Tsing Hua University
National Tsing Hua University
Abstract
We present a deep learning framework, called DuLa-Net,
to predict Manhattan-world 3D room layouts from a single RGB panorama. To achieve better prediction accuracy,
our method leverages two projections of the panorama at
once, namely the equirectangular panorama-view and the
perspective ceiling-view, each of which contains different clues
about the room layouts. Our network architecture consists of two encoder-decoder branches for analyzing each
of the two views. In addition, a novel feature fusion structure is proposed to connect the two branches, which are
then jointly trained to predict the 2D floor plans and layout heights. To learn more complex room layouts, we introduce the Realtor360 dataset that contains panoramas
of Manhattan-world room layouts with different numbers
of corners. Experimental results show that our method outperforms recent state-of-the-art methods in prediction accuracy and
performance, especially in rooms with non-cuboid layouts.
Algorithm
Framework overview. Given an equirectangular panoramic image as input, we follow the same pre-processing step used in PanoContext to align the panoramic image with a global coordinate system, i.e., we make a Manhattan world assumption. Then, we transform the panoramic image into a perspective ceiling-view image through an equirectangular-to-perspective (E2P) conversion. The panorama-view and ceiling-view images are then fed to a network consisting of two encoder-decoder branches. These two branches are connected via an E2P-based feature fusion scheme and jointly trained to predict a floor plan probability map, a floor-ceiling probability map, and the layout height. Two intermediate probability maps are derived from the floor-ceiling probability map using E2P conversion and combined with the floor plan probability map to obtain a fused floor plan probability map. The final 3D Manhattan layout is obtained by estimating a 2D Manhattan floor plan from the fused floor plan probability map and extruding it using the predicted layout height.
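To make the E2P step more concrete, the following is a minimal NumPy sketch of resampling an equirectangular panorama into a perspective view that looks straight up at the ceiling. It is an illustration under stated assumptions, not the authors' implementation: the function name e2p_ceiling, the 160-degree field of view, the output size, the axis convention (z pointing up, image plane at z = 1), and the nearest-neighbor sampling are all choices made for this example.

```python
import numpy as np


def e2p_ceiling(pano, out_size=64, fov_deg=160.0):
    """Resample an equirectangular panorama (H x W x C) into a square
    perspective view looking straight up (hypothetical helper).

    Assumed convention: z is up, the image plane sits at z = 1, and the
    top row of the panorama corresponds to latitude +90 degrees.
    """
    h, w = pano.shape[:2]
    # Half-extent of the image plane for the chosen field of view.
    half = np.tan(np.radians(fov_deg) / 2.0)

    # Pixel-center coordinates in [-1, 1], scaled onto the plane z = 1.
    ticks = (np.arange(out_size) + 0.5) / out_size * 2.0 - 1.0
    x, y = np.meshgrid(ticks * half, -ticks * half)

    # Convert each viewing ray (x, y, 1) to longitude and latitude.
    lon = np.arctan2(x, y)                  # in [-pi, pi]
    lat = np.arctan2(1.0, np.hypot(x, y))   # in (0, pi/2]

    # Map spherical angles to panorama pixels (nearest neighbor).
    col = ((lon / (2.0 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return pano[row, col]
```

Because every ray of an upward-looking view has positive latitude, the output only ever samples the upper half of the panorama. A production version would more likely use bilinear interpolation, and a differentiable sampler if the same conversion is applied to feature maps inside the network.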
Results
Visual results. Given a single RGB panorama, our method automatically estimates the corresponding 3D room layout. Our
method is flexible enough to handle more complex room layouts beyond simple cuboid rooms. The checkerboard patterns on the walls indicate
textures that are missing due to occlusion.
Acknowledgement
The project was funded in part by
the KAUST Office of Sponsored Research (OSR) under
Award No. URF/1/3426-01-01, and the Ministry of Science and Technology of Taiwan (107-2218-E-007-047- and
107-2221-E-007-088-MY3).
Bibtex
@inproceedings{Yang:2019:DuLa-Net,
  author = {Yang, Shang-Ta and Wang, Fu-En and Peng, Chi-Han and Wonka, Peter and Sun, Min and Chu, Hung-Kuo},
  title = {DuLa-Net: {A} Dual-Projection Network for Estimating Room Layouts From a Single {RGB} Panorama},
  booktitle = {{IEEE} Conference on Computer Vision and Pattern Recognition, {CVPR} 2019},
  pages = {3363--3372},
  year = {2019}
}