CTGAN: Semantic-guided Conditional Texture Generator for 3D Shapes
National Taiwan University
National Tsing Hua University
University of Maryland, College Park
Abstract
The entertainment industry relies on 3D visual content to create immersive experiences, but traditional methods for creating textured 3D models can be time-consuming and subjective. Generative networks such as StyleGAN have advanced image synthesis, yet generating 3D objects with high-fidelity textures remains underexplored, and existing methods offer limited control over texture style and structure and often produce view-inconsistent results. We propose the Semantic-guided Conditional Texture Generator (CTGAN), which produces high-quality textures for 3D shapes that are consistent across viewing angles while respecting the semantics of the underlying shape. CTGAN exploits the disentangled nature of StyleGAN's latent space to finely manipulate the input latent codes, enabling explicit control over both the style and the structure of the generated textures. A coarse-to-fine encoder architecture further strengthens control over the structure of the resulting textures through the input segmentation maps. Experimental results show that CTGAN outperforms existing methods on multiple quality metrics and achieves state-of-the-art texture generation performance in both conditional and unconditional settings.
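To make the latent-code manipulation concrete, the following is a minimal sketch (not the authors' code) of how a StyleGAN-style W+ latent can be assembled from separate structure and style codes. The layer count, latent dimension, and split point are illustrative assumptions, not CTGAN's actual configuration.

```python
import torch

# Illustrative constants (assumptions): a StyleGAN2-style W+ code has one
# 512-d vector per generator layer; coarse layers mostly govern structure,
# fine layers mostly govern appearance/style.
NUM_LAYERS, LATENT_DIM, COARSE_LAYERS = 14, 512, 7


def compose_w_plus(structure_code: torch.Tensor,
                   style_code: torch.Tensor) -> torch.Tensor:
    """Build a W+ code: coarse layers come from a structure encoder,
    fine layers from a style encoder."""
    assert structure_code.shape == (COARSE_LAYERS, LATENT_DIM)
    assert style_code.shape == (NUM_LAYERS - COARSE_LAYERS, LATENT_DIM)
    return torch.cat([structure_code, style_code], dim=0)


# Random tensors stand in for encoder outputs in this example.
w_plus = compose_w_plus(torch.randn(COARSE_LAYERS, LATENT_DIM),
                        torch.randn(NUM_LAYERS - COARSE_LAYERS, LATENT_DIM))
print(w_plus.shape)  # torch.Size([14, 512])
```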
Algorithm
Given a 3D model as input, we first perform texture parameterization to generate the corresponding UV maps and segmentation maps. The texture generator then takes a style code as input and synthesizes texture maps conditioned on the segmentation maps. To ensure view-consistent results, we split the style code into two parts: a structure encoder maps the segmentation maps to a structure representation, and a style encoder maps the style image to a style representation. Finally, we apply the generated texture maps to the 3D model to produce the textured 3D model.
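The overall flow can be summarized with the following minimal sketch. The module names (parameterize, structure_encoder, style_encoder, generator) and their interfaces are hypothetical stand-ins that mirror the steps described above, not the actual CTGAN implementation.

```python
import torch

def generate_textured_model(mesh, style_image,
                            parameterize, structure_encoder,
                            style_encoder, generator):
    """Sketch of the described pipeline; all callables are hypothetical."""
    # 1) Texture parameterization: UV map plus a segmentation map in UV space.
    uv_map, seg_map = parameterize(mesh)

    # 2) Encode the segmentation map into a structure representation and
    #    the style image into a style representation.
    structure_code = structure_encoder(seg_map)
    style_code = style_encoder(style_image)

    # 3) The texture generator consumes the combined style code and is
    #    conditioned on the segmentation map to produce the texture map.
    w = torch.cat([structure_code, style_code], dim=0)
    texture_map = generator(w, seg_map)

    # 4) The generated texture is applied to the 3D model via its UV
    #    coordinates (details depend on the mesh/rendering library used).
    return mesh, uv_map, texture_map
```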
Results
Qualitative comparison on texture generation. First row: the input style images (used only in the conditional setting) and the input 3D models. Bottom three rows: the 3D textured models generated by each method from the inputs in the first row. Our method produces texture maps that are more faithful to the style images and more view-consistent.