3D semantic image synthesis with geometric and semantic consistency
- Authors
- Kim, Jihyun; Oh, Changjae; Do, Hoseok; Choi, Sunghwan; Sohn, Kwanghoon
- Issue Date
- 2026-01
- Publisher
- Elsevier
- Citation
- Expert Systems with Applications, v.295
- Abstract
- 3D semantic image synthesis generates photo-realistic and view-consistent images from a single semantic mask, a capability with many practical applications such as image generation, editing, and data augmentation. Existing methods for semantic image synthesis primarily focus on reconstructing the image from the same view as the input, leading to artifacts when generating images from different views. To alleviate this, we propose a novel framework employing learning-based 3D GAN inversion, which enables the generation of 3D-aware RGB images and corresponding semantic masks from a single-view 2D semantic mask. We present a Semantic Component-guided Normalization ResNet block that allows our encoder to capture semantic representations and reflect them in the output images. To ensure semantic consistency across different views, we introduce a semantic decoder that produces an auxiliary-view semantic mask, which serves as a pseudo-input for learning 3D properties. Furthermore, we incorporate a 3D geometric prior that encourages the model to produce high-fidelity images from various viewpoints. Experimental results demonstrate that our method outperforms state-of-the-art 3D-aware semantic image synthesis methods.
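- Note: the Semantic Component-guided Normalization block is not specified further on this page. As a rough illustration only, the sketch below shows the general family of techniques such mask-conditioned normalization blocks typically build on: spatially-adaptive (SPADE-style) normalization, where per-pixel scale and bias are predicted from the semantic mask. All class, parameter, and variable names here (SemanticGuidedNorm, hidden, etc.) are hypothetical and not taken from the paper.

```python
# A minimal PyTorch sketch of a SPADE-style, semantic-mask-conditioned
# normalization block. This is NOT the paper's implementation; it only
# illustrates the general technique (spatially-adaptive normalization)
# that semantic-guided normalization blocks commonly extend.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticGuidedNorm(nn.Module):
    """Normalizes features, then modulates them with per-pixel scale and
    bias predicted from a resized semantic mask (hypothetical variant)."""

    def __init__(self, feat_channels: int, mask_channels: int, hidden: int = 128):
        super().__init__()
        # Parameter-free normalization; all modulation comes from the mask.
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(mask_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_gamma = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Resize the one-hot semantic mask to the feature resolution.
        mask = F.interpolate(mask, size=feat.shape[-2:], mode="nearest")
        ctx = self.shared(mask)
        gamma, beta = self.to_gamma(ctx), self.to_beta(ctx)
        # Spatially varying affine modulation of the normalized features.
        return self.norm(feat) * (1 + gamma) + beta


if __name__ == "__main__":
    block = SemanticGuidedNorm(feat_channels=64, mask_channels=20)
    feat = torch.randn(1, 64, 32, 32)    # encoder features
    mask = torch.randn(1, 20, 256, 256)  # one-hot-like semantic mask
    print(block(feat, mask).shape)       # torch.Size([1, 64, 32, 32])
```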
- Keywords
- Deep learning; Generative adversarial model; Semantic image synthesis; 3D image synthesis
- ISSN
- 0957-4174
- URI
- https://pubs.kist.re.kr/handle/201004/152899
- DOI
- 10.1016/j.eswa.2025.128782
- Appears in Collections:
- KIST Article > Others