3D semantic image synthesis with geometric and semantic consistency

Authors
Kim, Jihyun; Oh, Changjae; Do, Hoseok; Choi, Sunghwan; Sohn, Kwanghoon
Issue Date
2026-01
Publisher
Elsevier
Citation
Expert Systems with Applications, v.295
Abstract
3D semantic image synthesis generates photo-realistic and view-consistent images from a single semantic mask, a capability useful for many practical applications such as image generation, editing, and data augmentation. Existing methods for semantic image synthesis primarily focus on reconstructing the input view, which leads to artifacts when images are generated from other viewpoints. To alleviate this, we propose a novel framework based on learning-based 3D GAN inversion, which generates 3D-aware RGB images and corresponding semantic masks from a single 2D semantic mask. We present a Semantic Component-guided Normalization ResNet block that allows our encoder to capture semantic representations and reflect them in the output images. To ensure semantic consistency across different views, we introduce a semantic decoder that produces an auxiliary-view semantic mask; this mask serves as a pseudo-input for learning 3D properties. Furthermore, we incorporate a 3D geometric prior that encourages the model to produce high-fidelity images from various viewpoints. Experimental results demonstrate that our method outperforms state-of-the-art 3D-aware semantic image synthesis methods.
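The record does not include code, so as a rough illustration of the kind of mask-conditioned normalization the abstract describes, the PyTorch sketch below implements SPADE-style conditional normalization: a feature map is normalized and then modulated by scale and shift maps predicted from the semantic mask. The class name SemanticComponentNorm, the choice of instance normalization, and all layer sizes are illustrative assumptions, not the authors' published SCN ResNet block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticComponentNorm(nn.Module):
    # Hypothetical sketch: normalize a feature map, then apply a spatially
    # varying affine modulation predicted from the (one-hot) semantic mask,
    # in the spirit of SPADE-style conditional normalization.
    def __init__(self, num_features: int, num_classes: int, hidden: int = 128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(num_classes, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.gamma = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, mask_onehot: torch.Tensor) -> torch.Tensor:
        # Resize the semantic mask to the feature resolution.
        mask = F.interpolate(mask_onehot, size=x.shape[-2:], mode="nearest")
        h = self.shared(mask)
        # Scale/shift driven by the semantic layout at every spatial location.
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

# Example: a 19-class mask modulating a 64-channel feature map.
feat = torch.randn(2, 64, 32, 32)
mask = torch.randn(2, 19, 256, 256).softmax(dim=1)  # stand-in for a one-hot mask
out = SemanticComponentNorm(64, 19)(feat, mask)
assert out.shape == feat.shape
```

Because the modulation parameters vary per pixel with the mask, class-specific appearance survives normalization, which is the usual motivation for conditioning normalization on semantic layout rather than using plain batch or instance normalization.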
Keywords
Deep learning; Generative adversarial model; Semantic image synthesis; 3D image synthesis
ISSN
0957-4174
URI
https://pubs.kist.re.kr/handle/201004/152899
DOI
10.1016/j.eswa.2025.128782
Appears in Collections:
KIST Article > Others