Difference Inversion: Interpolate and Isolate the Difference with Token Consistency for Image Analogy Generation

Authors
Kim, Hyunsoo; Kim, Donghyun; Kim, Suhyun
Issue Date
2025
Publisher
IEEE COMPUTER SOC
Citation
2025 Conference on Computer Vision and Pattern Recognition-CVPR-Annual, pp.18250 - 18259
Abstract
How can we generate an image B′ that satisfies A : A′ :: B : B′, given the input images A, A′, and B? Recent works have tackled this challenge through approaches like visual in-context learning or visual instruction. However, these methods are typically limited to specific models (e.g., InstructPix2Pix, inpainting models) rather than general diffusion models (e.g., Stable Diffusion, SDXL). This dependency may lead to inherited biases or lower editing capabilities. In this paper, we propose Difference Inversion, a method that isolates only the difference from A and A′ and applies it to B to generate a plausible B′. To address model dependency, it is crucial to structure prompts in the form of a "Full Prompt" suitable for input to Stable Diffusion models, rather than using an "Instruction Prompt". To this end, we accurately extract the Difference between A and A′ and combine it with the prompt of B, enabling a plug-and-play application of the difference. To extract a precise difference, we first identify it through 1) Delta Interpolation. Additionally, to ensure accurate training, we propose the 2) Token Consistency Loss and 3) Zero Initialization of Token Embeddings. Our extensive experiments demonstrate that Difference Inversion outperforms existing baselines both quantitatively and qualitatively, indicating its ability to generate more feasible B′ in a model-agnostic manner.
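The "Full Prompt" idea in the abstract (learned difference tokens combined with the prompt of B, with the token embeddings initialized to zero) can be pictured with a toy sketch. This is not the authors' implementation: the abstract does not say how the difference is combined with B's prompt, so appending the tokens to the prompt's embedding sequence is an assumption, and the helper names `init_difference_tokens` and `build_full_prompt`, the embedding width, and the number of tokens are all hypothetical. Delta Interpolation and the Token Consistency Loss are not modeled here.

```python
from typing import List

Vector = List[float]


def init_difference_tokens(num_tokens: int, dim: int) -> List[Vector]:
    """Create zero-initialized embeddings for the learnable difference tokens.

    Hypothetical helper: the count and width of the tokens are assumptions.
    """
    return [[0.0] * dim for _ in range(num_tokens)]


def build_full_prompt(
    b_prompt_embeddings: List[Vector],
    difference_tokens: List[Vector],
) -> List[Vector]:
    """Combine B's prompt embeddings with the difference tokens.

    Assumption: the combination is a plain concatenation along the
    sequence axis; the abstract does not specify the actual scheme.
    """
    if not b_prompt_embeddings:
        raise ValueError("B's prompt must contain at least one token embedding")
    dim = len(b_prompt_embeddings[0])
    for vec in b_prompt_embeddings + difference_tokens:
        if len(vec) != dim:
            raise ValueError("all embeddings must share the same width")
    # Copy the vectors so later updates to the tokens do not alias the inputs.
    return [list(v) for v in b_prompt_embeddings] + [list(t) for t in difference_tokens]


# Toy data: a 2-token prompt for B with 4-dimensional embeddings.
b_prompt = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
diff_tokens = init_difference_tokens(num_tokens=3, dim=4)
full_prompt = build_full_prompt(b_prompt, diff_tokens)
```

In this sketch the result is an ordinary embedding sequence, so nothing in it depends on an instruction-tuned editing model, which is the sense in which the abstract calls the approach plug-and-play.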
ISSN
1063-6919
URI
https://pubs.kist.re.kr/handle/201004/154357
DOI
10.1109/CVPR52734.2025.01701
Appears in Collections:
KIST Conference Paper > 2025