CAG: Context-Conditional 2D Affordance Generation

Authors
Kim, Geonkuk; Choi, Tae-Min; Park, Shinsuk; Park, Juyoun
Issue Date
2025-09-14
Publisher
IEEE
Citation
2025 IEEE International Conference on Image Processing (ICIP), pp. 1-6
Abstract
Affordance map generation has become a key topic in the field of cognition and decision-making for machines such as robots. Most studies focus on extracting cognition-leveraged action information from human-object interaction videos using affordance networks. Recently, there have been attempts to combine vision and language data in order to improve model versatility. However, previous works cannot capture deep contextual meaning or generate appropriate affordances for different task objectives or situations involving the same object. To address this limitation, we propose Context-conditional 2D Affordance Generation (CAG), a language-leveraged affordance map generation model. We utilize foundation models to extract contextual knowledge from human video datasets in which people interact with various objects across different environments. Our approach successfully understands given objectives, even when presented with complex sentences, and generates relevant conditional affordance maps.
URI
https://pubs.kist.re.kr/handle/201004/153228
DOI
10.1109/icip55913.2025.11084719
Appears in Collections:
KIST Conference Paper > Others
