Recurrent DETR: Transformer-Based Object Detection for Crowded Scenes
- Authors
- Choi, Hyeong Kyu; Paik, Chong Keun; Ko, Hyun Woo; Park, Min-Chul; Kim, Hyunwoo J.
- Issue Date
- 2023-07
- Publisher
- IEEE (Institute of Electrical and Electronics Engineers)
- Citation
- IEEE Access, v.11, pp. 78623-78643
- Abstract
- Recent Transformer-based object detectors have achieved remarkable performance on benchmark datasets, but few have addressed the real-world challenge of object detection in crowded scenes using transformers. This limitation stems from the fixed query set size of the transformer decoder, which restricts the model's inference capacity. To overcome this challenge, we propose Recurrent Detection Transformer (Recurrent DETR), an object detector that iterates the decoder block to render more predictions with a finite number of query tokens. Recurrent DETR can adaptively control the number of decoder block iterations based on the image's crowdedness or complexity, resulting in a variable-size prediction set. This is enabled by our novel Pondering Hungarian Loss, which helps the model to learn when additional computation is required to identify all the objects in a crowded scene. We demonstrate the effectiveness of Recurrent DETR on two datasets: COCO 2017, which represents a standard setting, and CrowdHuman, which features a crowded setting. Our experiments on both datasets show that Recurrent DETR achieves significant performance gains of 0.8 AP and 0.4 AP, respectively, over its base architectures. Moreover, we conduct comprehensive analyses under different query set size constraints to provide a thorough evaluation of our proposed method.
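- Illustrative sketch
- The abstract's core idea is to re-apply a shared decoder block to a fixed query set and let the model decide, per image, how many passes to run, so that the prediction set grows with scene crowdedness. The following is a minimal, hypothetical PyTorch sketch of that recurrence with an ACT-style halting score standing in for the paper's pondering mechanism; it is not the authors' implementation, and all module names, thresholds, and shapes are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): a recurrent decoder loop that
# re-applies one shared decoder block to a fixed query set, accumulates the
# predictions from each pass, and halts early when a learned score indicates
# the scene is covered. All names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class RecurrentDecoderSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=8, num_queries=100,
                 num_classes=91, max_iters=4, halt_threshold=0.5):
        super().__init__()
        self.queries = nn.Embedding(num_queries, d_model)        # fixed-size query set
        self.block = nn.TransformerDecoderLayer(d_model, n_heads,
                                                batch_first=True)  # shared (recurrent) block
        self.class_head = nn.Linear(d_model, num_classes + 1)    # +1 for "no object"
        self.box_head = nn.Linear(d_model, 4)                    # (cx, cy, w, h)
        self.halt_head = nn.Linear(d_model, 1)                   # per-pass halting score
        self.max_iters = max_iters
        self.halt_threshold = halt_threshold

    def forward(self, memory):
        """memory: encoder features, shape (batch, hw, d_model)."""
        batch = memory.size(0)
        q = self.queries.weight.unsqueeze(0).expand(batch, -1, -1)
        all_logits, all_boxes, halt_scores = [], [], []
        for _ in range(self.max_iters):
            q = self.block(q, memory)                             # one more decoder pass
            all_logits.append(self.class_head(q))                 # predictions of this pass
            all_boxes.append(self.box_head(q).sigmoid())
            halt = self.halt_head(q).mean(dim=(1, 2)).sigmoid()   # scalar halt score per image
            halt_scores.append(halt)
            if bool((halt > self.halt_threshold).all()):          # stop early on simple scenes
                break
        # The prediction set size varies with how many passes were actually run.
        return (torch.cat(all_logits, dim=1),
                torch.cat(all_boxes, dim=1),
                torch.stack(halt_scores, dim=1))

if __name__ == "__main__":
    model = RecurrentDecoderSketch()
    feats = torch.randn(2, 49, 256)        # e.g. a flattened 7x7 encoder feature map
    logits, boxes, halts = model(feats)
    print(logits.shape, boxes.shape, halts.shape)
```

- In this reading, the paper's Pondering Hungarian Loss would supervise both the accumulated predictions (via set matching) and the halting signal, encouraging extra passes only when a single pass cannot cover all objects in a crowded image.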
- Keywords
- Computer vision; object detection; detection transformers; dynamic computation
- ISSN
- 2169-3536
- URI
- https://pubs.kist.re.kr/handle/201004/113491
- DOI
- 10.1109/ACCESS.2023.3293532
- Appears in Collections:
- KIST Article > 2023