HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information

Heitor Medeiros

Fidel A. G. Pena

Masih Aminbeidokhti

Thomas Dubail

Eric Granger

Marco Pedersoli

WACV 2024 & LXAI@CVPR2024

[GitHub]

[Paper]

[Poster]

Abstract

A powerful way to adapt a visual recognition model to a new domain is through image translation. However, common image translation approaches only focus on generating data from the same distribution as the target domain. Given a cross-modal application, such as pedestrian detection from aerial images, with a considerable shift in data distribution between infrared (IR) and visible (RGB) images, a translation focused on generation might lead to poor performance, as the loss focuses on details irrelevant to the task. In this paper, we propose HalluciDet, an IR-RGB image translation model for object detection. Instead of focusing on reconstructing the original image in the IR modality, it seeks to reduce the detection loss of an RGB detector, and therefore avoids the need to access RGB data. This model produces a new image representation that enhances objects of interest in the scene and greatly improves detection performance. We empirically compare our approach against state-of-the-art methods for image translation and for fine-tuning on IR, and show that HalluciDet improves detection accuracy in most cases by exploiting the privileged information encoded in a pre-trained RGB detector.

Try our code

HalluciDet leverages privileged information for modality hallucination with pre-trained detectors. During training, the hallucination network learns to use the privileged information encoded in the RGB detector to translate the IR image into a new, hallucinated modality representation. During inference, the IR image is translated into this representation and passed to the RGB detector, which yields better detection on IR data.
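The training idea above can be illustrated with a deliberately tiny, pure-Python sketch. Everything in it is a hypothetical stand-in, not the authors' implementation: a two-pixel "image", a logistic scorer playing the role of the pre-trained RGB detector (`DET_W`, `DET_B`), and a per-pixel affine map (`hallucinate`, with parameters `a` and `c`) playing the role of the hallucination network. It assumes the detector is held fixed and that only the translator's parameters receive gradients of the detection loss, so training needs IR images and person labels but no RGB images.

```python
import math

# Hypothetical stand-in for a pre-trained RGB detector: a logistic scorer over
# pixel values. It is frozen (assumption): these weights are never updated.
DET_W = (1.0, 1.0)
DET_B = -3.0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hallucinate(x, a, c):
    """Hypothetical hallucination network: a per-pixel affine map of the IR image."""
    return [a * xi + c for xi in x]


def detector_prob(img):
    """Frozen detector: probability that a person is present in `img`."""
    z = sum(w * v for w, v in zip(DET_W, img)) + DET_B
    return sigmoid(z)


def detection_loss(data, a, c):
    """Mean binary cross-entropy of the frozen detector on hallucinated IR images."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = detector_prob(hallucinate(x, a, c))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(data, steps=500, lr=0.5):
    """Gradient descent on the detection loss w.r.t. the translator only."""
    a, c = 1.0, 0.0  # start from the identity translation
    for _ in range(steps):
        grad_a = grad_c = 0.0
        for x, y in data:
            err = detector_prob(hallucinate(x, a, c)) - y  # dLoss/dlogit for BCE
            # Chain rule through the frozen detector: only (a, c) get gradients.
            grad_a += err * sum(w * xi for w, xi in zip(DET_W, x))
            grad_c += err * sum(DET_W)
        a -= lr * grad_a / len(data)
        c -= lr * grad_c / len(data)
    return a, c


# Toy IR samples: (pixel values, person label). No RGB images are needed.
ir_data = [([0.9, 0.8], 1), ([0.2, 0.1], 0)]
loss_before = detection_loss(ir_data, 1.0, 0.0)
a, c = train(ir_data)
loss_after = detection_loss(ir_data, a, c)
print(f"detection loss: {loss_before:.3f} -> {loss_after:.3f}")
```

The translator here is judged only by how well the fixed detector scores its output, not by how realistic that output looks, which is the design choice the text describes; a real system would replace both toy components with neural networks.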

Example of detections using baseline and HalluciDet methods on LLVIP data. (a) Original RGB image with ground truth annotations (yellow). (b) IR image with corresponding detections of a fine-tuned model (green). (c) Translated image from IR to RGB produced by FastCUT and corresponding RGB detections (green). (d) Hallucinated image produced by our method and RGB detections (green); HalluciDet does not seek to reconstruct all image details but only to enhance the objects of interest.

Paper and Supplementary Material

Heitor Rapela Medeiros, Fidel A. Guerrero Pena, Masih Aminbeidokhti, Thomas Dubail, Eric Granger, Marco Pedersoli

HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information.
In WACV, 2024.

(hosted on WACV2024)

[Bibtex]

Experiments and Results

- Main Results Against Image-to-Image (I2I) Translation Methods

- Qualitative Results

- Ablation: Reducing Training Samples

Acknowledgements

This work was supported by Distech Controls Inc., the Natural Sciences and Engineering Research Council of Canada, and the Digital Research Alliance of Canada.