
LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera

1Zhejiang University, 2Shanghai Artificial Intelligence Laboratory, 3Technical University of Munich
Code (Updated) arXiv

Abstract

Semantic Scene Completion (SSC) is pivotal in autonomous driving perception and is frequently challenged by changes in weather and illumination. A long-term strategy is to fuse multi-modal information to bolster the system's robustness. Radar, increasingly used for 3D object detection, is gradually replacing LiDAR in some autonomous driving applications, offering a robust sensing alternative. In this paper, we explore the potential of 3D radar for semantic scene completion, pioneering cross-modal distillation to achieve balanced performance across all metrics. On the architecture side, we build upon our radar-based baseline and propose a three-stage tight fusion approach in the BEV space, yielding a unified fusion framework for point clouds and images. On this basis, we design three cross-modal distillation modules (CMRD, BRD, and PDD) that transfer the rich semantic and structural information of the LiDAR-camera fusion features to the radar-only and radar-camera settings, producing our R-LiCROcc and RC-LiCROcc models. Finally, our LC-Fusion (teacher model), R-LiCROcc, and RC-LiCROcc achieve the best performance on the nuScenes-Occupancy dataset, exceeding the baseline mIoU by 22.9%, 44.1%, and 15.5%, respectively.
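The distillation setup above, where a frozen LiDAR-camera teacher supervises a radar (or radar-camera) student on BEV features, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, flat-list feature representation, and loss weights are hypothetical, and the actual CMRD, BRD, and PDD modules distill at the feature, relation, and prediction levels with richer losses.

```python
def feature_distill_loss(student_bev, teacher_bev):
    """Mean-squared-error distillation between the student's radar BEV
    features and the frozen LiDAR-camera teacher's BEV features.

    Both inputs are flat lists of floats of equal length (illustrative;
    real BEV features are dense 2-D feature maps).
    """
    if len(student_bev) != len(teacher_bev):
        raise ValueError("feature maps must have the same size")
    n = len(student_bev)
    return sum((s - t) ** 2 for s, t in zip(student_bev, teacher_bev)) / n


def total_loss(task_loss, distill_losses, weights):
    """Combine the occupancy task loss with one weighted distillation term
    per module (weights here are hypothetical tuning parameters)."""
    return task_loss + sum(w * d for w, d in zip(weights, distill_losses))
```

In this sketch the teacher's features are treated as fixed targets (no gradient flows into the teacher), so the student is pulled toward the richer LiDAR-camera representation while still being trained on its own occupancy objective.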

Method

Experiments


Comparison

BibTeX


@ARTICLE{10777549,
  author={Ma, Yukai and Mei, Jianbiao and Yang, Xuemeng and Wen, Licheng and Xu, Weihua and Zhang, Jiangning and Zuo, Xingxing and Shi, Botian and Liu, Yong},
  journal={IEEE Robotics and Automation Letters},
  title={LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction Using LiDAR and Camera},
  year={2025},
  volume={10},
  number={1},
  pages={852--859},
  keywords={Radar;Semantics;Radar imaging;Three-dimensional displays;Laser radar;Feature extraction;Cameras;Sensors;Meteorology;Point cloud compression;Sensor fusion;semantic scene completion;knowledge distillation},
  doi={10.1109/LRA.2024.3511427}}