Semantic scene completion (SSC) is pivotal in autonomous driving perception and is frequently confronted by the complexities of weather and illumination changes. A long-term strategy is to fuse multi-modal information to bolster the system's robustness. Radar, increasingly used for 3D object detection, is gradually replacing LiDAR in autonomous driving applications, offering a robust sensing alternative. In this paper, we focus on the potential of 3D radar for semantic scene completion, pioneering cross-modal distillation to achieve balanced performance across all settings. Architecturally, we build upon our radar-based baseline and propose a three-stage tight fusion approach in BEV space to realize a fusion framework for point clouds and images. On this basis, we design three cross-modal distillation modules (CMRD, BRD, and PDD) that transfer the rich semantic and structural information of the LiDAR-camera fusion features into the radar-only and radar-camera settings, yielding our R-LiCROcc and RC-LiCROcc, respectively. Finally, our LC-Fusion (teacher model), R-LiCROcc, and RC-LiCROcc achieve the best performance on the nuScenes-Occupancy dataset, exceeding the baseline mIoU by 22.9%, 44.1%, and 15.5%, respectively.
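To make the distillation idea concrete, below is a minimal sketch of BEV feature-level cross-modal distillation: a frozen teacher (LiDAR-camera fusion) supervises a student (radar-only or radar-camera) on bird's-eye-view feature maps. The class name BEVFeatureDistill, the 1x1 channel adapter, the cosine-similarity loss, and all shapes and channel counts are illustrative assumptions for exposition; they are not the paper's exact CMRD, BRD, or PDD formulations.

import torch
import torch.nn as nn
import torch.nn.functional as F


class BEVFeatureDistill(nn.Module):
    """Align student BEV features to a frozen teacher's BEV features
    (hypothetical sketch, not the paper's exact module)."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # A 1x1 conv adapts the student's channel count to the teacher's,
        # since a radar backbone is typically narrower than a fusion backbone.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_bev: torch.Tensor, teacher_bev: torch.Tensor) -> torch.Tensor:
        # student_bev: (B, C_s, H, W); teacher_bev: (B, C_t, H, W)
        student_bev = self.adapter(student_bev)
        # Per-location cosine similarity pushes the student to match the
        # direction of the teacher's features; detach() keeps gradients
        # from flowing into the frozen teacher.
        cos = F.cosine_similarity(student_bev, teacher_bev.detach(), dim=1)
        return (1.0 - cos).mean()


if __name__ == "__main__":
    distill = BEVFeatureDistill(student_channels=64, teacher_channels=128)
    student = torch.randn(2, 64, 100, 100)   # radar(-camera) BEV features
    teacher = torch.randn(2, 128, 100, 100)  # LiDAR-camera fusion BEV features
    print(distill(student, teacher).item())

In practice such a term would be added to the task loss with a weighting coefficient, so the student still learns the occupancy objective while being pulled toward the richer teacher representation.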
@ARTICLE{10777549,
  author={Ma, Yukai and Mei, Jianbiao and Yang, Xuemeng and Wen, Licheng and Xu, Weihua and Zhang, Jiangning and Zuo, Xingxing and Shi, Botian and Liu, Yong},
  journal={IEEE Robotics and Automation Letters},
  title={LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction Using LiDAR and Camera},
  year={2025},
  volume={10},
  number={1},
  pages={852-859},
  keywords={Radar;Semantics;Radar imaging;Three-dimensional displays;Laser radar;Feature extraction;Cameras;Sensors;Meteorology;Point cloud compression;Sensor fusion;semantic scene completion;knowledge distillation},
  doi={10.1109/LRA.2024.3511427}
}