Depth estimation is critical in autonomous driving for interpreting 3D scenes
accurately. Recently, radar-camera depth estimation has attracted significant
interest owing to radar's robustness and low cost. This paper therefore
introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net
(CaFNet) for dense depth estimation, combining RGB imagery with sparse and
noisy radar point cloud data. The first stage addresses radar-specific
challenges, such as ambiguous elevation and noisy measurements, by predicting a
radar confidence map and a coarse depth map. We present a novel method for
generating the confidence-map ground truth: each radar point is associated
with its corresponding object to identify its potential projection surface,
as sketched below.
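To make this step concrete, the following is a minimal sketch of such a
ground-truth generation procedure. It assumes per-object instance masks and an
accumulated lidar depth map are available; the function name, tolerance, and
association rule are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative sketch of confidence-map ground-truth generation.
# Assumptions (hypothetical, not the authors' exact method): boolean instance
# masks per object and a lidar-based depth map are given.
import numpy as np

def confidence_gt(radar_uv, radar_depth, instance_masks, lidar_depth, tol=0.5):
    """radar_uv: (N, 2) pixel coordinates of projected radar points.
    radar_depth: (N,) radar range per point, in meters.
    instance_masks: (K, H, W) boolean masks, one per object.
    lidar_depth: (H, W) accumulated lidar depth (0 = no return).
    Returns an (H, W) binary confidence ground-truth map."""
    H, W = lidar_depth.shape
    gt = np.zeros((H, W), dtype=np.float32)
    for (u, v), d in zip(radar_uv, radar_depth):
        u, v = int(u), int(v)
        if not (0 <= v < H and 0 <= u < W):
            continue  # point projects outside the image
        # Associate the radar point with the object whose mask contains it.
        for mask in instance_masks:
            if mask[v, u]:
                # Pixels on that object whose lidar depth agrees with the
                # radar range are treated as the point's projection surface.
                agree = mask & (np.abs(lidar_depth - d) < tol) & (lidar_depth > 0)
                gt[agree] = 1.0
                break
    return gt
```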
These maps, together with the initial radar input, are then processed by a
second encoder. For the final depth estimation, we introduce a
confidence-aware gated fusion mechanism (see the sketch below) that integrates
radar and image features effectively, enhancing the reliability of the depth
map by filtering out radar noise. Evaluated on the nuScenes dataset, our method
demonstrates superior performance, improving upon the current leading model by
3.2% in Mean Absolute Error (MAE) and 2.7% in Root Mean Square Error (RMSE).
Code: this https URL
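For illustration, below is a minimal PyTorch sketch of what a confidence-aware
gated fusion block could look like. The module name, channel sizes, and
single-convolution gate are assumptions for exposition; the authors' actual
block is defined in the released code linked above.

```python
# Minimal sketch of a confidence-aware gated fusion block (illustrative only;
# layer choices and tensor shapes are assumptions, not the paper's exact
# architecture).
import torch
import torch.nn as nn

class ConfidenceAwareGatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # The gate predicts a per-pixel, per-channel blending weight from the
        # concatenated image features, radar features, and confidence map.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels + 1, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, img_feat, radar_feat, confidence):
        # img_feat, radar_feat: (B, C, H, W); confidence: (B, 1, H, W) in [0, 1]
        g = self.gate(torch.cat([img_feat, radar_feat, confidence], dim=1))
        # Confidence suppresses radar features where measurements are likely
        # noise; the learned gate blends the two modalities per pixel.
        return g * (confidence * radar_feat) + (1 - g) * img_feat

# Usage example with dummy tensors:
fusion = ConfidenceAwareGatedFusion(channels=64)
img = torch.randn(2, 64, 56, 112)
radar = torch.randn(2, 64, 56, 112)
conf = torch.sigmoid(torch.randn(2, 1, 56, 112))
fused = fusion(img, radar, conf)  # (2, 64, 56, 112)
```

The intuition behind this design is that the confidence map downweights radar
features where measurements are likely noise, while the learned gate decides,
per pixel, how much to trust each modality.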