Semi-Supervised Instance Segmentation (SSIS) aims to leverage large amounts of
unlabeled data during training. Previous frameworks primarily utilized the RGB
information of unlabeled images to generate pseudo-labels. However, such a
mechanism often introduces unstable noise, as a single instance can display
multiple RGB values. To overcome this limitation, we introduce a Depth-Guided
(DG) SSIS framework. This framework uses depth maps extracted from input
images, which represent individual instances with closely associated distance
values, offering precise contours for distinct instances. Because depth maps
differ in nature from RGB data, integrating them into the SSIS process is not
straightforward. To this end, we propose Depth Feature Fusion, which integrates
features extracted from depth estimation. This integration allows the model to
better understand depth information and to use it effectively.
Additionally, to manage the variability of depth images during training, we
introduce the Depth Controller. This component enables adaptive adjustments of
the depth map, enhancing convergence speed and dynamically balancing the loss
weights between RGB and depth maps. Extensive experiments conducted on the COCO
and Cityscapes datasets validate the efficacy of our proposed method. Our
approach sets a new state of the art for SSIS, outperforming previous methods.
Specifically, our DG framework achieves 22.29%, 31.47%, and 35.14% mAP with 1%,
5%, and 10% labeled data on the COCO dataset, respectively.
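
To make the idea of dynamically balancing the loss weights between RGB and depth maps concrete, the following is a minimal sketch only. The abstract does not specify how the Depth Controller computes these weights, so the sigmoid gate, the function name `balanced_loss`, and the parameter `gate_logit` are all assumptions introduced for illustration, not the authors' formulation.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def balanced_loss(rgb_loss: float, depth_loss: float, gate_logit: float):
    """Combine an RGB loss and a depth loss with complementary weights.

    Hypothetical illustration: `gate_logit` stands in for whatever signal
    a controller would produce. The actual Depth Controller may work
    differently; this only shows one way two loss terms can be balanced.
    """
    w_depth = sigmoid(gate_logit)  # weight on the depth term, in (0, 1)
    w_rgb = 1.0 - w_depth          # remaining weight goes to the RGB term
    total = w_rgb * rgb_loss + w_depth * depth_loss
    return total, w_rgb, w_depth


# A gate logit of 0 weights both terms equally.
total, w_rgb, w_depth = balanced_loss(rgb_loss=2.0, depth_loss=4.0, gate_logit=0.0)
print(total, w_rgb, w_depth)
```

In a sketch like this, a larger `gate_logit` shifts weight toward the depth term and a smaller one shifts it toward the RGB term, while the two weights always sum to one.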