3D single object tracking in LiDAR point clouds (LiDAR SOT) plays a crucial
role in autonomous driving. Current approaches all follow the Siamese paradigm
based on appearance matching. However, LiDAR point clouds are usually
textureless and incomplete, which hinders effective appearance matching.
Moreover, previous methods largely overlook the critical motion cues among
targets. In this work, beyond 3D Siamese tracking, we introduce a
motion-centric paradigm to handle LiDAR SOT from a new perspective. Following
this paradigm, we propose a matching-free two-stage tracker, M^2-Track. In the
first stage, M^2-Track localizes the target within successive frames via motion
transformation; in the second stage, it refines the target box through
motion-assisted shape completion. Thanks to its motion-centric nature, our
method exhibits impressive generalizability with limited training labels and
provides good differentiability for end-to-end cycle training. This inspires us to explore
semi-supervised LiDAR SOT by incorporating a pseudo-label-based motion
augmentation and a self-supervised loss term. Under the fully-supervised
setting, extensive experiments confirm that M^2-Track significantly outperforms
previous state-of-the-art methods on three large-scale datasets while running
at 57 FPS (~3%, ~11%, and ~22% precision gains on KITTI, NuScenes, and the
Waymo Open Dataset, respectively). Under the semi-supervised setting, our
method performs on par with, or even surpasses, its fully-supervised
counterpart while using fewer than half of the labels from KITTI. Further
analysis verifies each component's
effectiveness and shows the motion-centric paradigm's promising potential for
auto-labeling and unsupervised domain adaptation.
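For intuition only, below is a minimal sketch of the motion-centric two-stage pipeline described above: a first stage that localizes the target via a motion transformation between successive frames, and a second stage that refines the box using motion-assisted shape completion. The regressors, the fixed-size box parameterization, and all function names here are hypothetical stand-ins, not the paper's actual networks or API.

```python
"""Illustrative sketch of a motion-centric two-stage tracking step.
NOT the authors' implementation: the two 'predict_*' functions below are
crude geometric stand-ins for learned networks, and the box is simplified
to a center (x, y, z) plus a yaw angle."""
import numpy as np


def predict_relative_motion(prev_pts, curr_pts):
    # Hypothetical stand-in for the 1st-stage network: estimate the target's
    # relative motion between frames (here, just centroid displacement).
    dxyz = curr_pts.mean(axis=0) - prev_pts.mean(axis=0)
    dyaw = 0.0  # a learned model would also regress the relative rotation
    return dxyz, dyaw


def refine_with_shape_completion(merged_pts, coarse_center, coarse_yaw):
    # Hypothetical stand-in for the 2nd-stage network: refine the coarse box
    # using the denser, motion-aligned point set.
    refined_center = 0.5 * (coarse_center + merged_pts.mean(axis=0))
    return refined_center, coarse_yaw


def m2_track_step(prev_pts, curr_pts, prev_center, prev_yaw):
    # 1st stage: localize the target in the current frame via motion transformation.
    dxyz, dyaw = predict_relative_motion(prev_pts, curr_pts)
    coarse_center = prev_center + dxyz
    coarse_yaw = prev_yaw + dyaw

    # Motion-assisted shape completion: warp the previous-frame target points
    # into the current frame and merge them with the current observation.
    c, s = np.cos(dyaw), np.sin(dyaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    warped_prev = (prev_pts - prev_center) @ rot.T + coarse_center
    merged = np.vstack([warped_prev, curr_pts])

    # 2nd stage: refine the target box on the completed shape.
    return refine_with_shape_completion(merged, coarse_center, coarse_yaw)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prev_pts = rng.normal(size=(64, 3))                      # target points at frame t-1
    curr_pts = prev_pts[:32] + np.array([1.0, 0.2, 0.0])     # sparser view at frame t
    center, yaw = m2_track_step(prev_pts, curr_pts, np.zeros(3), 0.0)
    print("refined center:", center, "yaw:", yaw)
```

Because the step operates on relative motion rather than appearance matching, it is insensitive to missing texture and partial point clouds, which is the intuition behind the paradigm's generalizability claims.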