China Mobile Zijin Innovation Institute
LiteVGGT introduces a geometry-aware cached token merging strategy to enhance the Visual Geometry Grounded Transformer (VGGT) for multi-view 3D reconstruction. This approach provides up to a 10x speedup in inference time and enables processing of 1000-image scenes without out-of-memory errors, while largely preserving geometric and pose estimation accuracy.
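The general idea of cached token merging can be illustrated with a minimal sketch. It is not LiteVGGT's actual algorithm: the function names, the stride-based anchor selection, and the cosine-similarity grouping are all assumptions made for illustration, and the geometry-aware part of the method is not modeled. The sketch only shows how a merge map computed once can be cached and reused to shrink and later restore a token sequence.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_merge_map(tokens, stride):
    """Hypothetical helper: every `stride`-th token is an anchor, and each
    remaining token is assigned to its most similar anchor."""
    anchors = list(range(0, len(tokens), stride))
    assign = []
    for i, tok in enumerate(tokens):
        if i in anchors:
            assign.append(anchors.index(i))  # anchors map to their own group
        else:
            assign.append(max(range(len(anchors)),
                              key=lambda a: cosine(tok, tokens[anchors[a]])))
    return assign, len(anchors)

def merge(tokens, assign, n_groups):
    """Average all tokens that share a group, yielding n_groups tokens."""
    dim = len(tokens[0])
    sums = [[0.0] * dim for _ in range(n_groups)]
    counts = [0] * n_groups
    for tok, g in zip(tokens, assign):
        counts[g] += 1
        for d in range(dim):
            sums[g][d] += tok[d]
    return [[s / counts[g] for s in sums[g]] for g in range(n_groups)]

def unmerge(merged, assign):
    """Restore the original sequence length by copying each group's token."""
    return [list(merged[g]) for g in assign]

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
assign, n_groups = build_merge_map(tokens, stride=2)  # computed once, then cached
merged = merge(tokens, assign, n_groups)              # attention would run on these
restored = unmerge(merged, assign)                    # back to the full length
```

Since attention cost grows quadratically with token count, running it on the merged sequence is where a speedup would come from, and reusing `assign` across layers avoids recomputing similarities each time.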
Establishing reliable correspondences between image pairs is a fundamental task in computer vision, underpinning applications such as 3D reconstruction and visual localization. Although recent methods have made progress in pruning outliers from dense correspondence sets, they often assume consistent visual domains and overlook the challenges posed by diverse scene structures. In this paper, we propose CorrMoE, a novel correspondence pruning framework that enhances robustness under cross-domain and cross-scene variations. To address domain shift, we introduce a De-stylization Dual Branch, which performs style mixing on both implicit and explicit graph features to mitigate the adverse influence of domain-specific representations. To handle scene diversity, we design a Bi-Fusion Mixture of Experts module that adaptively integrates multi-perspective features through linear-complexity attention and dynamic expert routing. Extensive experiments on benchmark datasets demonstrate that CorrMoE achieves superior accuracy and generalization compared to state-of-the-art methods. The code and pre-trained models are available at this https URL.
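Dynamic expert routing, in its most common generic form, can be sketched as top-k gating. The example below is a rough illustration and not the Bi-Fusion Mixture of Experts module itself: the names `route_top_k` and `moe_forward` are hypothetical, and the gate logits are passed in directly, whereas a real model would produce them with a learned router.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k):
    """Select the k experts with the largest logits and renormalize
    their weights with a softmax over the selected logits only."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the outputs of the selected experts.
    Experts that are not selected are never evaluated."""
    out = [0.0] * len(x)
    for idx, w in route_top_k(gate_logits, k):
        y = experts[idx](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out

# Toy experts standing in for learned sub-networks (assumed for illustration).
experts = [
    lambda v: [2.0 * t for t in v],
    lambda v: [t + 1.0 for t in v],
    lambda v: [0.0 for _ in v],
]
output = moe_forward([1.0, 2.0], experts, gate_logits=[0.0, 0.0, -5.0], k=2)
```

Because the gate logits would depend on the input in a trained model, different inputs can activate different experts.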
The stacked intelligent metasurface (SIM) is an emerging technology that can directly manipulate electromagnetic (EM) wave signals, in a manner analogous to the operation of artificial neural networks (ANNs). Because it processes EM signals directly and consumes little power, SIM holds promise for improving the performance of wireless communication systems. In this paper, we focus on the transmitter side of the downlink in a SIM-assisted multi-user multiple-input single-output (MU-MISO) system. We propose a joint optimization method for SIM phase shift configuration and antenna power allocation, based on the twin delayed deep deterministic policy gradient (TD3) algorithm, to efficiently improve the sum rate. The results show that the proposed algorithm outperforms both the deep deterministic policy gradient (DDPG) and alternating optimization (AO) algorithms. Furthermore, increasing the number of meta-atoms per layer of the SIM is always beneficial, whereas continuing to add SIM layers does not lead to sustained performance improvement.
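The sum rate that such an optimizer tries to maximize can be written down with a short sketch. It uses the standard multi-user downlink formula and rests on stated assumptions, not on the paper's exact system model: `h_eff[k][j]` is taken to be the effective complex channel gain from stream j to user k (with the SIM response already folded in), stream k is intended for user k, and all names are hypothetical.

```python
import math

def sum_rate(h_eff, powers, noise_power):
    """Sum of per-user rates in bits/s/Hz.

    h_eff[k][j]: effective complex gain from stream j to user k (assumed).
    powers[j]:   transmit power allocated to stream j.
    """
    rate = 0.0
    for k, row in enumerate(h_eff):
        # Power of the stream intended for user k.
        signal = powers[k] * abs(row[k]) ** 2
        # Power of every other stream, which user k sees as interference.
        interference = sum(powers[j] * abs(row[j]) ** 2
                           for j in range(len(row)) if j != k)
        sinr = signal / (interference + noise_power)
        rate += math.log2(1.0 + sinr)
    return rate

# Two users with interference-free channels and unit power and noise.
clean = sum_rate([[1 + 0j, 0j], [0j, 1 + 0j]], [1.0, 1.0], 1.0)

# Same setup, but user 0 now receives interference from stream 1.
leaky = sum_rate([[1 + 0j, 1 + 0j], [0j, 1 + 0j]], [1.0, 1.0], 1.0)
```

In a reinforcement learning formulation, a value like this would naturally serve as the reward, with the phase shifts and power allocation as the action; that mapping is an assumption here, not a detail confirmed by the abstract.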