TrackVLA++ advances embodied visual tracking by integrating an efficient spatial reasoning mechanism and a robust, confidence-gated long-term memory into Vision-Language-Action models. It achieves state-of-the-art performance on multiple simulation benchmarks and demonstrates improved real-world tracking robustness against occlusions and distractors.
VideoSSM, developed by researchers at The University of Hong Kong and PICO, ByteDance, introduces a hybrid state-space memory architecture to enable autoregressive long video generation. The model maintains temporal consistency and dynamism over minute-scale durations, achieving superior quality and preventing motion drift or content repetition while operating with linear computational complexity.
Researchers from Carnegie Mellon University, Microsoft Research Asia, and other institutions investigated how label noise in large-scale pre-training datasets affects downstream task performance. They developed NMTune, a lightweight, black-box fine-tuning method that consistently improves generalization, particularly for out-of-domain tasks, by adaptively reshaping feature representations.
Existing Retrieval-Augmented Generation (RAG) systems often hallucinate by always attempting to answer questions, even when reliable information is unavailable. Sun et al. from the AlphaXiv Institute introduce Divide-Then-Align (DTA), a framework that teaches RAG models when to genuinely abstain by first categorizing queries based on the union of the LLM's parametric knowledge and retrieved information, then training with preference data tailored to these knowledge boundaries. DTA achieves an AF1 of 63.3% on Llama-2-7b, providing more reliable RAG responses that balance helpfulness with honesty.
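The "divide" step described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Query` class, the two boolean flags, the quadrant names, and the `ABSTAIN` string are all hypothetical, and how each flag would actually be computed is not specified in this summary.

```python
from dataclasses import dataclass

# Hypothetical abstention string; the summary does not give the real one.
ABSTAIN = "I don't know."


@dataclass(frozen=True)
class Query:
    text: str
    llm_knows: bool        # assumption: the model answers correctly without retrieval
    retrieval_knows: bool  # assumption: the retrieved passages contain the answer


def knowledge_quadrant(q: Query) -> str:
    """Place a query in one of four regions defined by the two knowledge sources."""
    if q.llm_knows and q.retrieval_knows:
        return "both"
    if q.llm_knows:
        return "llm_only"
    if q.retrieval_knows:
        return "retrieval_only"
    return "neither"


def preferred_response(q: Query, gold_answer: str) -> str:
    """Prefer abstention only when the answer lies outside the union of both sources."""
    if knowledge_quadrant(q) == "neither":
        return ABSTAIN
    return gold_answer


if __name__ == "__main__":
    q = Query("Who wrote the report?", llm_knows=False, retrieval_knows=False)
    print(knowledge_quadrant(q), "->", preferred_response(q, "Jane Doe"))
```

In this sketch the preferred response is the target the "align" step would favor: an answer whenever at least one source covers the query, and an abstention otherwise.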
Didier Sornette's work redefines uncertainty as largely remediable ignorance rather than intrinsic randomness, advocating for a shift from forecasting exact outcomes to diagnosing instability by identifying early-warning signals in complex systems. His research demonstrates that extreme events, often called "dragon-kings," can exhibit precursory patterns due to self-amplifying mechanisms.