Chain-of-Action (CoA) proposes a visuo-motor policy that generates robot trajectories autoregressively in reverse, starting from a task goal and reasoning backward to the current state. This approach addresses compounding errors and enhances spatial generalization, achieving an average success rate of 0.552 on 60 RLBench tasks and demonstrating improved performance on real-world Fetch robot manipulation.
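To make the reverse-decoding idea concrete, here is a minimal sketch of goal-to-current autoregressive trajectory generation. The `policy_step` callable, the horizon, and the action shapes are placeholders for illustration, not the authors' actual CoA architecture.

```python
import torch

# Minimal sketch of goal-conditioned reverse autoregressive decoding
# (illustrative only; `policy_step` and the action dimensionality are
# placeholders, not the CoA model itself).

def generate_reverse_trajectory(policy_step, goal_action, current_obs, horizon=16):
    """Decode a trajectory backward from the task goal toward the current state.

    policy_step: callable(obs, partial_trajectory) -> preceding action (torch.Tensor)
    goal_action: keyframe action at the task goal, shape (action_dim,)
    current_obs: encoding of the current observation
    """
    trajectory = [goal_action]
    for _ in range(horizon - 1):
        # Each step predicts the action *preceding* those generated so far,
        # conditioned on the current observation, so errors stay anchored to the goal.
        prev_action = policy_step(current_obs, torch.stack(trajectory))
        trajectory.append(prev_action)
    # Reverse so the robot executes from the current state toward the goal.
    return torch.stack(trajectory[::-1])
```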
Researchers developed game-theory-inspired workflows to systematically enhance the strategic decision-making capabilities of large language models (LLMs) in various negotiation and strategic games. Integrating classical game theory principles, these workflows enabled LLM agents to achieve near-optimal allocations in incomplete-information negotiations, with up to 100% agreement and envy-freeness, and significantly improved adherence to Nash Equilibria in complete-information games compared to baseline LLM performance.
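For readers unfamiliar with the envy-freeness metric reported above, the small check below shows what it tests: no agent would prefer another agent's bundle to its own. It assumes additive valuations over items, which is an illustrative simplification rather than the paper's exact setup.

```python
# Illustrative envy-freeness check, assuming additive valuations over items
# (an assumption; the paper's negotiation setting may differ).

def is_envy_free(valuations, allocation):
    """valuations[i][j]: agent i's value for item j.
    allocation[i]: list of item indices assigned to agent i."""
    def bundle_value(agent, bundle):
        return sum(valuations[agent][item] for item in bundle)
    n = len(valuations)
    for i in range(n):
        my_value = bundle_value(i, allocation[i])
        for j in range(n):
            if i != j and bundle_value(i, allocation[j]) > my_value:
                return False  # agent i envies agent j's bundle
    return True

# Example: two agents splitting three items.
valuations = [[6, 1, 3], [2, 5, 4]]
allocation = [[0], [1, 2]]                    # agent 0 gets item 0; agent 1 gets items 1 and 2
print(is_envy_free(valuations, allocation))   # True: neither agent prefers the other's bundle
```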
Researchers at CAS Key Laboratory of AI Safety developed a theoretical framework to quantify the benefit and detriment of retrieved information in Retrieval-Augmented Generation (RAG) at the token level. This framework formalizes benefit as distribution completion and detriment as distribution contradiction, enabling a practical method (Tok-RAG) that improves RAG robustness and performance across diverse tasks with minimal computational overhead.
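The sketch below illustrates the general idea of token-level collaboration between a pure-LLM path and a retrieval-augmented path. The routing rule used here (comparing the two next-token distributions and keeping the more confident one when they contradict) is an illustrative stand-in, not the benefit/detriment criterion derived in the paper.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of token-level routing between pure-LLM and retrieval-augmented
# generation; the overlap/confidence rule is illustrative, not Tok-RAG's exact criterion.

def route_next_token(logits_llm, logits_rag, agreement_threshold=0.5):
    p_llm = F.softmax(logits_llm, dim=-1)
    p_rag = F.softmax(logits_rag, dim=-1)
    # Overlap between the two distributions: high overlap ~ retrieval "completes"
    # the LLM's distribution, low overlap ~ the two contradict each other.
    overlap = torch.minimum(p_llm, p_rag).sum()
    if overlap >= agreement_threshold:
        return int(p_rag.argmax())            # benefit outweighs detriment: trust the RAG path
    # Contradiction: fall back to whichever path is more confident on this token.
    return int(p_rag.argmax()) if p_rag.max() >= p_llm.max() else int(p_llm.argmax())
```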
This work introduces UncertaintyRAG, a lightweight and unsupervised retrieval model for long-context Retrieval-Augmented Generation (RAG). It leverages Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate semantic similarity between text chunks, enhancing robustness to distribution shifts and achieving state-of-the-art average performance on long-context QA and summarization benchmarks while utilizing only 4% of the training data compared to baseline models.
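One plausible way to turn per-token log-probabilities into an SNR-style span uncertainty score between two chunks is sketched below. The assumed `token_logprobs` helper and the mean-over-std estimator are an illustrative reading, not necessarily the paper's exact formulation.

```python
import torch

# Hedged sketch: an SNR-style span score between chunks. `token_logprobs(context, target)`
# is assumed to return the LLM's per-token log-probabilities of `target` given `context`;
# treating SNR as mean/std of those log-probs is an illustrative assumption.

def snr_span_score(token_logprobs, chunk_a, chunk_b):
    logps = token_logprobs(context=chunk_a, target=chunk_b)   # shape: (num_tokens,)
    mean, std = logps.mean(), logps.std()
    return (mean / (std + 1e-8)).item()   # higher (less negative) => more stable span

def chunk_similarity(token_logprobs, chunk_a, chunk_b):
    # Symmetrize so similarity does not depend on which chunk conditions the other.
    return 0.5 * (snr_span_score(token_logprobs, chunk_a, chunk_b)
                  + snr_span_score(token_logprobs, chunk_b, chunk_a))
```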
The VL-SAE framework introduces a Sparse Autoencoder architecture that interprets and enhances vision-language alignment in Vision-Language Models by mapping both modalities to a unified concept set. This approach improves the interpretability of cross-modal reasoning and demonstrates performance gains in zero-shot classification and hallucination reduction.
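A minimal sparse-autoencoder sketch in this spirit is shown below: image and text embeddings are encoded into a single overcomplete "concept" activation space and reconstructed from a shared decoder dictionary. The dimensions and the L1 sparsity penalty are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared-concept sparse autoencoder sketch: both modalities pass through one concept set.

class SharedConceptSAE(nn.Module):
    def __init__(self, embed_dim=768, num_concepts=8192):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, num_concepts)
        self.decoder = nn.Linear(num_concepts, embed_dim, bias=False)

    def forward(self, x):
        concepts = F.relu(self.encoder(x))        # sparse, non-negative concept activations
        recon = self.decoder(concepts)
        return recon, concepts

def sae_loss(model, image_emb, text_emb, l1_weight=1e-3):
    loss = 0.0
    for emb in (image_emb, text_emb):             # both modalities share one concept set
        recon, concepts = model(emb)
        loss = loss + F.mse_loss(recon, emb) + l1_weight * concepts.abs().mean()
    return loss
```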
Researchers from the Chinese Academy of Sciences and King Abdullah University of Science and Technology introduced GGFlow, the first discrete flow matching generative model that incorporates optimal transport for molecular graphs. This model achieves nearly perfect chemical validity and state-of-the-art performance in both unconditional and property-guided molecule generation with significantly fewer inference steps.
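For orientation, here is a hedged sketch of one discrete flow-matching training step on categorical node types: each type is kept with probability t and otherwise resampled from a uniform prior, and the network learns to recover the clean types. The optimal-transport coupling and graph-specific components that GGFlow adds are omitted, and the `denoiser` interface is an assumption.

```python
import torch
import torch.nn.functional as F

# Basic discrete flow-matching step on node types (illustrative; not GGFlow's full model).

def dfm_training_step(denoiser, node_types, num_classes, optimizer):
    """node_types: LongTensor of clean categorical labels, shape (num_nodes,)."""
    t = torch.rand(())                                     # single time value for the graph
    keep = torch.rand(node_types.shape) < t                # keep the clean label w.p. t
    noise = torch.randint(0, num_classes, node_types.shape)
    noisy_types = torch.where(keep, node_types, noise)     # corrupted sample at time t

    logits = denoiser(noisy_types, t)                      # predict clean categories
    loss = F.cross_entropy(logits, node_types)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```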
A method for music style transfer is introduced that leverages diffusion models with time-varying textual inversion, allowing users to transfer styles from any audio example, including non-musical sounds, to existing melodies while preserving structural content. This approach demonstrates superior performance in both content preservation and style fit compared to existing state-of-the-art techniques.
Researchers from CAS Key Laboratory of AI Security and Kuaishou Technology propose TEA (Test-time Energy Adaptation), a novel method that reinterprets a pre-trained classifier as an energy-based model to address distribution shifts. This approach directly aligns the model's perception of the data distribution with the incoming test data, leading to state-of-the-art generalization performance and improved confidence calibration across various image corruption and domain generalization benchmarks.
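The sketch below shows the energy-based reinterpretation in its simplest form: the classifier's logits define an energy E(x) = -logsumexp(f(x)), and the marginal energy is adapted to a test batch with a contrastive-divergence-style update using a few Langevin (SGLD) steps for negative samples. Step sizes, noise scale, and which parameters are updated are illustrative assumptions rather than TEA's exact recipe.

```python
import torch

def energy(model, x):
    return -torch.logsumexp(model(x), dim=-1)          # marginal energy per sample

def tea_adaptation_step(model, test_x, optimizer, sgld_steps=10, step_size=1.0, noise=0.01):
    # Draw negative samples by running SGLD downhill in energy, starting from noise.
    x_neg = torch.rand_like(test_x, requires_grad=True)
    for _ in range(sgld_steps):
        e = energy(model, x_neg).sum()
        grad = torch.autograd.grad(e, x_neg)[0]
        x_neg = (x_neg - step_size * grad + noise * torch.randn_like(x_neg)).detach().requires_grad_(True)

    # Contrastive divergence: pull energy down on test data, push it up on negatives,
    # aligning the model's perceived density with the incoming test distribution.
    loss = energy(model, test_x).mean() - energy(model, x_neg.detach()).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```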
KnowCoder, developed by ICT, CAS, enhances Universal Information Extraction (UIE) by introducing a code-style schema representation that leverages LLMs' inherent code understanding. The model demonstrates superior generalization across diverse information extraction tasks, achieving a 12.5% relative improvement in zero-shot NER F1 over leading baselines and outperforming prior state-of-the-art models in Relation and Event Extraction after fine-tuning.
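To give a flavor of what a code-style schema looks like, the snippet below exposes entity and relation types to the LLM as Python classes, with extraction amounting to instantiating them from text. The specific class layout is a simplified assumption, not KnowCoder's exact schema library.

```python
# Illustrative code-style schema: types as classes, extraction as instantiation code.

class Entity:
    def __init__(self, mention: str):
        self.mention = mention

class Organization(Entity):
    """An organization entity, e.g. a company, agency, or institution."""

class FoundedBy:
    """Relation linking an organization to the person who founded it."""
    def __init__(self, organization: Organization, founder: Entity):
        self.organization = organization
        self.founder = founder

# The extraction prompt asks the LLM to emit instantiation code such as:
# results = [FoundedBy(Organization("OpenAI"), Entity("Sam Altman"))]
```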
DIVERSIFY is a framework that tackles out-of-distribution detection and generalization for time series data by explicitly identifying and characterizing latent distributions without relying on predefined domain labels. It consistently outperforms baseline methods on OOD detection across seven diverse datasets, demonstrating its ability to learn robust representations for non-stationary time series.