KungfuBot enables humanoid robots to learn and execute highly dynamic human skills such as martial arts and dancing by combining a physics-based motion processing pipeline with an adaptive motion tracking mechanism. The approach allows zero-shot transfer to real robots: it outperforms the compared baselines in tracking, reaching a global mean per-body position error of 53.25 mm on easy motions, and robustly executes complex maneuvers on a Unitree G1 robot.
WenetSpeech-Yue introduces the largest open-source Cantonese speech corpus, containing over 21,800 hours of multi-dimensionally annotated audio, along with comprehensive evaluation benchmarks. This resource enables the development of state-of-the-art Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models for Cantonese, significantly advancing speech technology for the dialect.
UniModel introduces a visual-only framework that unifies multimodal understanding and generation by representing both text and images as pixel-level data within a single diffusion transformer. This approach enables coherent text-to-image generation and image captioning, demonstrating strong cycle consistency and emergent controllability by operating entirely in a shared visual latent space.
PRODVA, developed by researchers at East China Normal University and collaborators, generates protein sequences that are both functionally aligned with text descriptions and structurally plausible. With this method, 77% of designs reach pLDDT > 70, and it outperforms prior state-of-the-art models in foldability while using less than 0.04% of their training data.
The paper introduces WenetSpeech-Chuan, the largest open-source corpus for Sichuanese dialects, containing over 10,000 hours of richly annotated audio. This resource, coupled with a systematic data processing pipeline, facilitates state-of-the-art Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) for Sichuanese, achieving performance competitive with commercial systems and significantly surpassing existing open-source models.
Researchers introduced Reinforced Advantage (ReAd), a closed-loop framework integrating Multi-Agent Reinforcement Learning (MARL) advantage functions to provide principled feedback for Large Language Model (LLM) planning in embodied multi-agent tasks. This approach reduces environmental interactions and LLM queries, achieving superior task success rates and efficiency across various multi-robot and cooperative benchmarks.
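The closed-loop idea of using an advantage estimate, A(s, a) = Q(s, a) - V(s), as feedback on LLM plans can be sketched as below. This is a minimal illustration under stated assumptions, not ReAd's actual prompting or critic: `propose` stands in for an LLM planner, `advantage` for a learned critic, and both names, the acceptance threshold, and the retry limit are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def advantage_gated_plan(
    propose: Callable[[str, Sequence[str]], str],
    advantage: Callable[[str, str], float],
    state: str,
    threshold: float = 0.0,
    max_retries: int = 3,
) -> Tuple[str, int]:
    """Re-prompt a planner until a proposed action's advantage clears a threshold.

    Hypothetical sketch: `propose(state, feedback)` plays the role of an LLM
    planner and `advantage(state, action)` the role of a learned critic.
    Returns the chosen action and the number of planner queries used.
    """
    feedback: List[str] = []
    action = ""
    for attempt in range(1 + max_retries):
        # Pass a snapshot of the feedback gathered so far to the planner.
        action = propose(state, tuple(feedback))
        if advantage(state, action) >= threshold:
            # Accepted without executing it in the environment first.
            return action, attempt + 1
        feedback.append(f"Action {action!r} has low estimated advantage; propose another.")
    # Retry budget exhausted: fall back to the last (rejected) proposal.
    return action, 1 + max_retries


# Usage with stub planner and critic.
candidates = iter(["wait", "move_left", "pick_cube"])
scores = {"wait": -1.0, "move_left": -0.2, "pick_cube": 0.8}
plan, n_queries = advantage_gated_plan(
    lambda s, fb: next(candidates), lambda s, a: scores[a], "s0"
)
print(plan, n_queries)
```

Scoring candidates with a critic before acting is what lets this kind of loop cut down on environment interactions: rejected plans cost an LLM query but no execution step.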