University of Science and Technology of China
This survey paper systematically synthesizes advancements in Reinforcement Learning (RL) for Large Reasoning Models (LRMs), moving beyond human alignment to focus on enhancing intrinsic reasoning capabilities through verifiable rewards. It identifies key components, challenges, and future directions for scaling RL towards Artificial SuperIntelligence (ASI).
A comprehensive survey formally defines Agentic Reinforcement Learning (RL) for Large Language Models (LLMs) as a Partially Observable Markov Decision Process (POMDP), distinct from conventional LLM-RL, and provides a two-tiered taxonomy of capabilities and task domains. The work consolidates open-source resources and outlines critical open challenges for the field.
Brain-computer interfaces have seen extraordinary developments in recent years, and a significant discrepancy now exists between the abundance of available data and the limited headway made in achieving a unified theoretical framework. This discrepancy becomes particularly pronounced when examining the collective neural activity at the micro- and meso-scale, where a coherent formalization that adequately describes neural interactions is still lacking. Here, we introduce a mathematical framework to analyze systems of natural neurons and interpret the related empirical observations in terms of lattice field theory, an established paradigm from theoretical particle physics and statistical mechanics. Our methods are tailored to interpret data from chronic neural interfaces, especially spike rasters from measurements of single-neuron activity, and generalize the maximum entropy model for neural networks so that the time evolution of the system is also taken into account. This is obtained by bridging particle physics and neuroscience, paving the way to particle physics-inspired models of the neocortex.
Researchers from the University of Science and Technology of China and the Chinese Academy of Sciences developed a framework utilizing conditional deep generative models (cDGMs) to learn demand distributions influenced by price and contextual features. This approach enables robust, data-driven optimization of inventory and pricing decisions, demonstrating superior profitability and asymptotic optimality compared to traditional methods in both simulations and a real-world case study.
California Institute of Technology; University of Oslo; University of Cambridge; University of Victoria; Chinese Academy of Sciences; University of Zurich; Tel Aviv University; University of Oxford; University of Science and Technology of China; Scuola Normale Superiore; University of Copenhagen; University of Edinburgh; The University of Texas at Austin; INFN; ETH Zürich; Yonsei University; University of Crete; Kavli Institute for the Physics and Mathematics of the Universe; Universität Heidelberg; University of Maryland; Universidad Autónoma de Madrid; Université Paris-Saclay; Stockholm University; University of Helsinki; University of Arizona; University of Western Australia; University of Sheffield; Princeton University; University of Geneva; University of Portsmouth; University of Iceland; Università di Genova; Universidade do Porto; University of Sussex; INAF; Aix Marseille University; Niels Bohr Institute; University of Jyväskylä; University of Padova; Jet Propulsion Laboratory; Jagiellonian University; Instituto de Astrofísica de Canarias; University of the Witwatersrand; University of Nottingham; European Space Agency; University of Cape Town; SISSA; Nicolaus Copernicus Astronomical Center; Observatoire de la Côte d’Azur; University of Hawai’i; University of KwaZulu-Natal; Ludwig-Maximilians-Universität; Laboratoire d’Astrophysique de Marseille; INAF-Istituto di Radioastronomia; INAF – Osservatorio Astronomico di Roma; Institut de Física d’Altes Energies (IFAE); Laboratoire de Physique des 2 Infinis Irène Joliot-Curie; Osservatorio Astronomico della Regione Autonoma Valle d’Aosta; INAF - Osservatorio Astrofisico di Catania; INAF - Osservatorio Astronomico di Arcetri; Institut d’Astrophysique Spatiale; NASA; DTU Space; The Queen’s University of Belfast; Instituto de Astrofísica e Ciências do Espaço, Universidade de Lisboa; IRAP, Université de Toulouse, CNRS, CNES; ETH, Institute for Astronomy; INAF-IASF, Bologna; Cosmic Dawn Center (DAWN); Università degli Studi di Ferrara; Université de Paris; Université Claude Bernard Lyon 1; Excellence Cluster ‘Origins’; Université de Lyon; Università di Pisa; IFCA-CSIC-UC; INAF Osservatorio Astronomico di Padova; Università degli Studi di Firenze; Université de Montpellier; Università degli Studi di Napoli Federico II; Università di Roma Tor Vergata; INAF Osservatorio di Astrofisica e Scienza dello Spazio di Bologna; Università Di Bologna; INAF – Osservatorio Astronomico di Trieste; Università degli Studi di Trieste
Verifying the fully kinematic nature of the cosmic microwave background (CMB) dipole is of fundamental importance in cosmology. In the standard cosmological model with the Friedmann-Lemaître-Robertson-Walker (FLRW) metric from the inflationary expansion, the CMB dipole should be entirely kinematic. Any non-kinematic CMB dipole component would thus reflect the preinflationary structure of spacetime, probing the extent of the FLRW applicability. Cosmic backgrounds from galaxies after the matter-radiation decoupling should have a kinematic dipole component identical in velocity to the CMB kinematic dipole. Comparing the two can lead to isolating the CMB non-kinematic dipole. It was recently proposed that such a measurement can be done using the near-IR cosmic infrared background (CIB) measured with the currently operating Euclid telescope, and later with Roman. The proposed method reconstructs the resolved CIB, the Integrated Galaxy Light (IGL), from Euclid's Wide Survey and probes its dipole, with a kinematic component amplified over that of the CMB by the Compton-Getting effect. The amplification, coupled with the extensive galaxy samples forming the IGL, would determine the CIB dipole with an overwhelming signal-to-noise ratio, isolating its direction to sub-degree accuracy. We develop details of the method for Euclid's Wide Survey in 4 bands spanning 0.6 to 2 μm. We isolate the systematic and other uncertainties and present methodologies to minimize them, after confining the sample to the magnitude range with negligible IGL/CIB dipole from galaxy clustering. These include the required star-galaxy separation, accounting for the extinction correction dipole using the method newly developed here that achieves total separation, and accounting for the Earth's orbital motion and other systematic effects. (Abridged)
Researchers from ICT, Chinese Academy of Sciences, developed FedCache, a knowledge cache-driven federated learning architecture that facilitates personalized edge intelligence. It achieves performance comparable to state-of-the-art methods while reducing communication overhead by more than two orders of magnitude, notably being the first sample-grained logits interaction method without feature transmission or public datasets.
Understanding how the properties of galaxies relate to the properties of the hot circum-galactic medium (CGM) around them can constrain galaxy evolution models. We measured the X-ray luminosity of the hot CGM based on the surface brightness profiles of central galaxy samples measured from Spectrum Roentgen Gamma (SRG)/eROSITA all-sky survey data. We related the X-ray luminosity to the galaxies' stellar and halo mass, and we compared the observed relations to the self-similar model and intrinsic (i.e., not forward-modeled) output of the IllustrisTNG, EAGLE, and SIMBA simulations. The average hot CGM X-ray luminosity ($L_{\rm X,CGM}$) correlates with the galaxy's stellar mass ($M_*$). It increases from $(1.6 \pm 2.1)\times10^{39}\ \rm erg\,s^{-1}$ to $(3.4 \pm 0.3)\times10^{41}\ \rm erg\,s^{-1}$ when $\log(M_*)$ increases from 10.0 to 11.5. A power law describes the correlation as $\log(L_{\rm X,CGM})= (2.4\pm 0.1)\times \log(M_*)+(14.6\pm1.5)$. The hot CGM X-ray luminosity as a function of halo mass is measured within $\log(M_{\rm 500c})=11.3-13.7$, extending our knowledge of the scaling relation by more than two orders of magnitude. $L_{\rm X,CGM}$ increases with $M_{\rm 500c}$ from $(3.0 \pm 1.6)\times10^{39}\ \rm erg\,s^{-1}$ at $\log(M_{\rm 500c})=11.3$ to $(1.3 \pm 0.1)\times10^{42}\ \rm erg\,s^{-1}$ at $\log(M_{\rm 500c})=13.7$. The relation follows a power law of $\log(L_{\rm X,CGM})= (1.32\pm 0.05)\times \log(M_{\rm 500c})+(24.1\pm0.7)$. Our observations highlight the necessity of non-gravitational processes at the galaxy group scale while suggesting these processes are sub-dominant at the galaxy scale. We show that the outputs of current cosmological galaxy simulations generally align with the observational results uncovered here but with possibly important deviations in selected mass ranges.
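The two power-law scaling relations quoted in this abstract can be evaluated directly. The sketch below is only an illustration: it plugs in the central best-fit values and ignores the quoted uncertainties and any covariance between slope and intercept, so its output need not reproduce the measured endpoint luminosities. The function names are hypothetical and not taken from the paper.

```python
def log_lx_from_stellar_mass(log_mstar: float) -> float:
    """Central-value stellar-mass relation: log L_X = 2.4 * log M_* + 14.6."""
    return 2.4 * log_mstar + 14.6


def log_lx_from_halo_mass(log_m500c: float) -> float:
    """Central-value halo-mass relation: log L_X = 1.32 * log M_500c + 24.1."""
    return 1.32 * log_m500c + 24.1


if __name__ == "__main__":
    # Evaluate the halo-mass relation at the two ends of the measured range.
    for log_m in (11.3, 13.7):
        log_lx = log_lx_from_halo_mass(log_m)
        print(f"log M_500c = {log_m:4.1f} -> L_X ~ {10 ** log_lx:.2e} erg/s")
```

Because the intercepts carry uncertainties of 0.7 to 1.5 dex, a fuller treatment would propagate the fit errors rather than rely on central values alone.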
The circumgalactic medium (CGM) provides the material needed for galaxy formation and influences galaxy evolution. The hot ($T>10^6$ K) CGM is poorly detected around galaxies with stellar masses ($M_*$) lower than $3\times10^{11}M_\odot$ due to the low surface brightness. We used the X-ray data from the first four SRG/eROSITA All-Sky Surveys (eRASS:4). Based on the SDSS spectroscopic survey and halo-based group finder algorithm, we selected central galaxies with spectroscopic redshifts of $z_{\rm spec}<0.2$ and stellar masses of $10.0<\log(M_*/M_\odot)<11.5$ (85,222 galaxies) -- or halo masses of $11.5<\log(M_{\rm 200m}/M_\odot)<14.0$ (125,512 galaxies). By stacking the X-ray emission around galaxies, masking the detected X-ray point sources and carefully modeling the X-ray emission from the unresolved active galactic nuclei (AGN) and X-ray binaries (XRB), we obtain the X-ray emission from the hot CGM. We detected the X-ray emission around MW-mass and more massive central galaxies extending up to the virial radius ($R_{\rm vir}$). We used a $\beta$ model to describe the X-ray surface brightness profile and found $\beta =0.43^{+0.10}_{-0.06}\,(0.37^{+0.04}_{-0.02})$ for MW-mass (M31-mass) galaxies. We estimated the baryon budget of the hot CGM and obtained a value that is lower than the prediction of $\Lambda$CDM cosmology, indicating significant gas depletion in these halos. We extrapolated the hot CGM profile measured within $R_{\rm vir}$ to larger radii and found that within $\approx 3 R_{\rm vir}$, the baryon budget is close to the $\Lambda$CDM cosmology prediction. Our results set a firm footing for the presence of the hot CGM around such galaxies. These measurements constitute a new benchmark for galaxy evolution models and possible implementations of feedback processes therein.
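The abstract does not spell out the functional form of its $\beta$ model. The sketch below assumes the standard isothermal form commonly used for X-ray surface brightness, $S_X(r) = S_0\,[1 + (r/r_c)^2]^{-3\beta + 1/2}$. The central brightness `s0` and core radius `r_c` are hypothetical placeholders, since the abstract reports neither; only the $\beta$ values come from the text.

```python
def beta_model(r: float, s0: float, r_c: float, beta: float) -> float:
    """Standard beta-model surface brightness at projected radius r.

    s0 is the central surface brightness and r_c the core radius; both are
    free parameters here (the abstract does not quote them).
    """
    return s0 * (1.0 + (r / r_c) ** 2) ** (-3.0 * beta + 0.5)


if __name__ == "__main__":
    # beta = 0.43 is the MW-mass best-fit value quoted in the abstract;
    # s0 = 1.0 (arbitrary units) and r_c = 10.0 (arbitrary units) are assumed.
    for radius in (0.0, 10.0, 50.0, 200.0):
        print(radius, beta_model(radius, s0=1.0, r_c=10.0, beta=0.43))
```

A smaller $\beta$, such as the 0.37 quoted for M31-mass galaxies, gives a shallower outer slope, so the profile falls off more slowly with radius.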
We identify a molecular bubble, and study the star formation and its feedback in the S Mon region, using multiple molecular lines, young stellar objects (YSOs), and infrared data. We revisit the distance to S Mon, ~722 ± 9 pc, using Gaia Data Release 3 parallaxes of the associated Class II YSOs. The bubble may be mainly driven by a massive binary system (namely 15 Mon), the primary of which is an O7V-type star. An outflow is detected in the shell of the bubble, suggesting ongoing star formation activities in the vicinity of the bubble. The total wind energy of the massive binary star is three orders of magnitude higher than the sum of the observed turbulent energy in the molecular gas and the kinetic energy of the bubble, indicating that stellar winds help to maintain the turbulence in the S Mon region and drive the bubble. We conclude that the stellar winds of massive stars have an impact on their surrounding environment.
Thyme introduces a paradigm for multimodal large language models (MLLMs) to enhance reasoning and perception by autonomously generating and executing code for image manipulation and computation. This approach achieves substantial performance improvements across nearly 20 benchmarks, frequently outperforming larger models in high-resolution perception tasks.
AlphaEdit proposes a null-space constrained knowledge editing method for large language models, preventing catastrophic forgetting during sequential updates by projecting parameter perturbations to ensure preserved knowledge remains undisturbed. This approach leads to an average 36.7% improvement in editing capabilities while maintaining the model's general knowledge and fluency.
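The null-space idea in this summary can be illustrated with a small, dependency-free sketch. It is not AlphaEdit's actual implementation: it assumes preserved knowledge is represented by a handful of key vectors, and it removes from each row of a weight update its component along those keys, so the projected update maps every preserved key to zero. All names (`orthonormal_basis`, `project_to_null_space`, `matvec`) are hypothetical.

```python
def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the given key vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > tol:  # skip keys that are linearly dependent on earlier ones
            basis.append([wi / norm for wi in w])
    return basis


def project_to_null_space(delta, preserved_keys):
    """Remove from each row of delta its component in the span of the keys."""
    basis = orthonormal_basis(preserved_keys)
    projected = []
    for row in delta:
        r = list(row)
        for b in basis:
            coeff = sum(ri * bi for ri, bi in zip(r, b))
            r = [ri - coeff * bi for ri, bi in zip(r, b)]
        projected.append(r)
    return projected


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


if __name__ == "__main__":
    keys = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]   # assumed preserved-knowledge keys
    delta = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # assumed raw weight update
    safe_delta = project_to_null_space(delta, keys)
    for k in keys:
        print(matvec(safe_delta, k))  # each output should be (numerically) zero
```

In this toy setting, adding `safe_delta` to a weight matrix leaves the layer's output on every preserved key unchanged, which is the property the summary describes as keeping preserved knowledge undisturbed.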
MemOS, a memory operating system for AI systems, redefines memory as a first-class system resource to address current Large Language Model limitations in long-context reasoning, continuous personalization, and knowledge evolution. This framework unifies heterogeneous memory types (plaintext, activation, parameter) using a standardized MemCube unit, achieving superior performance on benchmarks like LoCoMo and PreFEval, and demonstrating robust, low-latency memory operations.
Researchers at the University of Science and Technology of China and MetastoneTechnology introduce DeepResearch Bench, the first specialized benchmark for Deep Research Agents (DRAs), featuring 100 PhD-level tasks. The paper also proposes two novel, human-aligned evaluation frameworks: RACE for assessing report quality, and FACT for evaluating factual grounding and citation trustworthiness, demonstrating their effectiveness by evaluating leading DRAs and LLMs with search tools.
This survey provides a comprehensive synthesis of the rapidly evolving field of Multimodal Large Language Models, detailing their common architectures, multi-stage training paradigms, and diverse evaluation methodologies. It highlights the importance of data quality for instruction tuning and addresses key challenges, including the pervasive issue of multimodal hallucination.
Deformable DETR addresses the slow convergence and poor small object detection of the original DETR by introducing a deformable attention mechanism, which selectively samples a small number of keys from feature maps. This approach enables a 10x reduction in training epochs, improves small object detection performance, and achieves competitive results on the COCO dataset.
RoboTwin 2.0 introduces a scalable simulation framework and benchmark designed to generate high-quality, domain-randomized data for robust bimanual robotic manipulation, addressing limitations in existing synthetic datasets. Policies trained with RoboTwin 2.0 data achieved a 24.4% improvement in real-world success rates for few-shot learning and 21.0% for zero-shot generalization on unseen backgrounds.
An agentic system called Deep Video Discovery (DVD) was developed by Microsoft Research Asia to enhance understanding of hour-long videos by intelligently orchestrating multi-granular search tools. It achieved state-of-the-art accuracy on LVBench at 74.2%, a 13.4% absolute improvement, and surpassed human-level performance on EgoSchema with 76.6% accuracy.
SciReasoner, a scientific reasoning large language model, integrates diverse scientific data representations with natural language across multiple disciplines. The model achieved state-of-the-art performance on 54 scientific tasks and ranked among the top-2 on 101 tasks by employing a three-stage training framework that incorporates multi-representation scientific data.
Utilizing multi-band JWST observations, this research reveals that high-redshift submillimeter galaxies primarily form through secular evolution and internal processes rather than major mergers, uncovering a significant population of central stellar structures that do not conform to established local galaxy classifications.
ACEBench introduces a comprehensive benchmark for evaluating Large Language Model (LLM) tool-use capabilities across diverse scenarios, including normal function calls, imperfect instructions, and multi-turn agent interactions. The benchmark reveals that top-tier closed-source models lead in overall accuracy, while fine-tuned models struggle with generalization and robustness to imperfect inputs, with multi-turn agent tasks proving to be the most challenging for all evaluated LLMs.