Novosibirsk State University
In heterogeneous multi-task decision-making, tasks not only exhibit diverse observation and action spaces but also vary substantially in their underlying complexities. While conventional multi-task world models like UniZero excel in single-task settings, we find that when handling a broad and diverse suite of tasks, gradient conflicts and the loss of model plasticity often constrain their sample efficiency. In this work, we address these challenges from two complementary perspectives: the single learning iteration and the overall learning process. First, to mitigate the gradient conflicts, we systematically investigate key architectural designs for extending UniZero. Our investigation identifies a Mixture-of-Experts (MoE) architecture as the most effective approach. We demonstrate, both theoretically and empirically, that this architecture alleviates gradient conflicts by routing task-specific representations to specialized sub-networks. This finding leads to our proposed model, ScaleZero. Second, to dynamically allocate model capacity throughout the learning process, we introduce an online Dynamic Parameter Scaling (DPS) strategy. This strategy progressively integrates LoRA adapters in response to task-specific progress, enabling adaptive knowledge retention and parameter expansion. Evaluations on a diverse set of standard benchmarks (Atari, DMC, Jericho) demonstrate that ScaleZero, utilizing solely online reinforcement learning with one model, performs on par with specialized single-task agents. With the DPS strategy, it remains competitive while using just 71.5% of the environment interactions. These findings underscore the potential of ScaleZero for effective multi-task planning. Our code is available at this https URL.
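As a rough illustration of the adapter mechanism mentioned above, the following minimal sketch shows how low-rank (LoRA-style) updates can be stacked on a frozen weight matrix. The abstract does not specify how DPS attaches or schedules adapters, so the function names, the `alpha / rank` scaling, and the zero-initialised `B` factor are assumptions taken from the generic LoRA recipe, not from ScaleZero itself.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_forward(
    x: Matrix,
    w_frozen: Matrix,
    adapters: List[Tuple[Matrix, Matrix]],
    alpha: float = 1.0,
) -> Matrix:
    """Compute y = x W + sum_i (alpha / r_i) * x A_i B_i.

    w_frozen is never modified; each adapter (A_i, B_i) is a hypothetical
    low-rank pair with A_i of shape (d_in, r_i) and B_i of shape (r_i, d_out).
    """
    y = matmul(x, w_frozen)
    for a, b in adapters:
        rank = len(b)  # number of rows of B_i equals the adapter rank r_i
        delta = matmul(matmul(x, a), b)
        scale = alpha / rank
        y = [
            [y[i][j] + scale * delta[i][j] for j in range(len(y[0]))]
            for i in range(len(y))
        ]
    return y


x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]          # frozen base weights (identity here)
a = [[1.0], [1.0]]                    # rank-1 down-projection
b_zero = [[0.0, 0.0]]                 # zero-initialised up-projection: a no-op
b_trained = [[0.5, -0.5]]             # pretend values after some training

y_base = lora_forward(x, w, [])
y_fresh = lora_forward(x, w, [(a, b_zero)])
y_adapted = lora_forward(x, w, [(a, b_trained)])
```

A freshly added adapter with a zero `B` factor leaves the output unchanged, which is why adapters of this kind can be added during training without disturbing what the base model already computes.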
Large Language Models (LLMs) have exhibited impressive capabilities across numerous domains, yet they often struggle with complex reasoning and decision-making tasks. Decision-making games, which inherently require multifaceted reasoning logic, serve as ideal sandboxes for evaluating and enhancing the reasoning abilities of LLMs. In this work, we first explore whether LLMs can master complex decision-making games through targeted post-training. To this end, we design data synthesis strategies and curate extensive offline datasets from two classic games, Doudizhu and Go. We further develop a suite of techniques to effectively incorporate this data into LLM training, resulting in two novel agents: Mastermind-Dou and Mastermind-Go. Our experimental results demonstrate that these Mastermind LLMs achieve competitive performance in their respective games. Additionally, we explore whether integrating decision-making data can enhance the general reasoning abilities of LLMs. Our findings suggest that such post-training improves certain aspects of reasoning, providing valuable insights for optimizing LLM data collection and synthesis strategies.
Wuhan University of Technology; Wuhan University; Chinese Academy of Sciences; Carnegie Mellon University; Budker Institute of Nuclear Physics SB RAS; Sichuan University; Gyeongsang National University; Fudan University; University of Science and Technology of China; Beihang University; Shanghai Jiao Tong University; Nanjing University; Hunan Normal University; Guangzhou University; Central South University; Nankai University; Beijing Jiaotong University; Peking University; Joint Institute for Nuclear Research; University of Minnesota; South China Normal University; Southwest University; Anhui University; Purdue University; Uppsala University; University of Liverpool; Guangxi Normal University; Jilin University; University of Sheffield; Central China Normal University; Southern University of Science and Technology; Shandong University; Novosibirsk State University; Yunnan University; Lanzhou University; Northwest University; Indian Institute of Technology Madras; East China Normal University; University of South China; University of Jinan; University of Groningen; Nanjing Normal University; Yantai University; Guangxi University; GSI Helmholtzzentrum fuer Schwerionenforschung GmbH; Fuzhou University; Suranaree University of Technology; INFN, Sezione di Torino; Akdeniz University; Linyi University; INFN, Laboratori Nazionali di Frascati; Shandong Institute of Advanced Technology; Henan Normal University; Università di Torino; National Centre for Nuclear Research; Institute of Nuclear Physics, Polish Academy of Sciences; University of the Punjab; Shandong Normal University; Yunnan Normal University; Liaoning Normal University; China University of Geosciences (Wuhan); University of Science and Technology Liaoning; Helmholtz-Institut Mainz; Beijing Institute of Petrochemical Technology; P.N. Lebedev Physical Institute of the Russian Academy of Sciences; Liaocheng University; Justus-Liebig-Universitaet Giessen; Universitaet Duisburg-Essen; Johannes Gutenberg-Universitaet Mainz; Shaanxi Key Laboratory of Quantum Information and Quantum Optoelectronic Devices; Ruhr Universitaet Bochum; State Key Laboratory of Particle Detection and Electronics, USTC; Università di Ferrara; INFN-Sezione di Ferrara
Based on the $(2712.4\pm14.4)\times 10^{6}$ $\psi(3686)$ events collected with the BESIII detector, we present a high-precision study of the $\pi^+\pi^-$ mass spectrum in $\psi(3686)\rightarrow\pi^{+}\pi^{-}J/\psi$ decays. A clear resonance-like structure is observed near the $\pi^+\pi^-$ mass threshold for the first time. A fit with a Breit-Wigner function yields a mass of $285.6\pm 2.5~{\rm MeV}/c^2$ and a width of $16.3\pm 0.9~{\rm MeV}$ with a statistical significance exceeding $10\sigma$. To interpret the data, we incorporate final-state interactions (FSI) within two theoretical frameworks: chiral perturbation theory (ChPT) and QCD multipole expansion (QCDME). ChPT describes the spectrum above 0.3 GeV/$c^2$ but fails to reproduce the threshold enhancement. In contrast, the QCDME model, assuming the $\psi(3686)$ is an admixture of S- and D-wave charmonium, reproduces the data well. The pronounced dip near 0.3 GeV/$c^2$ offers new insight into the interplay between chiral dynamics and low-energy QCD.
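To give a feel for the quoted mass and width, the sketch below evaluates a simple non-relativistic Breit-Wigner intensity normalised to one at the peak. The abstract does not state which Breit-Wigner parameterisation was fitted, so this Lorentzian form and the function name `breit_wigner` are assumptions used purely for illustration.

```python
def breit_wigner(m: float, mass: float, width: float) -> float:
    """Non-relativistic Breit-Wigner intensity, normalised to 1 at m == mass.

    I(m) = (width/2)^2 / ((m - mass)^2 + (width/2)^2)
    All arguments share one unit (MeV here, with c = 1).
    """
    half_width = width / 2.0
    return half_width**2 / ((m - mass) ** 2 + half_width**2)


# Central values quoted in the abstract (uncertainties ignored).
MASS_MEV = 285.6
WIDTH_MEV = 16.3

peak = breit_wigner(MASS_MEV, MASS_MEV, WIDTH_MEV)
half_max = breit_wigner(MASS_MEV + WIDTH_MEV / 2.0, MASS_MEV, WIDTH_MEV)
```

In this form the width is the full width at half maximum: the intensity falls to half its peak value half a width away from the mass on either side.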
Tsanda Alena and Bruches Elena developed the first publicly available multimodal dataset for Russian scientific paper summarization, featuring 420 papers across diverse domains with text, figures, and tables. Benchmarking against Russian large language models established baseline performance and highlighted the challenges in incorporating multimodal information for summarization.
To plan a safe and efficient route, an autonomous vehicle should anticipate future motions of other agents around it. Motion prediction is an extremely challenging task that recently gained significant attention within the research community. In this work, we present a simple and yet very strong baseline for multimodal motion prediction based purely on Convolutional Neural Networks. While easy to implement, the proposed approach achieves competitive performance compared to state-of-the-art methods and ranks 3rd on the 2021 Waymo Open Dataset Motion Prediction Challenge. Our source code is publicly available on GitHub.
Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently available to support the development of multi-class aortic segmentation methods. To address this gap, we organized the AortaSeg24 MICCAI Challenge, introducing the first dataset of 100 CTA volumes annotated for 23 clinically relevant aortic branches and zones. This dataset was designed to facilitate both model development and validation. The challenge attracted 121 teams worldwide, with participants leveraging state-of-the-art frameworks such as nnU-Net and exploring novel techniques, including cascaded models, data augmentation strategies, and custom loss functions. We evaluated the submitted algorithms using the Dice Similarity Coefficient (DSC) and Normalized Surface Distance (NSD), highlighting the approaches adopted by the top five performing teams. This paper presents the challenge design, dataset details, evaluation metrics, and an in-depth analysis of the top-performing algorithms. The annotated dataset, evaluation code, and implementations of the leading methods are publicly available to support further research. All resources can be accessed at this https URL
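For readers unfamiliar with the first of the two metrics, here is a minimal sketch of a per-class Dice Similarity Coefficient over flattened label volumes. It is not the challenge's evaluation code; the function name and the convention of scoring 1.0 when a class is absent from both volumes are assumptions made for this example.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """Dice Similarity Coefficient for one class over flattened voxel labels.

    DSC = 2 * |P ∩ T| / (|P| + |T|), where P and T are the sets of voxels
    assigned to `label` in the prediction and the ground truth.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of voxels")
    pred_mask = [p == label for p in pred]
    truth_mask = [t == label for t in truth]
    intersection = sum(p and t for p, t in zip(pred_mask, truth_mask))
    total = sum(pred_mask) + sum(truth_mask)
    # Assumed convention: a class missing from both volumes counts as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy six-voxel volume with background (0) and two foreground classes (1, 2).
pred = [0, 1, 1, 2, 2, 2]
truth = [0, 1, 2, 2, 2, 0]
per_class = {c: dice_coefficient(pred, truth, c) for c in (1, 2)}
mean_dice = sum(per_class.values()) / len(per_class)
```

Averaging per-class scores, as in `mean_dice`, is one common way to summarise multi-class results; the Normalized Surface Distance needs voxel spacing and surface extraction and is not sketched here.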
Generation of cosmic microwave background (CMB) elliptic polarization due to the Cotton-Mouton (CM) effect in a cosmic magnetic field is studied. We concentrate on the generation of CMB circular polarization and on the rotation angle of the CMB polarization plane from the decoupling time until the present. For the first time, a rather detailed analysis of the CM effect for an arbitrary direction of the cosmic magnetic field with respect to the photon direction of propagation is done. Considering the CMB linearly polarized at the decoupling time, it is shown that the CM effect is one of the most substantial effects in generating circular polarization, especially in the low-frequency part of the CMB spectrum. It is shown that in the frequency range $10^8$ Hz $\leq \nu_0\leq 10^9$ Hz, the degree of circular polarization of the CMB at present for perpendicular propagation with respect to the cosmic magnetic field is in the range $10^{-13}\lesssim P_C(t_0)\lesssim 7.65\times 10^{-7}$ or Stokes circular polarization parameter $2.7 \times 10^{-13}$ K $\lesssim |V(t_0)|\lesssim 2 \times 10^{-6}$ K for values of the cosmic magnetic field amplitude at present in the range $10^{-9}$ G $\lesssim B\lesssim 8\times 10^{-8}$ G. On the other hand, for non-perpendicular propagation with respect to the cosmic magnetic field we find $10^{-15}\lesssim P_C(t_0)\lesssim 6\times 10^{-12}$ or $2.72 \times 10^{-15}$ K $\lesssim |V(t_0)| \lesssim 10^{-11}$ K, for the same values of the cosmic magnetic field amplitude and the same frequency range. Estimates of the rotation angle of the CMB polarization plane $\delta\psi_0$ due to the CM effect and constraints on the cosmic magnetic field amplitude from current constraints on $\delta\psi_0$ due to a combination of the CM and Faraday effects are found.
We extend the scheme of the neutral-atom Rydberg $C_Z$ gate based on a double sequence of adiabatic pulses applied symmetrically to both atoms, using counterdiabatic driving in the regime of Rydberg blockade. This substantially reduces quantum gate operation times (by at least a factor of five) compared to previously proposed adiabatic schemes, which is important for high-fidelity entanglement due to finite Rydberg lifetimes. We analyzed schemes of adiabatic rapid passage with counterdiabatic driving for single-photon, two-photon and three-photon schemes of Rydberg excitation for rubidium and cesium atoms. We designed laser pulse profiles with fully analytical shapes and calculated the Bell fidelity taking into account atomic lifetimes and finite blockade strengths. We show that the upper limit of the Bell fidelity reaches ${\mathcal F}\simeq0.9999$ in a room-temperature environment.
Light bosonic (axion-like) dark matter may form Bose stars - clumps of nonrelativistic Bose-Einstein condensate supported by self-gravity. We study rotating Bose stars composed of condensed particles with nonzero angular momentum $l$. We analytically prove that these objects are unstable at arbitrary $l \ne 0$ if particle self-interactions are attractive or negligibly small. They decay by shedding off the particles and transporting the angular momentum to the periphery of the system until a Saturn-like configuration appears: one (or several) spin-zero Bose stars and clouds of diffuse particles orbit around the mutual center. In the case of no self-interactions we calculate the profiles and dominant instability modes of the rotating stars: numerically at $1 \leq l\leq 15$ and analytically at $l\gg 1$. Notably, their lifetimes are always comparable to the inverse binding energies; hence, these objects cannot be considered long-living. Finally, we numerically show that in models with sufficiently strong repulsive self-interactions the Bose star with $l=1$ is stable.
We show the possibility of implementing a deep dissipative optical lattice for neutral atoms with a macroscopic period. The depth of the lattice can reach magnitudes comparable to the depth of magneto-optical traps (MOT), while the presence of dissipative friction forces allows for trapping and cooling of atoms. The area of localization of trapped atoms reaches sub-millimeter size, and the number of atoms is comparable to the number trapped in a MOT. As an example, we study lithium atoms, for which the macroscopic period of the lattice is $\Lambda=1.5$ cm. Such deep optical lattices with a macroscopic period open up the possibility of developing effective methods for cooling and trapping neutral atoms without the use of a magnetic field, as an alternative to the MOT. This is important for developing compact systems based on cold atoms.
The recent precise measurements of the $e^+e^-\to K_SK_L$ and $e^+e^-\to K^+K^-$ cross sections and the hadronic spectral function of the $\tau^-\to K^-K_S\nu_\tau$ decay are used to extract the isoscalar and isovector electromagnetic kaon form factors and their relative phase in a model-independent way. The experimental results are compared with a fit based on the vector-meson-dominance model.
We study $e^+e^-\to\pi^+\pi^-h_c$ at center-of-mass energies from 3.90 GeV to 4.42 GeV using data samples collected with the BESIII detector operating at the Beijing Electron Positron Collider. The Born cross sections are measured at 13 energies, and are found to be of the same order of magnitude as those of $e^+e^-\to\pi^+\pi^-J/\psi$ but with a different line shape. In the $\pi^\pm h_c$ mass spectrum, a distinct structure, referred to as $Z_c(4020)$, is observed at 4.02 GeV/$c^2$. The $Z_c(4020)$ carries an electric charge and couples to charmonium. A fit to the $\pi^\pm h_c$ invariant mass spectrum, neglecting possible interferences, results in a mass of $(4022.9\pm 0.8\pm 2.7)$ MeV/$c^2$ and a width of $(7.9\pm 2.7\pm 2.6)$ MeV for the $Z_c(4020)$, where the first errors are statistical and the second systematic. No significant $Z_c(3900)$ signal is observed, and upper limits on the $Z_c(3900)$ production cross sections in $\pi^\pm h_c$ at center-of-mass energies of 4.23 and 4.26 GeV are set.
Wuhan University; Chinese Academy of Sciences; Sichuan University; Sun Yat-Sen University; Nanjing University of Aeronautics and Astronautics; Fudan University; University of Science and Technology of China; Shanghai Jiao Tong University; Nanjing University; University of Bonn; Panjab University; Nankai University; University of California, San Diego; Peking University; Joint Institute for Nuclear Research; Royal Institute of Technology; University of Turin; University of Bologna; Guangxi Normal University; Jilin University; University of Houston; University of Science and Technology Beijing; Central China Normal University; Shandong University; Novosibirsk State University; University of Vienna; Yunnan University; Lanzhou University; University of Ferrara; Indian Institute of Technology Madras; Soochow University; University of South China; University of Jinan; Hunan University; University of Virginia; University of Groningen; Nanjing Normal University; Guangxi University; Fuzhou University; Inner Mongolia University; Zhengzhou University; Xian Jiaotong University; Johannes Gutenberg University Mainz; Shandong Institute of Advanced Technology; Henan Normal University; Indian Institute of Technology Indore; National Centre for Nuclear Research; Hubei University; Justus Liebig University Giessen; University of Hyderabad; GSI Helmholtzzentrum für Schwerionenforschung GmbH; University of the Punjab; G.I. Budker Institute of Nuclear Physics SB RAS; Zhongkai University of Agriculture and Engineering; Helmholtz-Institut Mainz; Istituto Nazionale di Fisica Nucleare, Sezione di Bologna; COMSATS University Islamabad, Lahore Campus; Istituto Nazionale di Fisica Nucleare, Sezione di Ferrara; P.J. Safarik University; Ludwigs-Maximilians-University Munich; P. A. M. Dirac Center for Advanced and Interdisciplinary Studies; K. K. Polytechnic; China Normal University; Istituto Nazionale di Fisica Nucleare Sezione di Torino
Using about 23 $\mathrm{fb^{-1}}$ of data collected with the BESIII detector operating at the BEPCII storage ring, a precise measurement of the $e^{+}e^{-} \rightarrow \pi^{+}\pi^{-}J/\psi$ Born cross section is performed at center-of-mass energies from 3.7730 to 4.7008 GeV. Two structures, identified as the $Y(4220)$ and the $Y(4320)$ states, are observed in the energy-dependent cross section with a significance larger than $10\sigma$. The masses and widths of the two structures are determined to be ($M, \Gamma$) = ($4221.4\pm1.5\pm2.0$ MeV/$c^{2}$, $41.8\pm2.9\pm2.7$ MeV) and ($M, \Gamma$) = ($4298\pm12\pm26$ MeV/$c^{2}$, $127\pm17\pm10$ MeV), respectively. A small enhancement around 4.5 GeV with a significance of about $3\sigma$, compatible with the $\psi(4415)$, might also indicate the presence of an additional resonance in the spectrum. The inclusion of this additional contribution in the fit to the cross section affects the resonance parameters of the $Y(4320)$ state.
Academia Sinica; Wuhan University; Kyungpook National University; Chinese Academy of Sciences; Budker Institute of Nuclear Physics SB RAS; Beijing Normal University; University of Oxford; Fudan University; University of Science and Technology of China; Shanghai Jiao Tong University; Nanjing University; Zhejiang University; University of Bristol; Nankai University; Peking University; Joint Institute for Nuclear Research; Xiamen University; Nanchang University; Huazhong University of Science and Technology; Anhui University; Chongqing University; Université Paris-Saclay; Southeast University; Jilin University; Central China Normal University; Shandong University; Novosibirsk State University; Yunnan University; Institute for Basic Science; Lanzhou University; Soochow University; East China Normal University; University of South China; University of Jinan; Hunan University; Nanjing Normal University; Guangxi University; Capital Normal University; Ruhr-Universität Bochum; Inner Mongolia University; Zhengzhou University; Xian Jiaotong University; Johannes Gutenberg University Mainz; Guilin University of Electronic Technology; CNRS/IN2P3; Guizhou Normal University; National Research Nuclear University MEPhI (Moscow Engineering Physics Institute); Liaoning Normal University; University of Science and Technology Liaoning; Novosibirsk State Technical University; University of La Laguna; P.N. Lebedev Physical Institute of the Russian Academy of Sciences; University of Lancaster; Liaocheng University; Institute for Theoretical and Experimental Physics named by A.I. Alikhanov of NRC “Kurchatov Institute”
The Super $\tau$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}~{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that of the present $\tau$-Charm factory, the BEPCII, providing a unique platform for exploring the asymmetry of matter-antimatter (charge-parity violation), in-depth studies of the internal structure of hadrons and the nature of non-perturbative strong interactions, as well as searching for exotic hadrons and physics beyond the Standard Model. The STCF project in China is under development with an extensive R&D program. This document presents the physics opportunities at the STCF, describes conceptual designs of the STCF detector system, and discusses future plans for detector R&D and physics case studies.
Keyphrase selection plays a pivotal role within the domain of scholarly texts, facilitating efficient information retrieval, summarization, and indexing. In this work, we explored how to apply fine-tuned generative transformer-based models to the specific task of keyphrase selection within Russian scientific texts. We experimented with four generative models (ruT5, ruGPT, mT5, and mBART) and evaluated their performance in both in-domain and cross-domain settings. The experiments were conducted on the texts of Russian scientific abstracts from four domains: mathematics & computer science, history, medicine, and linguistics. The use of generative models, in particular mBART, led to gains in in-domain performance (up to 4.9% in BERTScore, 9.0% in ROUGE-1, and 12.2% in F1-score) over three keyphrase extraction baselines for the Russian language. Although the results for cross-domain usage were significantly lower, they still surpassed the baselines in several cases, underscoring the potential for further exploration and refinement in this research field.
The spin and parity of the $Z_c(3900)^\pm$ state are determined to be $J^P=1^+$ with a statistical significance larger than $7\sigma$ over other quantum numbers in a partial wave analysis of the process $e^+e^-\to \pi^+\pi^-J/\psi$. We use a data sample of 1.92 fb$^{-1}$ accumulated at $\sqrt{s}=4.23$ and 4.26 GeV with the BESIII experiment. When parameterizing the $Z_c(3900)^\pm$ with a Flatte-like formula, we determine its pole mass $M_\textrm{pole}=(3881.2\pm4.2_\textrm{stat}\pm52.7_\textrm{syst})~\textrm{MeV}/c^2$ and pole width $\Gamma_\textrm{pole}=(51.8\pm4.6_\textrm{stat}\pm36.0_\textrm{syst})~\textrm{MeV}$. We also measure cross sections for the process $e^+e^-\to Z_c(3900)^+\pi^-+c.c.\to J/\psi\pi^+\pi^-$ and determine an upper limit at the 90% confidence level for the process $e^+e^-\to Z_c(4020)^+\pi^-+c.c.\to J/\psi\pi^+\pi^-$.
The genetic product of the groupoids, originating in the theory of DNA recombination, is introduced. It permits a natural generalization of the classical genetic algorithm. The full characterization of all three-element genetic groupoids gives an approach to construct the new classes of genetic algorithms. In the conclusion, we formulate some open problems in the theory of the genetic groupoids.
A precise measurement of the cross section for the process $e^+e^-\to K^+K^-(\gamma)$ from threshold to an energy of 5 GeV is obtained with the initial-state radiation (ISR) method using 232 fb$^{-1}$ of data collected with the BaBar detector at $e^+e^-$ center-of-mass energies near 10.6 GeV. The measurement uses the effective ISR luminosity determined from the $e^+e^-\to \mu^+\mu^-(\gamma)\gamma_{\rm ISR}$ process with the same data set. The corresponding lowest-order contribution to the hadronic vacuum polarization term in the muon magnetic anomaly is found to be $a_\mu^{KK,\,{\rm LO}}=(22.93 \pm 0.18_{\rm stat} \pm 0.22_{\rm syst}) \times 10^{-10}$. The charged kaon form factor is extracted and compared to previous results. Its magnitude at large energy significantly exceeds the asymptotic QCD prediction, while the measured slope is consistent with the prediction.
Vapor condensation is a physical phenomenon that finds application in heat removal systems. The traditional design of these systems involves round tubes, but experience shows that this geometry is not optimal for heat transfer. Flattened tubes, on the other hand, have been found to offer potential for improvement, as their geometry increases the condensation surface, which fosters higher heat transfer rates. However, the effects of tube shape (aspect ratio) and orientation (rotation angle) on film-wise condensation dynamics are not fully understood. In this work, we numerically simulate a model of the condensed vapor layer thickness distribution on the flattened tube inner surfaces, taking into account bulk and surface forces (gravity, surface tension, shear stress) for a thin layer of liquid. We consider various configurations of aspect ratios (circular, and AR = 2, 4, and 6) and rotation angles (0°, 10°, 20°, 30°, 45°, 60°, 75°, and 90°). Our simulations allow for an improved understanding of how these geometric parameters, as well as their interplay, influence the thickness distribution of the condensate film on the tube's inner surface, and facilitate the identification of configurations that maximize heat transfer efficiency. Considering water as a working fluid, results show a possible heat transfer enhancement of up to 74% compared to the round tube geometry for an aspect ratio of 6 and a rotation angle of 90°.
Credit assignment is a fundamental problem in reinforcement learning, the problem of measuring an action's influence on future rewards. Explicit credit assignment methods have the potential to boost the performance of RL algorithms on many tasks, but thus far remain impractical for general use. Recently, a family of methods called Hindsight Credit Assignment (HCA) was proposed, which explicitly assign credit to actions in hindsight based on the probability of the action having led to an observed outcome. This approach has appealing properties, but remains a largely theoretical idea applicable to a limited set of tabular RL tasks. Moreover, it is unclear how to extend HCA to deep RL environments. In this work, we explore the use of HCA-style credit in a deep RL context. We first describe the limitations of existing HCA algorithms in deep RL that lead to their poor performance or complete lack of training, then propose several theoretically-justified modifications to overcome them. We explore the quantitative and qualitative effects of the resulting algorithm on the Arcade Learning Environment (ALE) benchmark, and observe that it improves performance over Advantage Actor-Critic (A2C) on many games where non-trivial credit assignment is necessary to achieve high scores and where hindsight probabilities can be accurately estimated.