Aalto University
An algorithm for efficiently calculating the expected size of single-seed cascade dynamics on networks is proposed and tested. The expected size is a time-dependent quantity and so enables the identification of the nodes that are most influential early or late in the spreading process. The measure is accurate for both critical and subcritical dynamic regimes and so generalises the nonbacktracking centrality that was previously shown to successfully identify the most influential single spreaders in a model of critical epidemics on networks.
Aalto University researchers developed Graph4GUI, a method that represents Graphical User Interfaces as heterogeneous bipartite graphs, capturing element properties and layout constraints. This representation, processed by a graph neural network, improves GUI autocompletion, topic classification, and retrieval tasks.
Aalto University researchers developed DemoGen, a method for generating in-context learning demonstrations in grounded language learning, which significantly enhances compositional generalization capabilities. It achieves 80% success on gSCAN Split H and 59% on NL-gSCAN Split H, surpassing retrieval-based approaches.
This paper establishes comprehensive guidelines for the robust evaluation of machine learning systems, systematically addressing data selection, metric choice, and statistical significance. It distills best practices to prevent misleading conclusions and improve the reliability of ML experiments, emphasizing application-centric design and statistical rigor.
We present observations and analysis of the starburst PACS-819 at $z=1.45$ ($M_*=10^{10.7}\,M_{\odot}$), using high-resolution ($0^{\prime\prime}.1$; 0.8 kpc) ALMA and multi-wavelength JWST images from the COSMOS-Web program. Unlike the HST/ACS images in the rest-frame UV, the redder NIRCam and MIRI images reveal a smooth central mass concentration and spiral-like features, atypical for such an intense starburst. Dynamical modeling of the CO $J=5$--4 emission with ALMA shows that PACS-819 is rotation-dominated and thus has a disk-like nature. However, kinematic anomalies in CO and asymmetric features in the bluer JWST bands (e.g., F150W) support a more disturbed nature, likely due to interactions. The JWST imaging further enables us to map the distribution of stellar mass and dust attenuation, thus clarifying the relationships between different structural components that were not discernible in the previous HST images. The CO $J=5$--4 and FIR dust continuum emission are co-spatial with a heavily-obscured starbursting core (<1 kpc), which is partially surrounded by much less obscured star-forming structures including a prominent arc, possibly a tidally-distorted dwarf galaxy, and a clump, either a sign of an ongoing violent disk instability or a recently accreted low-mass satellite. With spatially-resolved maps, we find a high molecular gas fraction in the central area reaching $\sim3$ ($M_{\text{gas}}/M_*$) and short depletion times ($M_{\text{gas}}/\mathrm{SFR}\sim120$ Myr) across the entire system. These observations provide insights into the complex nature of starbursts in the distant universe and underscore the wealth of complementary information from high-resolution observations with both ALMA and JWST.
Researchers from NVIDIA systematically stabilize diffusion model training dynamics by introducing magnitude-preserving architectural designs and a novel post-hoc EMA tuning method. This work achieves an FID of 1.81 on ImageNet-512, significantly improving image quality and computational efficiency while offering a refined understanding of training processes.
Autoguidance introduces a new method to improve image quality in diffusion models by guiding them with an inferior version of themselves, which helps preserve result diversity. This technique, developed by researchers at NVIDIA and Aalto University, addresses limitations of Classifier-Free Guidance and establishes new record FID scores on ImageNet datasets, making high-quality unconditional image generation possible.
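Guiding a model with an inferior version of itself can be pictured as extrapolating away from the weaker model's prediction. The sketch below is a minimal illustration, not the published implementation: it assumes the linear extrapolation form familiar from Classifier-Free Guidance, and `guided_prediction`, `main_pred`, `weak_pred`, and `weight` are hypothetical names. Predictions are plain lists of floats to keep the example self-contained.

```python
def guided_prediction(main_pred, weak_pred, weight):
    """Extrapolate from the weak model's prediction through the main model's.

    weight == 1.0 returns the main prediction unchanged; weight > 1.0 pushes
    the result further away from the weak model's output.
    (Assumed linear form, used here for illustration only.)
    """
    return [w + weight * (m - w) for m, w in zip(main_pred, weak_pred)]


# Toy usage: the two models disagree only on the first coordinate,
# so only that coordinate is pushed further.
result = guided_prediction([1.0, 2.0], [0.5, 2.0], weight=2.0)
```

Where the two models agree, the guided output equals the main model's output, so the extrapolation only acts on the directions in which the weaker model is worse.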
Collective effects, such as Dicke superradiant emission, can enhance the performance of a quantum device. Here, we study the heat current flowing between a cold and a hot bath through an ensemble of $N$ qubits, which are collectively coupled to the thermal baths. We find a regime where the collective coupling leads to a quadratic scaling of the heat current with $N$ in a finite-size scenario. Conversely, when approaching the thermodynamic limit, we prove that the collective scenario exhibits a parametric enhancement over the non-collective case. We then consider the presence of a third uncontrolled parasitic bath, interacting locally with each qubit, that models unavoidable couplings to the external environment. Despite having a non-perturbative effect on the steady-state currents, we show that the collective enhancement is robust to such an addition. Finally, we discuss the feasibility of realizing such a Dicke heat valve with superconducting circuits. Our findings indicate that in a minimal realistic experimental setting with two superconducting qubits, the collective advantage offers an enhancement of approximately 10% compared to the non-collective scenario.
Researchers developed SecureAgentBench, a benchmark with 105 real-world, repository-level tasks, to evaluate LLM-powered code agents' ability to generate secure code. Evaluations show that current agents achieve a mere 9.2% success rate in producing functionally correct and secure solutions, frequently introducing novel vulnerabilities and struggling even with explicit security guidance.
GLaMM enables Large Multimodal Models to generate pixel-level segmentation masks directly within natural language conversations, addressing a critical limitation in dense visual grounding. Researchers at MBZUAI and collaborating institutions developed a new architecture and the massive, automatically annotated Grounding-anything Dataset (GranD), achieving state-of-the-art performance in pixel-grounded conversation generation and referring expression segmentation.
Aalto University researchers developed Code World Models (CWMs) by employing large language models to generate executable Python code for environment dynamics, guided by a novel Monte Carlo Tree Search algorithm called GIF-MCTS. This method produces precise and interpretable simulators that are four to seven orders of magnitude faster than direct LLM inference, enabling effective model-based reinforcement learning with improved planning performance, achieving up to 0.91 CWM accuracy and 0.81 normalized return on discrete RL tasks.
SLAM3R introduces a two-hierarchy neural network framework for real-time dense 3D scene reconstruction from monocular RGB videos, directly regressing 3D pointmaps and progressively aligning local reconstructions. The system achieves an average accuracy of 2.13cm and completeness of 2.34cm on the 7 Scenes dataset, operating at 20+ FPS on an NVIDIA 4090D GPU.
Developing large language models (LLMs) to cooperate and compete effectively within multi-agent systems (MASs) is a critical step towards more advanced intelligence. While reinforcement learning (RL) has proven effective for enhancing reasoning in single-agent tasks, its extension to multi-turn, multi-agent scenarios remains underexplored due to the challenges of long-horizon credit assignment and agent-specific advantage estimation. To address these challenges, we introduce MARSHAL, an end-to-end RL framework that incentivizes Multi-Agent Reasoning through Self-play witH strAtegic LLMs in both cooperative and competitive games. MARSHAL features a turn-level advantage estimator that aligns learning signals with each interaction for credit assignment, and an agent-specific advantage normalization to stabilize multi-agent training. By learning with self-play across cooperative and competitive games, a MARSHAL agent trained from Qwen3-4B develops strong strategic abilities that generalize to held-out games with up to 28.7% performance improvements. More importantly, the capability acquired through self-play generalizes beyond games, yielding consistent performance gains of MASs in reasoning benchmarks. When integrated into leading MASs, our MARSHAL agent achieves significant performance gains of up to 10.0% on AIME, 6.6% on GPQA-Diamond, and 3.5% on average across all benchmarks. These results establish end-to-end RL training with self-play in strategic games as a powerful approach for developing generalizable multi-agent reasoning capabilities in LLMs.
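The agent-specific advantage normalization mentioned above can be illustrated with a small sketch. This is an assumption-laden illustration rather than MARSHAL's actual estimator: it simply standardizes each advantage using the mean and standard deviation of that agent's own samples, and `normalize_per_agent`, the `(agent_id, advantage)` sample layout, and `eps` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_per_agent(samples, eps=1e-8):
    """Standardize advantages separately for each agent.

    samples: list of (agent_id, advantage) pairs.
    Returns the normalized advantages in the original order.
    """
    by_agent = defaultdict(list)
    for agent_id, adv in samples:
        by_agent[agent_id].append(adv)
    # Per-agent mean and population standard deviation.
    stats = {a: (mean(v), pstdev(v)) for a, v in by_agent.items()}
    # eps guards against division by zero when an agent's advantages are all equal.
    return [(adv - stats[a][0]) / (stats[a][1] + eps) for a, adv in samples]


# Toy usage: agent "b" has advantages on a much larger scale than agent "a".
normalized = normalize_per_agent([("a", 1.0), ("b", 10.0), ("a", 3.0), ("b", 20.0)])
```

After normalization both agents contribute learning signals of comparable scale, which is one plausible reading of how such a step could stabilize multi-agent training.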
NVIDIA researchers developed Adaptive Discriminator Augmentation (ADA), a method that enables training high-quality GANs using significantly less data by stochastically augmenting discriminator inputs and adaptively controlling augmentation strength. This approach effectively mitigates discriminator overfitting and prevents augmentation leakage, achieving new state-of-the-art results on datasets like CIFAR-10 with thousands of images.
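Adaptive control of augmentation strength can be pictured as a simple feedback loop on the augmentation probability. The sketch below is illustrative only: the overfitting signal, its target value, and the step size are hypothetical placeholders, not values or heuristics taken from the summary above.

```python
def adjust_augmentation_probability(p, overfit_signal, target=0.6, step=0.01):
    """Nudge the augmentation probability p toward less or more augmentation.

    overfit_signal: any scalar that grows as the discriminator overfits
    (hypothetical; how it is measured is left open here).
    Raises p when the signal exceeds the target, lowers it otherwise,
    and clamps the result to [0, 1].
    """
    direction = 1.0 if overfit_signal > target else -1.0
    return min(1.0, max(0.0, p + direction * step))


# Toy usage: the signal is above target, so augmentation gets stronger.
p_next = adjust_augmentation_probability(0.5, overfit_signal=0.8, step=0.25)
```

Called once every few training steps, such a rule would keep augmentation light while the discriminator generalizes and strengthen it once overfitting sets in.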
TESSERA introduces a framework that generates precomputed, multimodal pixel embeddings at 10m global resolution for Earth observation data, integrating Sentinel-1 and Sentinel-2 time series. The system delivers state-of-the-art or comparable performance across five diverse downstream tasks, including crop classification and canopy height estimation, often with significantly less labeled data.
An RL framework was developed at Aalto University to enable mobile manipulators to navigate by actively manipulating obstacles, integrating visual affordance maps and manipulability priors. This approach demonstrated enhanced learning efficiency in simulation, converging significantly faster than baseline methods, and achieved robust policy transfer to a real Boston Dynamics Spot robot, showing a 0.8 success rate for a manipulation-for-navigation task.
Researchers from Aalto University and NVIDIA refined Classifier-Free Guidance (CFG) in diffusion models by applying it only within a specific, limited interval of noise levels. This method improved the state-of-the-art FID on ImageNet-512 to 1.40, enhanced perceptual quality and diversity, and offered over 20% faster inference for models like SD-XL.
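Restricting guidance to an interval of noise levels can be sketched as a conditional around the usual CFG combination. This is a minimal illustration under stated assumptions: `guided_step`, `cond_fn`, `uncond_fn`, and the interval bounds `sigma_lo` and `sigma_hi` are hypothetical names, and the models are stand-in callables that return lists of floats.

```python
def guided_step(cond_fn, uncond_fn, x, sigma, weight, sigma_lo, sigma_hi):
    """Apply classifier-free guidance only when sigma lies in (sigma_lo, sigma_hi]."""
    cond = cond_fn(x, sigma)
    if not (sigma_lo < sigma <= sigma_hi):
        # Outside the interval: use the conditional prediction as-is and
        # skip the unconditional evaluation entirely.
        return cond
    uncond = uncond_fn(x, sigma)
    # Standard CFG combination: extrapolate from unconditional toward conditional.
    return [u + weight * (c - u) for c, u in zip(cond, uncond)]


# Toy stand-ins for the two denoisers (hypothetical, for illustration only).
def toy_cond(x, sigma):
    return list(x)


def toy_uncond(x, sigma):
    return [0.0 for _ in x]


inside = guided_step(toy_cond, toy_uncond, [1.0, 2.0], 1.0, 2.0, 0.5, 2.0)
outside = guided_step(toy_cond, toy_uncond, [1.0, 2.0], 5.0, 2.0, 0.5, 2.0)
```

Because the unconditional model is not evaluated outside the interval, a scheme like this would plausibly account for part of the reported inference speed-up.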
A Computational Rationality framework models human agents with bounded memory, revealing how memory decay leads to biased beliefs and adaptive, yet seemingly sub-optimal, decisions. This work also presents an efficient online inference method, Nested Particle Filtering, that accurately estimates user-specific memory bounds, enabling AI assistants to provide highly personalized and adaptive support.
The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks. It maintains an exponential moving average of label predictions on each training example, and penalizes predictions that are inconsistent with this target. However, because the targets change only once per epoch, Temporal Ensembling becomes unwieldy when learning large datasets. To overcome this problem, we propose Mean Teacher, a method that averages model weights instead of label predictions. As an additional benefit, Mean Teacher improves test accuracy and enables training with fewer labels than Temporal Ensembling. Without changing the network architecture, Mean Teacher achieves an error rate of 4.35% on SVHN with 250 labels, outperforming Temporal Ensembling trained with 1000 labels. We also show that a good network architecture is crucial to performance. Combining Mean Teacher and Residual Networks, we improve the state of the art on CIFAR-10 with 4000 labels from 10.55% to 6.28%, and on ImageNet 2012 with 10% of the labels from 35.24% to 9.11%.
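Averaging model weights instead of label predictions can be sketched as an exponential moving average applied to the teacher after each student update. The snippet below is a minimal illustration, not the authors' code: weights are flat lists of floats, and `update_teacher`, `consistency_cost`, and the `decay` value are hypothetical choices. The mean-squared consistency cost is likewise an assumption about how inconsistency between student and teacher predictions might be penalized.

```python
def update_teacher(teacher_weights, student_weights, decay=0.99):
    """Move each teacher weight a small step toward the student's weight (EMA)."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]


def consistency_cost(student_pred, teacher_pred):
    """Mean squared difference between student and teacher predictions."""
    diffs = [(s - t) ** 2 for s, t in zip(student_pred, teacher_pred)]
    return sum(diffs) / len(diffs)


# Toy usage: with decay=0.5 the teacher lands halfway between old teacher and student.
teacher = update_teacher([0.0, 2.0], [1.0, 4.0], decay=0.5)
cost = consistency_cost([0.0, 1.0], [1.0, 1.0])
```

Since the teacher can be refreshed after every training step rather than once per epoch, this kind of update avoids the slow target refresh that the abstract identifies in Temporal Ensembling.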