Utrecht University
The brain's remarkable and efficient information processing capability is driving research into brain-inspired (neuromorphic) computing paradigms. Artificial aqueous ion channels are emerging as an exciting platform for neuromorphic computing, representing a departure from conventional solid-state devices by directly mimicking the brain's fluidic ion transport. Supported by a quantitative theoretical model, we present easy-to-fabricate tapered microchannels that embed a conducting network of fluidic nanochannels within a colloidal structure. Owing to transient salt concentration polarisation, our devices are volatile memristors (memory resistors) that are remarkably stable. The voltage-driven net salt flux and accumulation that underpin the concentration polarisation surprisingly combine into a diffusion-like quadratic dependence of the memory retention time on the channel length, allowing channels to be designed for a specific timescale. We implement our device as a synaptic element for neuromorphic reservoir computing. Individual channels distinguish various time series that together represent (handwritten) numbers, for subsequent in-silico classification with a simple readout function. Our results represent a significant step towards realising the promise of fluidic ion channels as a platform to emulate the rich aqueous dynamics of the brain.
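The diffusion-like quadratic length dependence of the retention time can be summarised in generic notation (the symbols below are ours, not the paper's):

```latex
% Retention time \tau grows quadratically with channel length L,
% as for diffusion over a distance L with effective diffusivity D:
\tau \sim \frac{L^{2}}{D}
```

This is the familiar diffusive timescale; it is why tuning the channel length lets one target a specific memory timescale.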
As the gravitational wave detector network is upgraded and the sensitivity of the detectors improves, novel scientific avenues open for exploration. For example, tests of general relativity will become more accurate as smaller deviations can be probed. Additionally, the detection of lensed gravitational waves becomes more likely. However, these new avenues could also interact with each other, and a gravitational wave event presenting deviations from general relativity could be mistaken for a lensed one. Here, we explore how phenomenological deviations from general relativity or binaries of exotic compact objects could impact lensing searches that focus on a single event. We consider strong lensing, millilensing, and microlensing and find that certain phenomenological deviations from general relativity may be mistaken for all of these types of lensing. Therefore, our study shows that future candidate lensing events will need to be examined carefully to avoid a false claim of lensing when a deviation from general relativity has instead been observed.
The introduction of more renewable energy sources into the energy system increases the variability and weather dependence of electricity generation. Power system simulations are used to assess the adequacy and reliability of the electricity grid over decades, but often become computationally intractable for such long simulation periods at high technical detail. To alleviate this computational burden, we investigate the use of outlier detection algorithms to find periods of extreme renewable energy generation, enabling detailed modelling of the performance of power systems under these circumstances. Specifically, we apply the Maximum Divergent Intervals (MDI) algorithm to power generation time series derived from the ERA5 historical climate reanalysis covering the period from 1950 through 2019. Applying the MDI algorithm to these time series, we identified intervals of extremely low and high energy production. Different divergence measures can be used to determine how anomalous an interval is: whereas the cross-entropy measure results in shorter, strongly peaking outliers, the unbiased Kullback-Leibler divergence tends to detect longer and more persistent intervals. Domain experts regard these intervals as potential risks for the electricity grid, showcasing the capability of the MDI algorithm to detect critical events in these time series. For the historical period analysed, we found no trend in outlier intensity, nor any shift or lengthening of the outliers, that could be attributed to climate change. By applying MDI to climate model output, power system modellers can investigate the adequacy of, and possible changes in risk for, the current and future electricity grid under a wider range of scenarios.
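As a rough, hedged illustration of divergence-based interval detection (the function name, smoothing, and search strategy below are our own simplifications; the actual MDI algorithm uses more careful estimators and an efficient interval search):

```python
import numpy as np

def interval_outlier_scores(series, min_len=24, max_len=168, bins=20):
    """Score candidate intervals of a time series by how much their
    empirical value distribution diverges from the whole series.

    A simplified sketch in the spirit of Maximum Divergent Intervals:
    slide windows of several lengths over the series and rank them by
    KL(interval || global) over histogram estimates.
    """
    edges = np.histogram_bin_edges(series, bins=bins)
    global_counts, _ = np.histogram(series, bins=edges)
    # Laplace smoothing keeps every bin probability strictly positive
    global_p = (global_counts + 1) / (global_counts.sum() + bins)
    scored = []
    for length in range(min_len, max_len + 1, min_len):
        for start in range(0, len(series) - length + 1, max(1, length // 2)):
            window = series[start:start + length]
            counts, _ = np.histogram(window, bins=edges)
            q = (counts + 1) / (counts.sum() + bins)
            kl = float(np.sum(q * np.log(q / global_p)))
            scored.append((kl, start, length))
    scored.sort(reverse=True)
    return scored[:5]  # top-scoring candidate intervals
```

On a series with an injected anomalous segment, the top-ranked interval overlaps that segment; the real algorithm additionally merges overlapping candidates and supports the unbiased KL variant discussed in the abstract.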
It is well-known that decision-making problems from stochastic control can be formulated by means of a forward-backward stochastic differential equation (FBSDE). Recently, the authors of Ji et al. 2022 proposed an efficient deep learning algorithm based on the stochastic maximum principle (SMP). In this paper, we provide a convergence result for this deep SMP-BSDE algorithm and compare its performance with other existing methods. In particular, by adopting a strategy as in Han and Long 2020, we derive an a-posteriori estimate and show that the total approximation error can be bounded by the value of the loss functional and the discretization error. We present numerical examples for high-dimensional stochastic control problems, with both drift and diffusion control, which showcase superior performance compared to existing algorithms.
This paper establishes a 9-dimensional classification framework for Multi-Agent Deep Reinforcement Learning with Communication (Comm-MADRL), systematically categorizing 41 existing models. It reveals prevailing trends and identifies underexplored areas to guide future research in designing intelligent multi-agent systems.
A comprehensive empirical study assesses the reliability of Large Language Models (LLMs) as automated evaluators across 20 diverse Natural Language Processing tasks. The research evaluates 11 different LLMs, including both proprietary and open-weight models, against human judgments, revealing that LLM performance varies substantially by task and property evaluated and is generally below human inter-annotator agreement.
SAM3-I introduces a method that extends the Segment Anything Model (SAM) family to directly interpret complex natural language instructions for visual segmentation. This integrated approach significantly outperforms existing agent-based methods in instruction-following performance for both simple and complex prompts, while operating in a more efficient single-pass inference pipeline.
A study explores how large language models reconcile memorizing incorrect labels with applying generalizable reasoning. It reveals that models retain correct intermediate computations even for noisy instances, employing "outlier heuristics" in specific neurons to override these results for memorized outputs.
This paper explores image modeling from the frequency space and introduces DCTdiff, an end-to-end diffusion generative paradigm that efficiently models images in the discrete cosine transform (DCT) space. We investigate the design space of DCTdiff and reveal the key design factors. Experiments on different frameworks (UViT, DiT), generation tasks, and various diffusion samplers demonstrate that DCTdiff outperforms pixel-based diffusion models regarding generative quality and training efficiency. Remarkably, DCTdiff can seamlessly scale up to 512×512 resolution without using the latent diffusion paradigm and beats latent diffusion (using SD-VAE) with only 1/4 of the training cost. Finally, we illustrate several intriguing properties of DCT image modeling. For example, we provide a theoretical proof of why 'image diffusion can be seen as spectral autoregression', bridging the gap between diffusion and autoregressive models. The effectiveness of DCTdiff and the introduced properties suggest a promising direction for image modeling in the frequency space. The code is this https URL
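To make the frequency-space setting concrete, here is a minimal, hedged illustration using SciPy's orthonormal DCT (not DCTdiff's actual pipeline): an image is mapped to DCT coefficients, a low-frequency block is kept, and the result is inverted.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Toy stand-in for an image; DCTdiff itself operates on real image data.
rng = np.random.default_rng(0)
image = rng.random((32, 32))

coeffs = dctn(image, norm="ortho")           # 2-D type-II DCT (orthonormal)
mask = np.zeros_like(coeffs)
mask[:8, :8] = 1.0                           # keep only the low-frequency block
approx = idctn(coeffs * mask, norm="ortho")  # reconstruct from 1/16 of the coefficients

# The orthonormal DCT is an isometry, so the full round trip is exact
roundtrip = idctn(coeffs, norm="ortho")
```

Because the transform preserves energy (Parseval), modelling in DCT space loses nothing by itself; the low-frequency truncation above is a crude stand-in for the energy compaction that frequency-space generative modelling exploits.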
Many researchers have reached the conclusion that AI models should be trained to be aware of the possibility of variation and disagreement in human judgments, and evaluated according to their ability to recognize such variation. The LEWIDI series of shared tasks on Learning With Disagreements was established to promote this approach to training and evaluating AI models, by making suitable datasets more accessible and by developing evaluation methods. The third edition of the task builds on this goal by extending the LEWIDI benchmark to four datasets spanning paraphrase identification, irony detection, sarcasm detection, and natural language inference, with labeling schemes that include not only categorical judgments as in previous editions, but ordinal judgments as well. Another novelty is that we adopt two complementary paradigms to evaluate disagreement-aware systems: the soft-label approach, in which models predict population-level distributions of judgments, and the perspectivist approach, in which models predict the interpretations of individual annotators. Crucially, we moved beyond standard metrics such as cross-entropy, and tested new evaluation metrics for the two paradigms. The task attracted diverse participation, and the results provide insights into the strengths and limitations of methods for modeling variation. Together, these contributions strengthen LEWIDI as a framework and provide new resources, benchmarks, and findings to support the development of disagreement-aware technologies.
The ExPLAIND framework unifies attribution across model components, training data, and training dynamics by extending the Exact Path Kernel (EPK) to modern deep learning optimizers like AdamW. This framework derives additive influence scores, demonstrated to accurately replicate model predictions and uncover a refined multi-phase understanding of phenomena such as Grokking.
Language models can perform implicit multi-hop reasoning up to 4 hops, achieving high accuracy when provided with sufficient training data. This capability, however, incurs an exponential increase in data requirements which curriculum learning can substantially reduce.
In psychotherapy, therapeutic outcome assessment, or treatment outcome evaluation, is essential for enhancing mental health care by systematically evaluating therapeutic processes and outcomes. Existing large language model approaches often focus on therapist-centered, single-session evaluations, neglecting the client's subjective experience and longitudinal progress across multiple sessions. To address these limitations, we propose IPAEval, a client-Informed Psychological Assessment-based Evaluation framework that automates treatment outcome evaluations from the client's perspective using clinical interviews. IPAEval integrates cross-session client-contextual assessment and session-focused client-dynamics assessment to provide a comprehensive understanding of therapeutic progress. Experiments on our newly developed TheraPhase dataset demonstrate that IPAEval effectively tracks symptom severity and treatment outcomes over multiple sessions, outperforming previous single-session models and validating the benefits of items-aware reasoning mechanisms.
Diffusion models have demonstrated impressive generative capabilities, but their exposure bias problem, described as the input mismatch between training and sampling, lacks in-depth exploration. In this paper, we systematically investigate the exposure bias problem in diffusion models by first analytically modelling the sampling distribution, based on which we then identify the prediction error at each sampling step as the root cause of the exposure bias issue. Furthermore, we discuss potential solutions to this issue and propose an intuitive metric for it. Alongside the elucidation of exposure bias, we propose a simple yet effective, training-free method called Epsilon Scaling to alleviate it. We show that Epsilon Scaling explicitly moves the sampling trajectory closer to the vector field learned in the training phase by scaling down the network output, mitigating the input mismatch between training and sampling. Experiments on various diffusion frameworks (ADM, DDIM, EDM, LDM, DiT, PFGM++) verify the effectiveness of our method. Remarkably, our ADM-ES, as a state-of-the-art stochastic sampler, obtains 2.17 FID on CIFAR-10 under 100-step unconditional generation. The code is available at https://github.com/forever208/ADM-ES and https://github.com/forever208/EDM-ES.
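As a hedged sketch of the core idea, here is a toy deterministic DDIM-style update with Epsilon Scaling; the variable names, values, and the single constant scaling factor are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_t, alpha_prev, scale=1.0):
    """One deterministic DDIM-style update with Epsilon Scaling.

    Dividing the predicted noise by a `scale` slightly above 1 shrinks the
    network output, which in the paper's account pulls the sampling
    trajectory back toward the vector field seen during training.
    """
    eps = eps_pred / scale                                    # Epsilon Scaling
    x0_hat = (x_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    return np.sqrt(alpha_prev) * x0_hat + np.sqrt(1.0 - alpha_prev) * eps
```

With `scale=1.0` this reduces to the plain update, so the method is training-free and drops into any existing sampler loop; the paper selects the scaling schedule empirically, which this sketch leaves as one constant.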
We present a comprehensive analysis of generic 5-dimensional Einstein-Maxwell-Dilaton-Axion (EMDA) holographic theories with exponential couplings. We find and classify exact, analytic, anisotropic solutions, both zero-temperature vacua and finite-temperature black brane backgrounds, with anisotropy sourced by scalar axions, magnetic fields, and charge densities, that can be interpreted as IR fixed points of renormalisation-group flows from UV-conformal fixed points. The resulting backgrounds feature a hyperscaling violation exponent and up to three independent Lifshitz-like exponents, generated by an equal number of independent coupling constants in the EMDA action. We derive the holographic stress-energy tensor and the corresponding equation of state, and discuss the behavior of the anisotropic speed of sound and butterfly velocity. We show that these theories can be consistently constrained by imposing several natural requirements, including energy conditions, thermodynamic stability, and causality. Additionally, we analyse hard probes in this class of theories, including Brownian motion, momentum broadening and jet quenching, and we demonstrate that a fully analytic treatment is possible, making their dependence on the underlying anisotropy explicit. We highlight the relevance of these models as benchmarks for strongly coupled anisotropic matter in nature, from the quark-gluon plasma created in heavy-ion collisions to dense QCD phases in neutron-star mergers and the cores of compact objects.
We formulate the inverse problem in a Bayesian framework and aim to train a generative model that allows us to simulate (i.e., sample from the likelihood) and do inference (i.e., sample from the posterior). We review the use of triangular normalizing flows for conditional sampling in this context and show how to combine two such triangular maps (an upper and a lower one) into one invertible mapping that can be used for simulation and inference. We work out several useful properties of this invertible generative model and propose a possible loss for training the map directly. We illustrate the workings of this new approach to conditional generative modeling numerically on a few stylized examples.
While both agent interaction and personalisation are vibrant topics in research on large language models (LLMs), there has been limited focus on the effect of language interaction on the behaviour of persona-conditioned LLM agents. Such an endeavour is important to ensure that agents remain consistent with their assigned traits yet are able to engage in open, naturalistic dialogues. In our experiments, we condition GPT-3.5 on personality profiles through prompting and create a two-group population of LLM agents using a simple variability-inducing sampling algorithm. We then administer personality tests and submit the agents to a collaborative writing task, finding that different profiles exhibit different degrees of personality consistency and linguistic alignment to their conversational partners. Our study seeks to lay the groundwork for a better understanding of dialogue-based interaction between LLMs and highlights the need for new approaches to crafting robust, more human-like LLM personas for interactive environments.
Researchers from Max Planck Institute for Informatics and Saarland University developed RAG-GESTURE, a system that synthesizes natural and semantically rich co-speech gestures by integrating explicit linguistic knowledge into a pre-trained diffusion model during inference. The method showed improved quantitative metrics and consistently higher user preference for both naturalness and appropriateness compared to existing neural and RAG-based gesture generation approaches.