alphaXiv


Johannes Kepler University

SymbolicAI: A framework for logic-based approaches combining generative models and solvers

21 Aug 2024


Marius-Constantin Dinu

Austrian Academy of Sciences Amazon

The SYMBOLICAI framework integrates large language models as semantic parsers with various solvers to facilitate complex, multi-step neuro-symbolic AI workflows. It introduces the VERTEX score for evaluating these multi-step generative processes, showing that GPT-4 Turbo achieves the highest overall performance (a VERTEX score of 0.68) but that all evaluated models remain unreliable at sophisticated logical reasoning and hierarchical graph orchestration.

#computer-science #artificial-intelligence #machine-learning


A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization

22 Aug 2025

NXAI GmbH Johannes Kepler University

Learning to sample from intractable distributions over discrete sets without relying on corresponding training data is a central problem in a wide range of fields, including Combinatorial Optimization. Currently, popular deep learning-based approaches rely primarily on generative models that yield exact sample likelihoods. This work introduces a method that lifts this restriction and opens the possibility to employ highly expressive latent variable models like diffusion models. Our approach is conceptually based on a loss that upper bounds the reverse Kullback-Leibler divergence and evades the requirement of exact sample likelihoods. We experimentally validate our approach in data-free Combinatorial Optimization and demonstrate that our method achieves a new state-of-the-art on a wide range of benchmark problems.

#computer-science #artificial-intelligence #discrete-mathematics


Towards Accurate Generative Models of Video: A New Metric & Challenges

27 Mar 2019

Johannes Kepler University IDSIA

A new metric, Fréchet Video Distance (FVD), and a suite of challenging benchmark datasets, StarCraft 2 Videos (SCV), are introduced to provide more accurate evaluation tools for deep generative video models. FVD demonstrates strong correlation with human perception of video quality, while current models show significant limitations on SCV tasks requiring relational reasoning and long-term consistency.

#computer-science #artificial-intelligence #computer-vision-and-pattern-recognition
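FVD builds on the Fréchet distance between two multivariate Gaussians, the same construction used by FID for images. As a sketch, assume each video in the real set (R) and the generated set (G) is mapped to a feature vector by a pretrained video network (the summary above does not name the feature extractor, so treat that choice as an assumption), and that each set is summarized by the mean $\mu$ and covariance $\Sigma$ of its features. The distance is then:

```latex
\mathrm{FVD}(R, G) = \lVert \mu_R - \mu_G \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_R + \Sigma_G - 2\left( \Sigma_R \Sigma_G \right)^{1/2} \right)
```

The first term penalizes a shift in the average features, the trace term a mismatch in their spread and correlations; identical distributions give a distance of zero.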


EngravingGNN: A Hybrid Graph Neural Network for End-to-End Piano Score Engraving

23 Sep 2025

Johannes Kepler University Linz Institute of Technology

This paper focuses on automatic music engraving, i.e., the creation of a human-readable musical score from musical content. This step is fundamental for all applications that include a human player, but it remains a mostly unexplored topic in symbolic music processing. In this work, we formalize the problem as a collection of interdependent subtasks, and propose a unified graph neural network (GNN) framework that targets the case of piano music and quantized symbolic input. Our method employs a multi-task GNN to jointly predict voice connections, staff assignments, pitch spelling, key signature, stem direction, octave shifts, and clef signs. A dedicated postprocessing pipeline generates print-ready MusicXML/MEI outputs. Comprehensive evaluation on two diverse piano corpora (J-Pop and DCML Romantic) demonstrates that our unified model achieves good accuracy across all subtasks, compared to existing systems that only specialize in specific subtasks. These results indicate that a shared GNN encoder with lightweight task-specific decoders in a multi-task setting offers a scalable and effective solution for automatic music engraving.

#computer-science #artificial-intelligence #graphics


TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining

12 May 2025

Johannes Kepler University LIT Artificial Intelligence Lab

Researchers at Johannes Kepler University introduced TACOS, a dataset of 47,748 temporally-aligned audio captions, to enable audio-language models to understand the precise timing and temporal relationships of sound events. Their frame-wise contrastive learning approach, leveraging this data, improved text-based sound event detection by 5.5 percentage points in PSDS1 score compared to models trained with clip-level captions.

#computer-science #contrastive-learning #machine-learning


Beat this! Accurate beat tracking without DBN postprocessing

31 Jul 2024

Johannes Kepler University LIT AI Lab, Linz Institute of Technology

We propose a system for tracking beats and downbeats with two objectives: generality across a diverse music range, and high accuracy. We achieve generality by training on multiple datasets -- including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations -- and by removing the commonly used Dynamic Bayesian Network (DBN) postprocessing, which introduces constraints on the meter and tempo. For high accuracy, among other improvements, we develop a loss function tolerant to small time shifts of annotations, and an architecture alternating convolutions with transformers either over frequency or time. Our system surpasses the current state of the art in F1 score despite using no DBN. However, it can still fail, especially for difficult and underrepresented genres, and performs worse on continuity metrics, so we publish our model, code, and preprocessed datasets, and invite others to beat this.

#computer-science #machine-learning #sound
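The summary does not spell out the shift-tolerant loss. A minimal sketch of the idea, assuming frame-wise beat probabilities and 0/1 annotations, is a binary cross-entropy in which each annotated beat is matched to the strongest prediction within ±`tol` frames and negative frames close to a beat are ignored. The function `shift_tolerant_bce` and its masking rule are hypothetical illustrations, not the authors' exact loss.

```python
import math


def shift_tolerant_bce(pred, target, tol=3, eps=1e-7):
    """Hypothetical shift-tolerant binary cross-entropy for frame-wise beat activations.

    pred:   per-frame beat probabilities in (0, 1)
    target: per-frame 0/1 beat annotations
    tol:    number of frames an annotation may be off in either direction
    """
    n = len(pred)
    # Mark every frame within `tol` frames of an annotated beat.
    near_beat = [False] * n
    for i, t in enumerate(target):
        if t:
            for j in range(max(0, i - tol), min(n, i + tol + 1)):
                near_beat[j] = True

    total, count = 0.0, 0
    for i in range(n):
        if target[i]:
            # Positive frame: score the strongest prediction anywhere in the window.
            window = pred[max(0, i - tol):min(n, i + tol + 1)]
            p = min(max(max(window), eps), 1 - eps)
            total -= math.log(p)
            count += 1
        elif not near_beat[i]:
            # Negative frame far from any beat: ordinary BCE term.
            p = min(max(pred[i], eps), 1 - eps)
            total -= math.log(1 - p)
            count += 1
        # Negative frames near a beat are skipped, so a slightly shifted peak is not punished.
    return total / max(count, 1)
```

With `tol=3`, a prediction peak two frames after the annotation gives the same loss as a perfectly aligned one, whereas a frame-exact loss (`tol=0`) penalizes it twice: once for missing the beat and once for a false positive.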


Learning to Modulate pre-trained Models in RL

27 Oct 2023


Fabian Paischer

Google DeepMind UCL

Reinforcement Learning (RL) has been successful in various domains like robotics, game playing, and simulation. While RL agents have shown impressive capabilities in their specific tasks, they adapt poorly to new tasks. In supervised learning, this adaptation problem is addressed by large-scale pre-training followed by fine-tuning to new downstream tasks. Recently, pre-training on multiple tasks has been gaining traction in RL. However, fine-tuning a pre-trained model often suffers from catastrophic forgetting. That is, the performance on the pre-training tasks deteriorates when fine-tuning on new tasks. To investigate the catastrophic forgetting phenomenon, we first jointly pre-train a model on datasets from two benchmark suites, namely Meta-World and DMControl. Then, we evaluate and compare a variety of fine-tuning methods prevalent in natural language processing, both in terms of performance on new tasks, and how well performance on pre-training tasks is retained. Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly. Therefore, we propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model via a learnable modulation pool. Our method achieves state-of-the-art performance on the Continual-World benchmark, while retaining performance on the pre-training tasks. Finally, to aid future research in this area, we release a dataset encompassing 50 Meta-World and 16 DMControl tasks.

#computer-science #continual-learning #artificial-intelligence


Coherent Control of Quantum-Dot Spins with Cyclic Optical Transitions

17 Sep 2025

University of Cambridge

Solid-state spins are promising as interfaces from stationary qubits to single photons for quantum communication technologies. Semiconductor quantum dots have excellent optical coherence, exhibit near unity collection efficiencies when coupled to photonic structures, and possess long-lived spins for quantum memory. However, the incompatibility of performing optical spin control and single-shot readout simultaneously has been a challenge faced by almost all solid-state emitters. To overcome this, we leverage light-hole mixing to realize a highly asymmetric lambda system in a negatively charged heavy hole exciton in Faraday configuration. By compensating GHz-scale differential Stark shifts, induced by unequal coupling to Raman control fields, and by performing nuclear-spin cooling, we achieve quantum control of an electron-spin qubit with a $\pi$-pulse contrast of 97.4% while preserving spin-selective optical transitions with a cyclicity of 409. We demonstrate this scheme for both GaAs and InGaAs quantum dots, and show that it is compatible with the operation of a nuclear quantum memory. Our approach thus enables repeated emission of indistinguishable photons together with qubit control, as required for single-shot readout, photonic cluster-state generation, and quantum repeater technologies.

#mesoscale-and-nanoscale-physics #physics #quantum-physics


DarTwin made precise by SysMLv2 -- An Experiment

14 Oct 2025

Johannes Kepler University University of Antwerp

The new SysMLv2 adds mechanisms for the built-in specification of domain-specific concepts and language extensions. This feature promises to facilitate the creation of Domain-Specific Languages (DSLs) and interfacing with existing system descriptions and technical designs. In this paper, we review these features and evaluate SysMLv2's capabilities using concrete use cases. We develop DarTwin DSL, a DSL that formalizes the existing DarTwin notation for Digital Twin (DT) evolution, through SysMLv2, thereby supposedly enabling the wide application of DarTwin's evolution templates using any SysMLv2 tool. We demonstrate DarTwin DSL, but also point out limitations in the currently available tooling of SysMLv2 in terms of graphical notation capabilities. This work contributes to the growing field of Model-Driven Engineering (MDE) for DTs and combines it with the release of SysMLv2, thus integrating a systematic approach with DT evolution management in systems engineering.

#computer-science #software-engineering #electrical-engineering


Conformal Prediction for Time Series with Modern Hopfield Networks

02 Nov 2023

Google Research Johannes Kepler University

To quantify uncertainty, conformal prediction methods are attracting growing interest and have already been successfully applied to various domains. However, they are difficult to apply to time series because the autocorrelative structure of time series violates the basic assumption (exchangeability) required by conformal prediction. We propose HopCPT, a novel conformal prediction approach for time series that not only copes with temporal structures but leverages them. We show that our approach is theoretically well justified for time series where temporal dependencies are present. In experiments, we demonstrate that our new approach outperforms state-of-the-art conformal prediction methods on multiple real-world time series datasets from four different domains.

#computer-science #artificial-intelligence #machine-learning
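As a rough illustration of how temporal structure can enter a conformal interval, the sketch below widens a point forecast by a weighted quantile of past absolute errors, where each weight expresses how relevant a past time step is to the current one. In HopCPT such weights come from a modern Hopfield network; here they are simply passed in as a hypothetical `weights` argument, and the finite-sample correction of standard split conformal prediction is omitted for brevity. With uniform weights the sketch reduces to an ordinary, uncorrected split conformal interval.

```python
def weighted_conformal_interval(point_forecast, past_errors, weights, alpha=0.1):
    """Prediction interval from a weighted quantile of past absolute errors.

    past_errors: absolute residuals |y - y_hat| from earlier time steps
    weights:     non-negative relevance of each past step to the current one
                 (hypothetical input; HopCPT's Hopfield-based weighting is not reproduced)
    alpha:       target miscoverage, e.g. 0.1 for roughly 90% intervals
    """
    if not past_errors:
        raise ValueError("need at least one past error")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")

    # Walk through the errors in ascending order, accumulating weight
    # until at least (1 - alpha) of the total weight is covered.
    threshold = (1 - alpha) * total
    pairs = sorted(zip(past_errors, weights))
    q = pairs[-1][0]
    cumulative = 0.0
    for err, w in pairs:
        cumulative += w
        if cumulative >= threshold - 1e-12:
            q = err
            break
    return point_forecast - q, point_forecast + q
```

Concentrating the weight on past steps that had small errors, for instance calm periods resembling the present, yields a narrower interval than uniform weighting does.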


FlashRNN: I/O-Aware Optimization of Traditional RNNs on modern hardware

13 Mar 2025

NXAI Lab NXAI GmbH

While Transformers and other sequence-parallelizable neural network architectures seem like the current state of the art in sequence modeling, they specifically lack state-tracking capabilities. These are important for time-series tasks and logical reasoning. Traditional RNNs like LSTMs and GRUs, as well as modern variants like sLSTM do have these capabilities at the cost of strictly sequential processing. While this is often seen as a strong limitation, we show how fast these networks can get with our hardware-optimization FlashRNN in Triton and CUDA, optimizing kernels to the register level on modern GPUs. We extend traditional RNNs with a parallelization variant that processes multiple RNNs of smaller hidden state in parallel, similar to the head-wise processing in Transformers. To enable flexibility on different GPU variants, we introduce a new optimization framework for hardware-internal cache sizes, memory and compute handling. It models the hardware in a setting using polyhedral-like constraints, including the notion of divisibility. This speeds up the solution process in our ConstrINT library for general integer constraint satisfaction problems (integer CSPs). We show that our kernels can achieve 50x speed-ups over a vanilla PyTorch implementation and allow 40x larger hidden sizes compared to our Triton implementation. Our open-source kernels and the optimization library are released here to boost research in the direction of state-tracking enabled RNNs and sequence modeling: this https URL

#computer-science #artificial-intelligence #machine-learning
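The head-wise variant can be pictured as several small, independent RNNs running over slices of the hidden state, which is equivalent to a block-diagonal recurrent matrix. The pure-Python sketch below uses a vanilla tanh cell (not an LSTM or sLSTM) and the hypothetical helpers `init_headwise_rnn` and `headwise_step`. It only illustrates the data layout and the strictly sequential dependency on the previous state, not FlashRNN's Triton/CUDA kernels.

```python
import math
import random


def init_headwise_rnn(hidden_size, num_heads, seed=0):
    """Hypothetical head-wise vanilla RNN: one small d x d recurrent matrix per head."""
    if hidden_size % num_heads:
        raise ValueError("hidden_size must be divisible by num_heads")
    d = hidden_size // num_heads
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d)
    # Together these matrices form a block-diagonal hidden_size x hidden_size matrix.
    return [[[rng.uniform(-scale, scale) for _ in range(d)] for _ in range(d)]
            for _ in range(num_heads)]


def headwise_step(R, h_prev, x_t):
    """One step h_t = tanh(R_head @ h_{t-1}[head] + x_t[head]), computed per head.

    h_prev, x_t: flat lists of length hidden_size (x_t is assumed already projected).
    Heads never read each other's state, so on a GPU they could run in parallel.
    """
    d = len(R[0])
    h_new = []
    for head, Rh in enumerate(R):
        lo = head * d
        h_head = h_prev[lo:lo + d]
        for i in range(d):
            pre = sum(Rh[i][j] * h_head[j] for j in range(d)) + x_t[lo + i]
            h_new.append(math.tanh(pre))
    return h_new


def recurrent_param_count(R):
    """Number of recurrent weights across all heads."""
    return sum(len(Rh) * len(Rh[0]) for Rh in R)
```

For hidden size d split into H heads, the recurrent weights shrink from d^2 to d^2/H (16 instead of 64 for d = 8, H = 4), while each time step still has to wait for the previous one.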


Bright Single-Photon Emission from Individual Tin-Vacancy Centers in Multi-Cone Diamond Waveguides

13 Oct 2025

ETH Zurich Kyoto University

Diamonds containing color centers have recently gathered significant attention for photonic quantum technologies, including quantum sensing, photonic quantum computers, and quantum networks. Among the various color centers, tin-vacancy (SnV) centers are particularly promising due to the high emission efficiency from the zero-phonon line and due to their long spin coherence times. However, the extraction of photons from diamond remains a key challenge. Here we demonstrate high photon extraction from a single SnV center incorporated in a diamond nanopillar with tapered sidewalls and a multi-cone structure. A sharp emission peak with a full width at half maximum (FWHM) of 6 nm was observed at a wavelength of 619 nm. Furthermore, the second-order correlation function exhibited an antibunching dip well below $g^{(2)}(0) = 0.5$, indicating single-photon emission. Remarkably, the emitter achieved a high saturation count rate of approximately 9 Mcps. These results establish our nanopillar platform as a promising candidate for bright and stable quantum sources and sensors based on SnV centers in diamond.

#materials-science #physics #quantum-physics


Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences

06 Nov 2024

NXAI GmbH Johannes Kepler University

Language models for biological and chemical sequences enable crucial applications such as drug discovery, protein engineering, and precision medicine. Currently, these language models are predominantly based on Transformer architectures. While Transformers have yielded impressive results, their quadratic runtime dependency on the sequence length complicates their use for long genomic sequences and in-context learning on proteins and chemical sequences. Recently, the recurrent xLSTM architecture has been shown to perform favorably compared to Transformers and modern state-space model (SSM) architectures in the natural language domain. Similar to SSMs, xLSTMs have a linear runtime dependency on the sequence length and allow for constant-memory decoding at inference time, which makes them prime candidates for modeling long-range dependencies in biological and chemical sequences. In this work, we tailor xLSTM towards these domains and propose a suite of architectural variants called Bio-xLSTM. Extensive experiments in three large domains, genomics, proteins, and chemistry, were performed to assess xLSTM's ability to model biological and chemical sequences. The results show that models based on Bio-xLSTM a) can serve as proficient generative models for DNA, protein, and chemical sequences, b) learn rich representations for those modalities, and c) can perform in-context learning for proteins and small molecules.

#ai-for-health #computer-science #artificial-intelligence


Semantic HELM: A Human-Readable Memory for Reinforcement Learning

27 Oct 2023


Fabian Paischer

Johannes Kepler University Institute of Advanced Research in Artificial Intelligence (IARAI)

Reinforcement learning agents deployed in the real world often have to cope with partially observable environments. Therefore, most agents employ memory mechanisms to approximate the state of the environment. Recently, there have been impressive success stories in mastering partially observable environments, mostly in the realm of computer games like Dota 2, StarCraft II, or Minecraft. However, existing methods lack interpretability in the sense that it is not comprehensible for humans what the agent stores in its memory. In this regard, we propose a novel memory mechanism that represents past events in human language. Our method uses CLIP to associate visual inputs with language tokens. Then we feed these tokens to a pretrained language model that serves the agent as memory and provides it with a coherent and human-readable representation of the past. We train our memory mechanism on a set of partially observable environments and find that it excels on tasks that require a memory component, while mostly attaining performance on par with strong baselines on tasks that do not. On a challenging continuous recognition task, where memorizing the past is crucial, our memory mechanism converges two orders of magnitude faster than prior methods. Since our memory mechanism is human-readable, we can peek at an agent's memory and check whether crucial pieces of information have been stored. This significantly enhances troubleshooting and paves the way toward more interpretable agents.

#computer-science #artificial-intelligence #computation-and-language


Detection Seizure Onset Zone Using Circadian Fluctuating Epileptic Biomarkers: A Signal Processing and Machine Learning Approach

17 Oct 2025

Swansea University Johannes Kepler University

Epileptic biomarkers play a crucial role in identifying the origin of seizures, an essential aspect of pre-surgical planning for epilepsy treatment. These biomarkers can vary significantly over time. By studying these temporal fluctuations, we can enhance their effectiveness in guiding surgical planning. This research focuses on examining how circadian rhythms influence epilepsy biomarkers and aims to determine the optimal times for their analysis. To investigate the relationship between epilepsy biomarkers and circadian rhythm, the sleep/wake states first need to be classified. After the biomarkers are identified, they are compared across these states. A retrospective analysis was conducted on intracranial electroencephalography data from patients with focal epilepsy. Four biomarkers (spikes, sequences of spikes, high-frequency oscillations (HFOs), and pathological HFOs) were identified through automatic detection. The alpha/delta ratio was also calculated to distinguish between asleep and awake stages. Data from 9 patients were analyzed, and the classification of sleep and wake states was achieved with an area under the curve of 84%. All biomarker rates were higher during the sleep stage than during the wake stage. Pathological HFOs and the sequence of spikes proved to be more precise indicators regarding distance to seizure onset than spikes or HFOs. Unlike previous studies that relied predominantly on long-term spike biomarker analysis, this study is the first to utilize a comprehensive set of biomarkers, including HFOs, spike sequences, and pathological HFOs, to enhance seizure onset zone prediction. The rates of epilepsy biomarkers during sleep vary considerably from those seen while awake, making sleep data analysis more effective for accurately predicting the seizure onset zone.

#signal-processing #electrical-engineering
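The alpha/delta ratio compares EEG power in the alpha band with power in the delta band; wakefulness typically shows relatively more alpha activity, while deep sleep is dominated by delta activity. A minimal stdlib sketch, assuming conventional band edges of 8 to 13 Hz (alpha) and 0.5 to 4 Hz (delta), which the summary does not state, and using a naive DFT for readability rather than the authors' processing pipeline:

```python
import math


def band_power(signal, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes for bins with f_lo <= frequency < f_hi (naive O(N^2) DFT)."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            power += re * re + im * im
    return power


def alpha_delta_ratio(signal, fs, alpha=(8.0, 13.0), delta=(0.5, 4.0)):
    """Alpha/delta band-power ratio; the default band edges are assumed conventions."""
    d = band_power(signal, fs, *delta)
    return band_power(signal, fs, *alpha) / d if d > 0 else float("inf")
```

A window dominated by a 10 Hz rhythm yields a ratio well above 1, while one dominated by a 2 Hz rhythm yields a ratio well below 1. Any threshold for labelling a window as asleep or awake would have to be fitted to data; none is implied here.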


LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities

21 May 2025

NXAI GmbH Johannes Kepler University

Generative models are spearheading recent progress in deep learning, showcasing strong promise for trajectory sampling in dynamical systems as well. However, whereas latent space modeling paradigms have transformed image and video generation, similar approaches are more difficult for most dynamical systems. Such systems -- from chemical molecule structures to collective human behavior -- are described by interactions of entities, making them inherently linked to connectivity patterns, entity conservation, and the traceability of entities over time. Our approach, LaM-SLidE (Latent Space Modeling of Spatial Dynamical Systems via Linked Entities), bridges the gap between: (1) keeping the traceability of individual entities in a latent system representation, and (2) leveraging the efficiency and scalability of recent advances in image and video generation, where pre-trained encoder and decoder enable generative modeling directly in latent space. The core idea of LaM-SLidE is the introduction of identifier representations (IDs) that enable the retrieval of entity properties and entity composition from latent system representations, thus fostering traceability. Experimentally, across different domains, we show that LaM-SLidE performs favorably in terms of speed, accuracy, and generalizability. Code is available at this https URL .

#computer-science #artificial-intelligence #machine-learning


TRIAD: Automated Traceability Recovery based on Biterm-enhanced Deduction of Transitive Links among Artifacts

17 Jan 2024

Nanjing University North Carolina State University

Traceability allows stakeholders to extract and comprehend the trace links among software artifacts introduced across the software life cycle, to provide significant support for software engineering tasks. Despite its proven benefits, software traceability is challenging to recover and maintain manually. Hence, plenty of approaches for automated traceability have been proposed. Most rely on textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, artifacts in different abstraction levels usually have different textual descriptions, which can greatly hinder the performance of IR-based approaches (e.g., a requirement in natural language may have a small textual similarity to a Java class). In this work, we leverage the consensual biterms and transitive relationships (i.e., inner- and outer-transitive links) based on intermediate artifacts to improve IR-based traceability recovery. We first extract and filter biterms from all source, intermediate, and target artifacts. We then use the consensual biterms from the intermediate artifacts to extend the biterms of both source and target artifacts, and finally deduce outer and inner-transitive links to adjust text similarities between source and target artifacts. We conducted a comprehensive empirical evaluation based on five systems widely used in other literature to show that our approach can outperform four state-of-the-art approaches, and to analyze how its performance is affected by different conditions of source, intermediate, and target artifacts. The results indicate that our approach can outperform baseline approaches by over 15% in AP and over 10% in MAP on average.

#computer-science #software-engineering


Using Consensual Biterms from Text Structures of Requirements and Code to Improve IR-Based Traceability Recovery

05 Sep 2022

Nanjing University Technische Universität Ilmenau

Traceability establishes trace links among software artifacts based on whether two artifacts are related by system functionalities. The traces are valuable for software development, but are difficult to obtain manually. To cope with the costly and fallible manual recovery, automated approaches have been proposed to recover traces through textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, the low quality and quantity of artifact texts negatively impact the calculated IR values, thus greatly hindering the performance of IR-based approaches. In this study, we propose to extract co-occurring word pairs from the text structures of both requirements and code (i.e., consensual biterms) to improve IR-based traceability recovery. We first collect a set of biterms based on the part-of-speech of requirement texts, and then filter them through the code texts. We then use these consensual biterms to both enrich the input corpus for IR techniques and enhance the calculations of IR values. An evaluation on nine systems shows that, in general, when solely used to enhance IR techniques, our approach can outperform pure IR-based approaches and another baseline by 21.9% and 21.8% in AP, and by 9.3% and 7.2% in MAP, respectively. Moreover, when combined with another enhancing strategy that works from a different perspective, it can outperform this baseline by 5.9% in AP and 4.8% in MAP.

#computer-science #software-engineering
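A simplified sketch of consensual biterms: form unordered pairs of content words that co-occur in one requirement, then keep only the pairs whose two words also appear in the code text, with camelCase identifiers split into words. The paper selects biterms using part-of-speech information; this sketch substitutes a small hypothetical stopword list, so it illustrates the filtering idea rather than the authors' extraction rules.

```python
import re
from itertools import combinations

# Hypothetical stopword list standing in for the paper's part-of-speech filtering.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "shall", "be", "is", "in", "for", "when"}


def words(text):
    """Lowercase word tokens, splitting camelCase identifiers such as 'saveOrder'."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", spaced)]


def requirement_biterms(requirement):
    """Unordered pairs of distinct content words co-occurring in one requirement."""
    terms = sorted({w for w in words(requirement) if w not in STOPWORDS})
    return set(combinations(terms, 2))


def consensual_biterms(requirement, code):
    """Keep only the requirement biterms whose two words both occur in the code text."""
    code_vocab = set(words(code))
    return {b for b in requirement_biterms(requirement)
            if b[0] in code_vocab and b[1] in code_vocab}
```

For the requirement "The system shall save the order when the user confirms payment." and code containing `saveOrder(User user)` and `paymentGateway`, the pair (order, save) survives, while every pair involving "system" or "confirms" is dropped because those words never appear in the code.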


Towards Hamiltonian Simulation with Decision Diagrams

01 Mar 2024

Technical University of Munich Johannes Kepler University

This paper proposes a novel approach to Hamiltonian simulation using Decision Diagrams (DDs), which are an exact representation based on exploiting redundancies in representations of quantum states and operations. While the simulation of Hamiltonians has been studied extensively, scaling these simulations to larger or more complex systems is often challenging and may require approximations or new simulation methods altogether. DDs offer such an alternative that has not yet been applied to Hamiltonian simulation. In this work, we investigate the behavior of DDs for this task. To this end, we review the basics of DDs such as their construction and present how the relevant operations for Hamiltonian simulation are implemented in this data structure -- leading to the first DD-based Hamiltonian simulation approach. Based on several series of evaluations and comparisons, we then discuss insights about the performance of this complementary approach. Overall, these studies show that DDs indeed may offer a promising new data structure which, for certain examples, can provide orders of magnitude of improvement compared to the state-of-the-art, yet also comes with its own, fundamentally different, limitations.

#computer-science #other-condensed-matter #emerging-technologies


The Category of Operator Spaces and Complete Contractions

30 Dec 2024

Université Paris-Saclay

We show that the category OS of operator spaces, with complete contractions as morphisms, is locally countably presentable. This result, together with its symmetric monoidal closed structure with respect to the projective tensor product of operator spaces, implies the existence of cofree (cocommutative) coalgebras with respect to the projective tensor product and therefore provides a mathematical model of Intuitionistic Linear Logic in the sense of Lafont.

#computer-science #logic-in-computer-science #category-theory
