University of the Bundeswehr Munich
An experimental study empirically validates the practical benefits of artificial intelligence in supporting human military intelligence analysts during the analysis process, demonstrating improved accuracy on factual tasks and perceived increases in analysis speed. The proprietary AI demonstrator deepCOM, built on a Large Language Model, enabled AI-assisted groups to score more than 6.5 points higher on factual questions, though it did not increase analyst confidence.
Lipolysis is a life-essential metabolic process, which supplies fatty acids stored in lipid droplets to the body in order to match the demands of building new cells and providing cellular energy. In this paper, we present a first mathematical modelling approach for lipolysis, which takes into account that the involved enzymes act on the surface of lipid droplets. We postulate an active region near the surface where the substrates are within reach of the surface-bound enzymes, and formulate a system of reaction-diffusion PDEs, which connect the active region to the inner core of lipid droplets via interface conditions. We establish two numerical discretisations based on the finite element method and isogeometric analysis, and validate that they perform reliably. Since numerical tests are best performed on non-zero explicit stationary state solutions, we introduce and analyse a model which describes, besides lipolysis, also a reverse process (albeit in a physiologically much oversimplified way). The system is not coercive, so establishing well-posedness is a non-standard task. We prove the unique existence of global and equilibrium solutions. We establish exponential convergence to the equilibrium solutions using the entropy method. We then study the stationary state model and compute explicit radially symmetric solutions. Concerning the finite element methods, we show numerically the linear and quadratic convergence of the errors with respect to the $H^1$- and $L^2$-norms, respectively. Finally, we present numerical simulations of a prototypical PDE model of lipolysis and illustrate that ATGL clustering on lipid droplets can significantly slow down lipolysis.
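To illustrate the kind of interface coupling described above, the following is a minimal sketch of a two-compartment reaction-diffusion system; the unknowns $u$ (droplet core) and $v$ (active region), the diffusivities $d_u, d_v$, and the reaction rate $k$ are hypothetical placeholders, not the paper's actual system.

```latex
% Illustrative two-compartment model (hypothetical notation):
% u lives in the droplet core Omega_c, v in the active region Omega_a
% near the surface; the two are coupled across the interface Gamma.
\begin{align*}
  \partial_t u &= d_u\,\Delta u          && \text{in } \Omega_c,\\
  \partial_t v &= d_v\,\Delta v - k\,v   && \text{in } \Omega_a,\\
  d_u\,\partial_n u &= d_v\,\partial_n v && \text{on } \Gamma \text{ (flux continuity)},\\
  u &= v                                 && \text{on } \Gamma \text{ (concentration continuity)}.
\end{align*}
```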
For autonomous vehicles, safe navigation in complex environments depends on handling a broad range of diverse and rare driving scenarios. Simulation- and scenario-based testing have emerged as key approaches to the development and validation of autonomous driving systems. Traditional scenario generation relies on rule-based systems, knowledge-driven models, and data-driven synthesis, often producing limited diversity and unrealistic safety-critical cases. With the emergence of foundation models, which represent a new generation of pre-trained, general-purpose AI models, developers can process heterogeneous inputs (e.g., natural language, sensor data, HD maps, and control actions), enabling the synthesis and interpretation of complex driving scenarios. In this paper, we survey the application of foundation models to scenario generation and scenario analysis in autonomous driving (as of May 2025). Our survey presents a unified taxonomy that includes large language models, vision-language models, multimodal large language models, diffusion models, and world models for the generation and analysis of autonomous driving scenarios. In addition, we review the methodologies, open-source datasets, simulation platforms, and benchmark challenges, and we examine the evaluation metrics tailored explicitly to scenario generation and analysis. Finally, the survey concludes by highlighting the open challenges and research questions, and outlining promising future research directions. All reviewed papers are listed in a continuously maintained repository, which contains supplementary materials and is available at this https URL.
The SoccerNet 2025 Challenges mark the fifth annual edition of the SoccerNet open benchmarking effort, dedicated to advancing computer vision research in football video understanding. This year's challenges span four vision-based tasks: (1) Team Ball Action Spotting, focused on detecting ball-related actions in football broadcasts and assigning actions to teams; (2) Monocular Depth Estimation, targeting the recovery of scene geometry from single-camera broadcast clips through relative depth estimation for each pixel; (3) Multi-View Foul Recognition, requiring the analysis of multiple synchronized camera views to classify fouls and their severity; and (4) Game State Reconstruction, aimed at localizing and identifying all players from a broadcast video to reconstruct the game state on a 2D top-view of the field. Across all tasks, participants were provided with large-scale annotated datasets, unified evaluation protocols, and strong baselines as starting points. This report presents the results of each challenge, highlights the top-performing solutions, and provides insights into the progress made by the community. The SoccerNet Challenges continue to serve as a driving force for reproducible, open research at the intersection of computer vision, artificial intelligence, and sports. Detailed information about the tasks, challenges, and leaderboards can be found at this https URL, with baselines and development kits available at this https URL.
Reliably identifying reinforced concrete defects (RCDs) plays a crucial role in assessing the structural integrity, traffic safety, and long-term durability of concrete bridges, which represent the most common bridge type worldwide. Nevertheless, available datasets for the recognition of RCDs are small in size and class variety, which calls into question their usability in real-world scenarios and their role as a benchmark. Our contribution to this problem is "dacl10k", an exceptionally diverse RCD dataset for multi-label semantic segmentation comprising 9,920 images derived from real-world bridge inspections. dacl10k distinguishes 12 damage classes as well as 6 bridge components that play a key role in building assessment and in recommending actions, such as restoration works, traffic load limitations, or bridge closures. In addition, we train and evaluate baseline models for dacl10k. The best model achieves a mean intersection-over-union of 0.42 on the test set. dacl10k, along with our baselines, will be openly accessible to researchers and practitioners, representing the largest dataset to date in terms of image count and class diversity for semantic segmentation in the bridge inspection domain.
Sample4Geo, from the University of the Bundeswehr Munich, introduces a streamlined cross-view geo-localization framework that achieves state-of-the-art performance across multiple benchmarks by leveraging a weight-shared ConvNeXt CNN and novel hard negative sampling strategies. The method significantly improves generalization to unseen geographical areas while avoiding complex pre-processing or multi-encoder architectures.
Recent advances in quantum computers are demonstrating the ability to solve problems at a scale beyond brute-force classical simulation. As such, widespread interest in quantum algorithms has developed in many areas, with optimization being one of the most pronounced domains. Across computer science and physics, there are a number of different approaches for major classes of optimization problems, such as combinatorial optimization, convex optimization, non-convex optimization, and stochastic extensions. This work draws on multiple approaches to study quantum optimization. Provably exact versus heuristic settings are first explained using computational complexity theory, highlighting where quantum advantage is possible in each context. Then, the core building blocks for quantum optimization algorithms are outlined to subsequently define prominent problem classes and identify key open questions that, if answered, will advance the field. The effects of scaling relevant problems on noisy quantum devices are also outlined in detail, alongside meaningful benchmarking problems. We underscore the importance of benchmarking by proposing clear metrics to conduct appropriate comparisons with classical optimization techniques. Lastly, we highlight two domains, finance and sustainability, as rich sources of optimization problems that could be used to benchmark, and eventually validate, the potential real-world impact of quantum optimization.
The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks: (1) Ball Action Spotting, focusing on precisely localizing when and which soccer actions related to the ball occur; (2) Dense Video Captioning, focusing on describing the broadcast with natural language and anchored timestamps; (3) Multi-View Foul Recognition, a novel task focusing on analyzing multiple viewpoints of a potential foul incident to classify whether a foul occurred and assess its severity; and (4) Game State Reconstruction, another novel task focusing on reconstructing the game state from broadcast videos onto a 2D top-view map of the field. Detailed information about the tasks, challenges, and leaderboards can be found at this https URL, with baselines and development kits available at this https URL.
Reinforcement learning (RL) has shown promise in robotics, but deploying RL on real vehicles remains challenging due to the complexity of vehicle dynamics and the mismatch between simulation and reality. Factors such as tire characteristics, road surface conditions, aerodynamic disturbances, and vehicle load make it infeasible to model real-world dynamics accurately, which hinders direct transfer of RL agents trained in simulation. In this paper, we present a framework that decouples motion planning from vehicle control through a spatial and temporal alignment strategy between a virtual vehicle and the real system. An RL agent is first trained in simulation using a kinematic bicycle model to output continuous control actions. Its behavior is then distilled into a trajectory-predicting agent that generates finite-horizon ego-vehicle trajectories, enabling synchronization between virtual and real vehicles. At deployment, a Stanley controller governs lateral dynamics, while longitudinal alignment is maintained through adaptive update mechanisms that compensate for deviations between virtual and real trajectories. We validate our approach on a real vehicle and demonstrate that the proposed alignment strategy enables robust zero-shot transfer of RL-based motion planning from simulation to reality, successfully decoupling high-level trajectory generation from low-level vehicle control.
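As a concrete illustration of this decoupling, the following is a minimal sketch (not the authors' code; the wheelbase, gain, and time step are assumed values) of a kinematic bicycle model of the kind used for training in simulation, together with a Stanley lateral control law of the kind applied at deployment.

```python
import numpy as np

def bicycle_step(x, y, yaw, v, accel, steer, wheelbase=2.7, dt=0.05):
    """Advance a kinematic bicycle model by one time step (assumed parameters)."""
    x += v * np.cos(yaw) * dt
    y += v * np.sin(yaw) * dt
    yaw += v / wheelbase * np.tan(steer) * dt
    v += accel * dt
    return x, y, yaw, v

def stanley_steer(heading_error, cross_track_error, v, k=1.0, eps=1e-3):
    """Stanley control law: heading correction plus a cross-track term
    that shrinks with speed (the gain k is a hypothetical choice)."""
    return heading_error + np.arctan2(k * cross_track_error, v + eps)
```

In this decomposition, the RL-derived finite-horizon trajectory supplies the reference against which the heading and cross-track errors are computed, while the controller handles the lateral dynamics of the real vehicle.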
This paper introduces SenPa-MAE, a transformer architecture that encodes the sensor parameters of an observed multispectral signal into the image embeddings. SenPa-MAE can be pre-trained on imagery from different satellites with non-matching spectral or geometric sensor characteristics. To incorporate sensor parameters, we propose a versatile sensor parameter encoding module as well as a data augmentation strategy for the diversification of the pre-training dataset. This enables the model to effectively differentiate between various sensors and gain an understanding of sensor parameters and their correlation with the observed signal. Given the rising number of Earth observation satellite missions and the diversity in their sensor specifications, our approach paves the way towards a sensor-independent Earth observation foundation model. This opens up possibilities such as cross-sensor training and sensor-independent inference.
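The following is a hypothetical sketch in the spirit of the sensor parameter encoding module described above (the module name, parameter set, and dimensions are assumptions, not the paper's implementation): sensor metadata such as central wavelength, bandwidth, and ground sampling distance is embedded by a small MLP and added to the patch embeddings before the transformer encoder.

```python
import torch
import torch.nn as nn

class SensorParamEncoder(nn.Module):
    """Hypothetical sensor-parameter conditioning of patch embeddings."""

    def __init__(self, n_params=3, embed_dim=768):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_params, embed_dim), nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, patch_tokens, sensor_params):
        # patch_tokens: (B, N, embed_dim); sensor_params: (B, n_params)
        cond = self.mlp(sensor_params).unsqueeze(1)  # (B, 1, embed_dim)
        return patch_tokens + cond                   # broadcast over patches
```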
SARFormer, a Vision Transformer designed for Synthetic Aperture Radar (SAR) data, explicitly incorporates acquisition parameters and uses self-supervised pre-training to enhance its ability to interpret SAR imagery. This model achieved an RMSE of 3.8m for building height reconstruction and an mIoU of 0.67 for building footprint segmentation, showing improved performance over baselines in handling diverse SAR acquisition conditions.
In this study, we evaluate the performance of current automotive communication middlewares under various operating conditions. Specifically, we examine FastDDS, a widely used open-source middleware; the newly developed Zenoh middleware; and vSomeIP, COVESA's open-source implementation of SOME/IP. Our objective is to identify the best-performing middleware for specific operating conditions. To ensure accessibility, we first provide a concise overview of middleware technologies and their fundamental principles. We then introduce our testing methodology, designed to systematically assess middleware performance metrics such as scaling performance, end-to-end latency, and discovery times across multiple message types, network topologies, and configurations. Finally, we compare the resulting performance data and present our results in nine findings. Our evaluation code and the resulting data will be made publicly available upon acceptance.
Recognising reinforced concrete defects (RCDs) is a crucial element in determining the structural integrity, traffic safety, and durability of bridges. However, most of the existing datasets in the RCD domain are derived from a small number of bridges acquired with specific camera poses, lighting conditions, and fixed hardware. These limitations call into question the usability of models trained on such open-source data in real-world scenarios. We address this problem by testing such models on our "dacl1k" dataset, a highly diverse RCD dataset for multi-label classification comprising 1,474 images from building inspections. We trained the models on different combinations of open-source data (meta datasets), which were subsequently evaluated both extrinsically and intrinsically. For the extrinsic evaluation, we report metrics on dacl1k and the meta datasets. The performance analysis on dacl1k shows the practical usability of the meta data, with the best model reaching an Exact Match Ratio of 32%. Additionally, we conduct an intrinsic evaluation by clustering the bottleneck features of the best model from the extrinsic evaluation, in order to determine whether the model has learned to distinguish the datasets or the classes (RCDs), the latter being the desired outcome. The dacl1k dataset and our trained models will be made publicly available, enabling researchers and practitioners to put their models to the real-world test.
Large language models (LLMs) have shown promising capabilities in healthcare analysis but face several challenges like hallucinations, parroting, and bias manifestation. These challenges are exacerbated in complex, sensitive, and low-resource domains. Therefore, in this work we introduce IC-AnnoMI, an expert-annotated motivational interviewing (MI) dataset built upon AnnoMI by generating in-context conversational dialogues leveraging LLMs, particularly ChatGPT. IC-AnnoMI employs targeted prompts carefully engineered through cues and tailored information, taking into account therapy style (empathy, reflection), contextual relevance, and false semantic change. Subsequently, the dialogues are annotated by experts, strictly adhering to the Motivational Interviewing Skills Code (MISC), focusing on both the psychological and linguistic dimensions of MI dialogues. We comprehensively evaluate the IC-AnnoMI dataset and ChatGPT's emotional reasoning ability and understanding of domain intricacies by modeling novel classification tasks employing several classical machine learning and current state-of-the-art transformer approaches. Finally, we discuss the effects of progressive prompting strategies and the impact of augmented data in mitigating the biases manifested in IC-AnnoMI. Our contributions provide the MI community with not only a comprehensive dataset but also valuable insights for using LLMs in empathetic text generation for conversational therapy in supervised settings.
The successful deployment of deep learning-based techniques for autonomous systems is highly dependent on the data availability for the respective system in its deployment environment. Especially for unstructured outdoor environments, very few datasets exist for even fewer robotic platforms and scenarios. In an earlier work, we presented the German Outdoor and Offroad Dataset (GOOSE) framework along with 10,000 multimodal frames from an offroad vehicle to enhance the perception capabilities in unstructured environments. In this work, we address the generalizability of the GOOSE framework. To accomplish this, we open-source the GOOSE-Ex dataset, which contains an additional 5,000 labeled multimodal frames from various completely different environments, recorded on a robotic excavator and a quadruped platform. We perform a comprehensive analysis of the semantic segmentation performance on different platforms and sensor modalities in unseen environments. In addition, we demonstrate how the combined datasets can be utilized for different downstream applications or competitions such as offroad navigation, object manipulation, or scene completion. The dataset, its platform documentation, and pre-trained state-of-the-art models for offroad perception will be made available at this https URL.
This research from the University of the Bundeswehr Munich, Sapienza University of Rome, and Università degli Studi del Molise proposes a system-theoretical framework for Human-Artificial Interaction (HAI), distinguishing between Multi-Agent Systems (MAS) and Centaurian (human-AI fusion) paradigms. It leverages Colored Petri nets to formalize "communication spaces" across surface, observation, and computation layers, enabling the design and analysis of complex human-AI collaborations demonstrated through robotic control and Large Action Model use cases.
Current architectures for multi-modality tasks such as visual question answering suffer from high complexity. As a result, these architectures are difficult to train and require substantial computational resources. To address these problems, we present a CLIP-based architecture that does not require any fine-tuning of the feature extractors. A simple linear classifier is used on the concatenated features of the image and text encoders. During training, an auxiliary loss is added that operates on the answer types. The resulting classification is then used as an attention gate on the answer class selection. On the VizWiz 2022 Visual Question Answering Challenge we achieve 60.15% accuracy on Task 1: Predict Answer to a Visual Question, and an AP score of 83.78% on Task 2: Predict Answerability of a Visual Question.
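A minimal sketch of the gating idea follows (hypothetical dimensions, class counts, and answer-type mapping; not the authors' code): the auxiliary head predicts answer-type probabilities, which are broadcast to the answer classes and multiply the answer logits.

```python
import torch
import torch.nn as nn

class ClipVqaHead(nn.Module):
    """Linear answer classifier on frozen, concatenated CLIP features,
    gated by an auxiliary answer-type head (illustrative sketch)."""

    def __init__(self, feat_dim=512, n_answers=3000, n_types=4):
        super().__init__()
        self.answer_head = nn.Linear(2 * feat_dim, n_answers)
        self.type_head = nn.Linear(2 * feat_dim, n_types)
        # Hypothetical mapping from each answer class to its answer type.
        self.register_buffer("answer_type",
                             torch.randint(0, n_types, (n_answers,)))

    def forward(self, img_feat, txt_feat):
        fused = torch.cat([img_feat, txt_feat], dim=-1)
        type_logits = self.type_head(fused)               # auxiliary loss target
        gate = type_logits.softmax(-1)[:, self.answer_type]  # (B, n_answers)
        answer_logits = self.answer_head(fused) * gate    # gated answer scores
        return answer_logits, type_logits
```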
Contrastive learning is a representation learning paradigm in which a neural network maps data elements to feature vectors. It improves the feature space by forming groups consisting of an anchor and examples that are either positive or negative based on class similarity. Hard negative examples, which are close to the anchor in the feature space but belong to a different class, improve learning performance. Finding such high-quality examples efficiently in large, high-dimensional datasets is computationally challenging. In this paper, we propose a GPU-friendly Locality-Sensitive Hashing (LSH) scheme that quantizes real-valued feature vectors into binary representations for approximate nearest neighbor search. We investigate its theoretical properties and evaluate it on several datasets from the textual and visual domains. Our approach achieves comparable or better performance while requiring significantly less computation than existing hard negative mining strategies.
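As a concrete illustration, here is a minimal sketch of sign-random-projection LSH, a standard scheme of this kind (not necessarily the paper's exact variant; the bit width and seed are arbitrary choices): features are quantized to binary codes whose Hamming distance approximates angular similarity, making candidate hard negatives cheap to retrieve.

```python
import numpy as np

def make_planes(dim, n_bits=64, seed=0):
    """Sample random hyperplanes used to binarize features."""
    return np.random.default_rng(seed).standard_normal((dim, n_bits))

def lsh_codes(features, planes):
    """Quantize (n, dim) real-valued features to (n, n_bits) binary codes."""
    return (features @ planes > 0).astype(np.uint8)

def hamming_neighbors(codes, query_code, k=10):
    """Indices of the k codes closest to query_code in Hamming distance."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return np.argsort(dists)[:k]
```

Candidates retrieved this way can then be filtered by class label, keeping only near neighbors from different classes as hard negatives for the contrastive loss.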
Scientists and engineers use simulators to model empirically observed phenomena. However, tuning the parameters of a simulator to ensure its outputs match observed data presents a significant challenge. Simulation-based inference (SBI) addresses this by enabling Bayesian inference for simulators, identifying parameters that match observed data and align with prior knowledge. Unlike traditional Bayesian inference, SBI only needs access to simulations from the model and does not require evaluations of the likelihood function. In addition, SBI algorithms do not require gradients through the simulator, allow for massive parallelization of simulations, and can perform inference for different observations without further simulations or training, thereby amortizing inference. Over the past years, we have developed, maintained, and extended sbi, a PyTorch-based package that implements Bayesian SBI algorithms based on neural networks. The sbi toolkit implements a wide range of inference methods, neural network architectures, sampling methods, and diagnostic tools. In addition, it provides well-tested default settings, but also offers flexibility to fully customize every step of the simulation-based inference workflow. Taken together, the sbi toolkit enables scientists and engineers to apply state-of-the-art SBI methods to black-box simulators, opening up new possibilities for aligning simulations with empirically observed data.
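A minimal usage sketch of the kind of workflow the toolkit supports (a toy simulator and the long-standing SNPE interface are assumed here; exact API names may vary between sbi versions):

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# Toy stand-in for a black-box simulator: adds Gaussian noise to parameters.
def simulator(theta):
    return theta + 0.1 * torch.randn_like(theta)

prior = BoxUniform(low=-2 * torch.ones(3), high=2 * torch.ones(3))

# Simulate a training set, then fit a neural posterior estimator.
theta = prior.sample((1000,))
x = simulator(theta)
inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)

# Draw posterior samples for an observation.
x_obs = torch.zeros(3)
samples = posterior.sample((100,), x=x_obs)
```

Because the trained posterior is amortized, `posterior.sample` can be called for new observations without further simulations or retraining.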