Graz University of Technology
DATACOMP introduces a benchmark and the 12.8 billion image-text pair COMMONPOOL dataset to systematically evaluate multimodal dataset design. A CLIP model trained on the resulting DATACOMP-1B dataset achieved 79.2% zero-shot ImageNet accuracy, outperforming models trained on larger, unfiltered datasets.
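DATACOMP's strongest filtering baselines rank COMMONPOOL candidates by CLIP image-text alignment and keep only the best-aligned pairs. A minimal sketch of that idea, assuming precomputed image and caption embeddings from any pretrained CLIP model (the `keep_fraction` value is illustrative):

```python
import torch

def clip_score_filter(image_embs: torch.Tensor,
                      text_embs: torch.Tensor,
                      keep_fraction: float = 0.3) -> torch.Tensor:
    """Return indices of the image-text pairs whose alignment score
    lands in the top `keep_fraction` of the candidate pool."""
    # Cosine similarity between each image and its paired caption.
    image_embs = torch.nn.functional.normalize(image_embs, dim=-1)
    text_embs = torch.nn.functional.normalize(text_embs, dim=-1)
    scores = (image_embs * text_embs).sum(dim=-1)
    # Keep only the best-aligned fraction of the pool.
    k = max(1, int(len(scores) * keep_fraction))
    return scores.topk(k).indices
```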
TAMING3DGS introduces a budget-constrained optimization for 3D Gaussian Splatting, providing strict control over resource consumption and accelerating the training process. The method achieves 4-5x reductions in both model size and training time while maintaining or improving visual quality, making high-quality 3D scene reconstruction feasible on resource-constrained devices.
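The budget constraint can be pictured as score-ranked densification that never grows the model past a fixed primitive count. A minimal sketch under that reading; the score function and the clone-only handling are illustrative assumptions, not the paper's exact procedure:

```python
import torch

def budgeted_densify(positions: torch.Tensor, grads: torch.Tensor,
                     opacities: torch.Tensor, budget: int) -> torch.Tensor:
    """Densify at most (budget - current count) Gaussians, chosen by score."""
    room = budget - positions.shape[0]
    if room <= 0:
        return positions                  # budget reached: stop growing
    # Assumed score: view-space gradient magnitude weighted by opacity.
    scores = grads.norm(dim=-1) * opacities.squeeze(-1)
    top = scores.topk(min(room, positions.shape[0])).indices
    # Clone the selected Gaussians (splitting would be handled analogously,
    # as would the remaining per-Gaussian attributes).
    return torch.cat([positions, positions[top]], dim=0)
```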
STSBench, a new spatio-temporal scenario benchmark, assesses the holistic understanding of multi-modal large language models in autonomous driving. Evaluation on STSnu, an instantiation of the benchmark on the nuScenes dataset, reveals that current models significantly lack the spatio-temporal reasoning required for complex traffic dynamics.
Researchers at Graz University of Technology developed a novel dependence coefficient, "ψ", for categorical response variables and general covariates, ensuring invariance to category permutations and fully characterizing independence and functional dependence. Their method includes a statistically consistent estimator and an independence test with a pivotal chi-squared asymptotic distribution, applicable to high-dimensional data without resampling.
Researchers from Graz University of Technology, Complexity Science Hub Vienna, and ETH Zurich developed "model folding," a data-free and fine-tuning-free compression technique that reduces neural network size by merging structurally similar neurons across layers while preserving internal data statistics. This method consistently outperforms other data-free baselines and traditional pruning at high sparsity levels across CNNs and LLaMA-7B, achieving significant compression and efficiency without compromising performance.
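As a toy illustration of the merging step for two consecutive fully connected layers y = W2·act(W1·x): when two rows of W1 are structurally similar, their activations nearly coincide, so the rows can be averaged and the matching columns of W2 summed with little change to the output. This sketch deliberately omits model folding's data-free repair of internal statistics:

```python
import torch

def fold_neuron_pair(w1: torch.Tensor, w2: torch.Tensor, i: int, j: int):
    """Merge hidden neurons i and j (rows of w1, columns of w2).
    Assumes rows w1[i] and w1[j] are similar, e.g. found by clustering."""
    w1, w2 = w1.clone(), w2.clone()
    w1[i] = 0.5 * (w1[i] + w1[j])     # average the incoming weights
    w2[:, i] = w2[:, i] + w2[:, j]    # route both contributions through i
    keep = [k for k in range(w1.shape[0]) if k != j]
    return w1[keep], w2[:, keep]      # neuron j is removed entirely
```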
This paper introduces "StopThePop," a refined rendering pipeline for 3D Gaussian Splatting that eliminates the visual popping artifacts and view inconsistencies caused by approximate sorting. It achieves this with a novel hierarchical rasterization approach that maintains comparable image quality and near real-time performance: on average it is only 4% slower than the original 3DGS, and it runs up to 1.6x faster with opacity decay.
Traffic safety remains a critical global concern, with timely and accurate accident detection essential for hazard reduction and rapid emergency response. Infrastructure-based vision sensors offer scalable and efficient solutions for continuous real-time monitoring, facilitating automated detection of accidents directly from captured images. This research investigates the zero-shot capabilities of multimodal large language models (MLLMs) for detecting and describing traffic accidents using images from infrastructure cameras, thus minimizing reliance on extensive labeled datasets. Main contributions include: (1) Evaluation of MLLMs using the simulated DeepAccident dataset from CARLA, explicitly addressing the scarcity of diverse, realistic, infrastructure-based accident data through controlled simulations; (2) Comparative performance analysis between Gemini 1.5 and 2.0, Gemma 3 and Pixtral models in accident identification and descriptive capabilities without prior fine-tuning; and (3) Integration of advanced visual analytics, specifically YOLO for object detection, Deep SORT for multi-object tracking, and Segment Anything (SAM) for instance segmentation, into enhanced prompts to improve model accuracy and explainability. Key numerical results show Pixtral as the top performer with an F1-score of 0.71 and 83% recall, while Gemini models gained precision with enhanced prompts (e.g., Gemini 1.5 rose to 90%) but suffered notable F1 and recall losses. Gemma 3 offered the most balanced performance with minimal metric fluctuation. These findings demonstrate the substantial potential of integrating MLLMs with advanced visual analytics techniques, enhancing their applicability in real-world automated traffic monitoring systems.
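The "enhanced prompt" idea can be sketched by serializing detector output into the text fed to the MLLM. A minimal version using Ultralytics YOLO; the prompt template is an illustrative assumption, and the Deep SORT and SAM cues from the paper would be appended analogously:

```python
from ultralytics import YOLO  # pip install ultralytics

def build_enhanced_prompt(image_path: str) -> str:
    """Prepend structured YOLO detections to the accident question."""
    result = YOLO("yolov8n.pt")(image_path)[0]   # small pretrained detector
    lines = []
    for box in result.boxes:
        name = result.names[int(box.cls)]
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        lines.append(f"- {name} (conf {float(box.conf):.2f}) "
                     f"at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")
    detections = "\n".join(lines) if lines else "- none"
    # Illustrative template; the paper's exact wording may differ.
    return (f"Detected objects:\n{detections}\n\n"
            "Given the image and the detections above, does the scene show "
            "a traffic accident? Describe the vehicles involved and severity.")
```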
While most state-of-the-art instance segmentation methods produce binary segmentation masks, geographic and cartographic applications typically require precise vector polygons of extracted objects instead of rasterized output. This paper introduces PolyWorld, a neural network that directly extracts building vertices from an image and connects them correctly to create precise polygons. The model predicts the connection strength between each pair of vertices using a graph neural network and estimates the assignments by solving a differentiable optimal transport problem. Moreover, the vertex positions are optimized by minimizing a combined segmentation and polygonal angle difference loss. PolyWorld significantly outperforms the state of the art in building polygonization and achieves not only notable quantitative results, but also produces visually pleasing building polygons. Code and trained weights are publicly available at this https URL.
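The assignment step is commonly relaxed with Sinkhorn iterations, which make entropy-regularized optimal transport differentiable; a minimal log-domain sketch over a matrix of predicted vertex-connection scores:

```python
import torch

def sinkhorn(scores: torch.Tensor, n_iters: int = 50,
             eps: float = 0.1) -> torch.Tensor:
    """Relax an assignment over an (N, N) score matrix into an
    approximately doubly stochastic matrix, differentiably."""
    log_p = scores / eps                     # entropy regularization strength
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)  # rows
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)  # cols
    return log_p.exp()
```

Gradients flow through the normalizations, so the connection scores can be trained end-to-end against the final polygon loss.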
A LoD of Gaussians introduces a unified training and rendering framework for ultra-large-scale 3D Gaussian Splatting scenes, leveraging external memory and a dynamic Level-of-Detail system. The method enables artifact-free reconstruction of city-scale environments on a single consumer-grade GPU while achieving higher quality and faster convergence than prior chunk-based approaches.
Few-shot semantic segmentation is vital for deep learning-based infrastructure inspection applications, where labeled training examples are scarce and expensive. Although existing deep learning frameworks perform well, their need for extensive labeled datasets and their inability to learn new defect categories from little data are problematic. We present our Enhanced Feature Pyramid Network (E-FPN) framework for few-shot semantic segmentation of culvert and sewer defect categories using a prototypical learning framework. Our approach has three main contributions: (1) an adaptive E-FPN encoder using InceptionSepConv blocks and depth-wise separable convolutions for efficient multi-scale feature extraction; (2) prototypical learning with masked average pooling for powerful prototype generation from small support examples; and (3) attention-based feature representation through global self-attention, local self-attention, and cross-attention. Comprehensive experimentation on challenging infrastructure inspection datasets shows that the method achieves excellent few-shot performance, with the best configuration, 8-way 5-shot training, reaching an 82.55% F1-score and 72.26% mIoU in 2-way classification testing. The self-attention method yielded the most significant performance improvements, providing gains of 2.57% F1-score and 2.9% mIoU over baselines. Our framework addresses the critical need to respond rapidly to new defect types in infrastructure inspection systems with limited new training data, leading to more efficient and economical maintenance plans for critical infrastructure systems.
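The prototype step has a compact generic form regardless of the encoder; a minimal sketch of masked average pooling over support features plus cosine-similarity scoring of query pixels:

```python
import torch
import torch.nn.functional as F

def masked_average_pooling(feats: torch.Tensor, mask: torch.Tensor):
    """feats: (B, C, H, W) support features; mask: (B, 1, h, w) binary
    support mask. Returns one (C,) prototype for the masked class."""
    mask = F.interpolate(mask, size=feats.shape[-2:], mode="nearest")
    # Average features over foreground pixels only.
    return (feats * mask).sum(dim=(0, 2, 3)) / mask.sum().clamp(min=1e-6)

def score_query(feats: torch.Tensor, proto: torch.Tensor):
    """Per-pixel cosine similarity of query features to the prototype."""
    return F.cosine_similarity(feats, proto[None, :, None, None], dim=1)
```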
This research introduces a framework that integrates formal methods with reinforcement learning through a reactive “shield” to ensure provable safety while optimizing performance. The approach successfully enforces temporal logic safety specifications across various environments and often accelerates learning convergence by preventing unsafe exploration.
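Conceptually, the shield sits between the agent and the environment and overrides any proposed action the safety monitor rejects. A minimal sketch, where `is_safe` is a hypothetical stand-in for the temporal-logic monitor synthesized by the formal-methods side:

```python
import random

class ShieldedAgent:
    """Execute only actions the shield certifies as safe."""

    def __init__(self, agent, is_safe, actions):
        self.agent = agent        # any RL policy with an .act(state) method
        self.is_safe = is_safe    # hypothetical monitor: (state, action) -> bool
        self.actions = actions    # the discrete action set

    def act(self, state):
        proposed = self.agent.act(state)
        if self.is_safe(state, proposed):
            return proposed
        # Override with an arbitrary certified-safe alternative.
        safe = [a for a in self.actions if self.is_safe(state, a)]
        return random.choice(safe) if safe else proposed
```

Because unsafe actions are never executed, exploration is confined to the safe region, which is also what often speeds up convergence.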
Medical image segmentation plays an important role in accurately identifying and isolating regions of interest within medical images. Generative approaches are particularly effective at modeling the statistical properties of segmentation masks that are closely related to the respective structures. In this work, we introduce FlowSDF, an image-guided conditional flow matching framework designed to represent the signed distance function (SDF) and, in turn, an implicit distribution of segmentation masks. The advantage of leveraging the SDF is that it distorts more naturally than binary masks do. By learning a vector field associated with the probability path of conditional SDF distributions, our framework enables accurate sampling of segmentation masks and the computation of relevant statistical measures. This probabilistic approach also facilitates the generation of uncertainty maps represented by the variance, thereby supporting enhanced robustness in prediction and further analysis. We qualitatively and quantitatively illustrate competitive performance of the proposed method on a public nuclei and gland segmentation data set, highlighting its utility in medical image segmentation applications.
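The training objective of conditional flow matching takes a simple generic form; a sketch with a straight-line probability path from noise to the target SDF, where the image-conditioned `v_net` interface is an assumption:

```python
import torch

def flow_matching_loss(v_net, sdf_target: torch.Tensor, image: torch.Tensor):
    """Regress v_net onto the velocity of a linear path from Gaussian
    noise to the ground-truth signed distance function."""
    x0 = torch.randn_like(sdf_target)              # noise endpoint
    t = torch.rand(sdf_target.shape[0], 1, 1, 1,
                   device=sdf_target.device)       # random time in [0, 1]
    x_t = (1 - t) * x0 + t * sdf_target            # point along the path
    target_v = sdf_target - x0                     # constant path velocity
    return ((v_net(x_t, t, image) - target_v) ** 2).mean()
```

Sampling then integrates the learned vector field from noise to a mask-encoding SDF; repeated sampling yields the variance-based uncertainty maps.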
We investigate the genus $g(n,m)$ of the Erdős-Rényi random graph $G(n,m)$, providing a thorough description of how this relates to the function $m=m(n)$, and finding that there is different behaviour depending on which `region' $m$ falls into. Results already exist for $m \le \frac{n}{2} + O(n^{2/3})$ and $m = \omega\left(n^{1+\frac{1}{j}}\right)$ for $j \in \mathbb{N}$, and so we focus on the intermediate cases. We establish that $g(n,m) = (1+o(1))\frac{m}{2}$ whp (with high probability) when $n \ll m = n^{1+o(1)}$, that $g(n,m) = (1+o(1))\mu(\lambda)m$ whp for a given function $\mu(\lambda)$ when $m \sim \lambda n$ for $\lambda > \frac{1}{2}$, and that $g(n,m) = (1+o(1))\frac{8s^{3}}{3n^{2}}$ whp when $m = \frac{n}{2} + s$ for $n^{2/3} \ll s \ll n$. We then also show that the genus of a fixed graph can increase dramatically if a small number of random edges are added. Given any connected graph with bounded maximum degree, we find that the addition of $\epsilon n$ edges will whp result in a graph with genus $\Omega(n)$, even when $\epsilon$ is an arbitrarily small constant! We thus call this the `fragile genus' property.
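For reference, the three new estimates can be collected into a single display (restating the results above, with the $(1+o(1))$ factor in every regime):

\[
g(n,m) = (1+o(1)) \cdot
\begin{cases}
\dfrac{8s^{3}}{3n^{2}} & \text{if } m = \frac{n}{2} + s,\; n^{2/3} \ll s \ll n,\\[6pt]
\mu(\lambda)\, m & \text{if } m \sim \lambda n,\; \lambda > \frac{1}{2},\\[6pt]
\dfrac{m}{2} & \text{if } n \ll m = n^{1+o(1)},
\end{cases}
\qquad \text{whp.}
\]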
The Efficient Motion Prediction (EMP) model by Graz University of Technology achieves accuracy on par with state-of-the-art transformer-based models on Argoverse 2 while training in under 13 hours on a single GPU and demonstrating significantly lower inference latency. This work provides a highly efficient alternative for motion forecasting, reducing computational requirements for both development and deployment.
It was recently shown that the loss function used for training physics-informed neural networks (PINNs) exhibits local minima at solutions corresponding to fixed points of dynamical systems. In the forward setting, where the PINN is trained to solve initial value problems, these local minima can interfere with training and potentially lead to physically incorrect solutions. Building on stability theory, this paper proposes a regularization scheme that penalizes solutions corresponding to unstable fixed points. Experimental results on four dynamical systems, including the Lotka-Volterra model and the van der Pol oscillator, show that our scheme helps avoid physically incorrect solutions and substantially improves the training success rate of PINNs.
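One plausible instantiation of such a penalty (an illustrative assumption, not necessarily the paper's exact scheme) adds loss mass whenever the predicted trajectory lingers near a known unstable fixed point, discouraging collapse onto spurious constant solutions:

```python
import torch

def unstable_fixed_point_penalty(u_pred: torch.Tensor,
                                 unstable_points: torch.Tensor,
                                 sigma: float = 0.1) -> torch.Tensor:
    """u_pred: (T, D) trajectory predicted by the PINN at collocation
    times; unstable_points: (K, D) unstable fixed points of the ODE."""
    d2 = torch.cdist(u_pred, unstable_points) ** 2   # (T, K) squared distances
    return torch.exp(-d2 / sigma**2).mean()          # peaks at the fixed points

# Illustrative use: total = residual_loss + lam * unstable_fixed_point_penalty(u, fps)
```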
MedShapeNet introduces a large-scale, community-formed dataset of over 100,000 3D medical shapes, derived from real patient imaging data, to bridge the gap between general 3D computer vision advancements and medical applications. The dataset provides standardized 3D anatomical models and surgical instruments, enabling the development and application of deep learning algorithms for tasks such as tumor classification, shape reconstruction, and extended reality medical applications.
JWST has identified a large population of faint, broad-line active galactic nuclei (AGN) in the early universe that are powered by black holes (BHs) that often appear overmassive relative to their host galaxies. In this study, we examine the relationship between BH mass and galaxy stellar mass, finding that it lies $3\sigma$ above the relationship measured for local broad-line AGN. We derive an intrinsic scatter in this relationship of $0.9$ dex, which does not vary over the redshift range of our sample. We also find that the $M_{\rm BH}/M_{\star}$ ratio increases by $2.3$ dex from $z = 3.5$ to $z = 6.5$ with a confidence level of $>3\sigma$. We attribute this trend to the increasing fraction of LRDs in our sample at $z > 4$, as their host masses are $\sim 1$ dex lower than those of the non-LRD AGN in our sample. These results support a picture in which the BHs powering JWST's broad-line AGN are genuinely overmassive and become increasingly so with redshift. We discuss the implications of our findings for early BH growth relative to that of the host galaxies and the constraints they place on BH seeding models.
Real-time visibility determination in expansive or dynamically changing environments has long posed a significant challenge in computer graphics. Existing techniques are computationally expensive and often applied as a precomputation step on a static scene. We present NeuralPVS, the first deep-learning approach to visibility computation that efficiently determines from-region visibility in a large scene, processing at approximately 100 Hz with less than 1% missing geometry. This is made possible by a neural network operating on a voxelized representation of the scene. The network's performance is achieved by combining sparse convolution with a 3D volume-preserving interleaving for data compression. Moreover, we introduce a novel repulsive visibility loss that effectively guides the network to converge to the correct data distribution, providing enhanced robustness and generalization to unseen scenes. Our results demonstrate that NeuralPVS outperforms existing methods in terms of both accuracy and efficiency, making it a promising solution for real-time visibility computation.
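The volume-preserving interleaving is essentially a 3D space-to-depth reshuffle: resolution drops by a factor r along each axis while every voxel value moves into the channel dimension, so nothing is discarded. A sketch:

```python
import torch

def interleave_3d(x: torch.Tensor, r: int = 2) -> torch.Tensor:
    """(B, C, D, H, W) -> (B, C*r**3, D//r, H//r, W//r), volume-preserving."""
    b, c, d, h, w = x.shape
    x = x.reshape(b, c, d // r, r, h // r, r, w // r, r)
    x = x.permute(0, 1, 3, 5, 7, 2, 4, 6)   # collect the r**3 sub-offsets
    return x.reshape(b, c * r**3, d // r, h // r, w // r)
```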
In this article we propose a novel method for sampling from Gibbs distributions of the form $\pi(x) \propto \exp(-U(x))$ with a potential $U(x)$. In particular, inspired by diffusion models, we propose to consider a sequence $(\pi^{t_k})_k$ of approximations of the target density, for which $\pi^{t_k} \approx \pi$ for $k$ small and, on the other hand, $\pi^{t_k}$ exhibits favorable properties for sampling for $k$ large. This sequence is obtained by replacing parts of the potential $U$ by their Moreau envelopes. Sampling is performed in an annealed Langevin type procedure, that is, by sequentially sampling from $\pi^{t_k}$ for decreasing $k$, effectively guiding the samples from a simple starting density to the more complex target. In addition to a theoretical analysis, we show experimental results supporting the efficacy of the method in terms of increased convergence speed and applicability to multi-modal densities $\pi$.
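The Moreau envelope $U^t(x) = \min_y U(y) + \frac{1}{2t}\|x-y\|^2$ has gradient $\nabla U^t(x) = (x - \mathrm{prox}_{tU}(x))/t$, which is all the Langevin sampler needs. A minimal sketch for the non-smooth example $U(x) = |x|$, whose prox is soft-thresholding; the schedule and step size here are illustrative:

```python
import numpy as np

def prox_abs(x, t):
    """Proximal map of U(x) = |x| with parameter t: soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def annealed_moreau_langevin(n=1000, schedule=(5.0, 1.0, 0.2, 0.05),
                             steps=200, eta=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    x = 5.0 * rng.normal(size=n)              # simple wide starting density
    for t in schedule:                        # decreasing smoothing level
        for _ in range(steps):
            grad = (x - prox_abs(x, t)) / t   # gradient of the Moreau envelope
            x = x - eta * grad + np.sqrt(2 * eta) * rng.normal(size=n)
    return x                                  # approx. samples from exp(-|x|)/Z
```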
Temporal action segmentation in untrimmed videos has gained increased attention recently. However, annotating action classes and frame-wise boundaries is extremely time consuming and cost intensive, especially on large-scale datasets. To address this issue, we propose an unsupervised approach for learning action classes from untrimmed video sequences. In particular, we propose a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning, to preserve the spatial layout and sequential nature of the video features. A two-step clustering pipeline on these embedded feature representations then allows us to enforce temporal consistency within, as well as across videos. Based on the identified clusters, we decode the video into coherent temporal segments that correspond to semantically meaningful action classes. Our evaluation on three challenging datasets shows the impact of each component and, furthermore, demonstrates our state-of-the-art unsupervised action segmentation results.
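Generically, the two-step pipeline clusters embedded frames across all videos and then enforces temporal consistency within each video; in this sketch the consistency step is a simple median filter over the label sequence, an illustrative stand-in for the paper's decoding:

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.ndimage import median_filter

def cluster_and_decode(video_embeddings, n_actions=5, window=15):
    """video_embeddings: list of (T_i, D) frame-embedding arrays.
    Returns one smoothed label sequence per video."""
    all_frames = np.concatenate(video_embeddings)    # cluster across videos
    labels = KMeans(n_clusters=n_actions, n_init=10).fit_predict(all_frames)
    segments, start = [], 0
    for emb in video_embeddings:
        seq = labels[start:start + len(emb)]
        start += len(emb)
        segments.append(median_filter(seq, size=window))  # within-video smoothing
    return segments
```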