University POLITEHNICA Bucharest
The growing prevalence of real-world deepfakes presents a critical challenge for existing detection systems, which are often evaluated on datasets collected just for scientific purposes. To address this gap, we introduce a novel dataset of real-world audio deepfakes. Our analysis reveals that these real-world examples pose significant challenges, even for the most performant detection models. Rather than increasing model complexity or exhaustively searching for a better alternative, in this work we focus on a data-centric paradigm, employing strategies such as dataset curation, pruning, and augmentation to improve model robustness and generalization. Through these methods, we achieve a 55% relative reduction in EER on the In-the-Wild dataset, reaching an absolute EER of 1.7%, and a 63% reduction on our newly proposed real-world deepfakes dataset, AI4T. These results highlight the transformative potential of data-centric approaches in enhancing deepfake detection for real-world applications. Code and data are available at: this https URL.
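For reference, the equal error rate (EER) used as the headline metric in this abstract can be computed from raw detector scores as in the following minimal sketch; the labeling convention (1 = fake, higher score = more likely fake) and the synthetic data are illustrative assumptions, not the authors' code.

```python
# Minimal sketch: computing the equal error rate (EER) of a deepfake detector.
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """labels: 1 = fake, 0 = real; scores: detector outputs (higher = fake)."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    # The EER is the operating point where false positive and false
    # negative rates cross.
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

# Illustrative usage with synthetic scores:
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
scores = rng.normal(loc=labels, scale=1.0)  # fakes score higher on average
print(f"EER: {compute_eer(labels, scores):.3f}")
```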
Deepfake detection has gained significant attention across audio, text, and image modalities, with high accuracy in distinguishing real from fake. However, identifying the exact source--such as the system or model behind a deepfake--remains a less studied problem. In this paper, we take a significant step forward in audio deepfake model attribution or source tracing by proposing a training-free, green AI approach based entirely on k-Nearest Neighbors (kNN). Leveraging a pre-trained self-supervised learning (SSL) model, we show that grouping samples from the same generator is straightforward--we obtain a 0.93 F1-score across five deepfake datasets. The method also demonstrates strong out-of-domain (OOD) detection, effectively identifying samples from unseen models at an F1-score of 0.84. We further analyse these results across multiple dimensions and provide additional insights. All code and data protocols used in this work are available in our open repository: this https URL.
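A hedged sketch of the training-free kNN idea this abstract describes, assuming embeddings have already been extracted with a frozen SSL model; the cosine metric, the value of k, and the OOD threshold are illustrative choices rather than the paper's exact protocol.

```python
# Sketch of kNN-based source tracing over precomputed SSL embeddings.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

def attribute_sources(train_emb, train_gen_ids, test_emb, k=5):
    """Majority vote over the k nearest embeddings of known generators."""
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
    knn.fit(train_emb, train_gen_ids)
    return knn.predict(test_emb)

def flag_out_of_domain(train_emb, test_emb, k=5, threshold=0.3):
    """Flag samples far (in cosine distance) from all known generators,
    i.e. likely produced by an unseen model (OOD). Threshold is assumed."""
    nn = NearestNeighbors(n_neighbors=k, metric="cosine").fit(train_emb)
    dist, _ = nn.kneighbors(test_emb)
    return dist.mean(axis=1) > threshold
```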
This paper considers smooth convex optimization problems with many functional constraints. To solve this general class of problems we propose a new stochastic perturbed augmented Lagrangian method, called SGDPA, where a perturbation is introduced in the augmented Lagrangian function by multiplying the dual variables with a subunitary parameter. Essentially, we linearize the objective and one randomly chosen functional constraint within the perturbed augmented Lagrangian at the current iterate and add a quadratic regularization, leading to a stochastic gradient descent update for the primal variables, followed by a perturbed random coordinate ascent step to update the dual variables. We provide a convergence analysis in both optimality and feasibility criteria for the iterates of the SGDPA algorithm under basic assumptions on the problem. In particular, when the dual updates are assumed to be bounded, we prove sublinear convergence rates for the iterates of SGDPA of order $\mathcal{O}(k^{-1/2})$ when the objective is convex and of order $\mathcal{O}(k^{-1})$ when the objective is strongly convex, where $k$ is the iteration counter. Under some additional assumptions, we prove that the dual iterates are bounded, and in this case we obtain convergence rates of order $\mathcal{O}(k^{-1/4})$ and $\mathcal{O}(k^{-1/2})$ when the objective is convex and strongly convex, respectively. Preliminary numerical experiments on problems with many quadratic constraints demonstrate the viability and performance of our method when compared to some existing state-of-the-art optimization methods and software.
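One plausible reading of an SGDPA-style iteration, reconstructed only from the description above; the exact augmented Lagrangian form, the stepsizes alpha and beta, the penalty rho, and the perturbation theta are all assumptions, not the paper's algorithm verbatim.

```python
# Rough sketch of one SGDPA-style iteration for
#     min f(x)  s.t.  g_i(x) <= 0,  i = 1, ..., m:
# linearize f and one randomly drawn g_i inside a perturbed augmented
# Lagrangian, take a stochastic gradient descent step in x, then a perturbed
# random coordinate ascent step in the dual variable.
import numpy as np

def sgdpa_step(x, lam, grad_f, g, grad_g, rho=1.0, theta=0.5,
               alpha=1e-2, beta=1e-2, rng=None):
    rng = rng or np.random.default_rng()
    m = lam.size
    i = rng.integers(m)                          # sample one constraint
    gi, dgi = g(x, i), grad_g(x, i)
    # Partial gradient of the perturbed augmented Lagrangian in x
    # (the factor m makes the sampled constraint term unbiased).
    grad_x = grad_f(x) + m * max(theta * lam[i] + rho * gi, 0.0) * dgi
    x_new = x - alpha * grad_x                   # primal SGD step
    lam_new = lam.copy()                         # dual coordinate ascent step
    lam_new[i] = max(theta * lam[i] + beta * g(x_new, i), 0.0)
    return x_new, lam_new
```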
Generalisation -- the ability of a model to perform well on unseen data -- is crucial for building reliable deepfake detectors. However, recent studies have shown that current audio deepfake models fall short of this desideratum. In this work we investigate the potential of pretrained self-supervised representations for building general and calibrated audio deepfake detection models. We show that large frozen representations coupled with a simple logistic regression classifier are extremely effective in achieving strong generalisation capabilities: compared to the RawNet2 model, this approach reduces the equal error rate from 30.9% to 8.8% on a benchmark of eight deepfake datasets, while learning fewer than 2k parameters. Moreover, the proposed method produces considerably more reliable predictions than previous approaches, making it more suitable for realistic use.
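A minimal sketch of the pipeline this abstract describes: a frozen encoder supplies fixed embeddings and only a tiny logistic regression head is trained. The `embed` callable is a stand-in for any frozen SSL model, not a specific API.

```python
# Frozen SSL features + logistic regression head for deepfake detection.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_detector(train_audio, train_labels, embed):
    """embed: waveform -> fixed-size frozen embedding (never fine-tuned)."""
    X = np.stack([embed(a) for a in train_audio])
    # For a ~1024-d embedding this head has ~1k learned parameters,
    # consistent with the "fewer than 2k parameters" figure above.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, train_labels)
    return clf
```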
In this paper we propose a distributed version of a randomized block-coordinate descent method for minimizing the sum of a partially separable smooth convex function and a fully separable non-smooth convex function. Under the assumption of block Lipschitz continuity of the gradient of the smooth function, this method is shown to have a sublinear convergence rate. A linear convergence rate is obtained for the newly introduced class of generalized error bound functions. We prove that this new class encompasses both global/local error bound functions and smooth strongly convex functions. We also show that the theoretical estimates on the convergence rate depend on the number of randomly chosen blocks and on a natural measure of separability of the objective function.
In this paper we present the solver DuQuad, specialized for solving general convex quadratic problems arising in many engineering applications. When it is difficult to project onto the primal feasible set, we use the (augmented) Lagrangian relaxation to handle the complicated constraints and then apply dual first-order algorithms based on inexact dual gradient information for solving the corresponding dual problem. The iteration complexity analysis is based on two types of approximate primal solutions: the primal last iterate and an average of primal iterates. We provide computational complexity estimates on the primal suboptimality and feasibility violation of the generated approximate primal solutions. These algorithms are implemented in the programming language C in DuQuad, and optimized for low iteration complexity and a low memory footprint. DuQuad has a dynamic Matlab interface which makes the process of testing, comparing, and analyzing the algorithms simple. The algorithms are implemented using only basic arithmetic and logical operations and are suitable to run on low-cost hardware. It is shown that if an approximate solution is sufficient for a given application, there exist problems where some of the implemented algorithms obtain the solution faster than state-of-the-art commercial solvers.
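As a rough illustration (in Python rather than DuQuad's C implementation), a basic dual first-order method for an inequality-constrained QP, returning both primal approximations mentioned above (the last iterate and an average of primal iterates); the fixed step size and dense linear algebra are simplifying assumptions.

```python
# Dual gradient method for  min 0.5 x'Qx + q'x  s.t.  Ax <= b,  Q pos. def.
import numpy as np

def dual_gradient_method(Q, q, A, b, iters=500):
    lam = np.zeros(A.shape[0])
    # Step size 1/L_d, with L_d the Lipschitz constant of the dual gradient.
    step = 1.0 / np.linalg.norm(A @ np.linalg.solve(Q, A.T), 2)
    x_avg = np.zeros(Q.shape[0])
    for k in range(1, iters + 1):
        x = np.linalg.solve(Q, -(q + A.T @ lam))         # inner minimizer
        lam = np.maximum(lam + step * (A @ x - b), 0.0)  # projected ascent
        x_avg += (x - x_avg) / k                         # running average
    return x, x_avg, lam
```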
In this paper we consider convex optimization problems with a stochastic composite objective function subject to a (possibly infinite) intersection of constraints. The objective function is expressed in terms of an expectation operator over a sum of two terms satisfying a stochastic bounded gradient condition, with or without strong convexity type properties. In contrast to the classical approach, where the constraints are usually represented as an intersection of simple sets, in this paper we consider that each constraint set is given as the level set of a convex but not necessarily differentiable function. Based on the flexibility offered by our general optimization model, we consider a stochastic subgradient method with random feasibility updates. At each iteration, our algorithm takes a stochastic proximal (sub)gradient step aimed at minimizing the objective function, and then a subsequent subgradient step minimizing the feasibility violation of the observed random constraint. We analyze the convergence behavior of the proposed algorithm for diminishing stepsizes and for the case when the objective function is convex or strongly convex, unifying the nonsmooth and smooth cases. We prove sublinear convergence rates for this stochastic subgradient algorithm, which are known to be optimal for subgradient methods on this class of problems. When the objective function has a linear least-squares form and the constraints are polyhedral, it is shown that the algorithm converges linearly. Numerical evidence supports the effectiveness of our method on real problems.
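A schematic of one such iteration, based only on the description above; the Polyak-type feasibility step is one concrete choice of "subgradient step minimizing the feasibility violation", and the `prox_h` signature is an assumption.

```python
# One iteration: stochastic proximal (sub)gradient step on the objective,
# then a feasibility step on one randomly observed constraint g_j(x) <= 0.
import numpy as np

def ssg_feasibility_step(x, subgrad_f, prox_h, g, subgrad_g, j, alpha):
    # Objective step; prox_h handles the proximal-friendly part of the sum.
    y = prox_h(x - alpha * subgrad_f(x), alpha)
    # Polyak-type feasibility step, taken only if the constraint is violated.
    gj = g(y, j)
    if gj > 0:
        d = subgrad_g(y, j)
        y = y - (gj / max(d @ d, 1e-12)) * d
    return y
```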
In this paper, we consider a nonconvex optimization problem with nonlinear equality constraints. We assume that both the objective function and the functional constraints are locally smooth. For solving this problem, we propose a linearized augmented Lagrangian method, i.e., we linearize the objective function and the functional constraints in a Gauss-Newton fashion at the current iterate within the augmented Lagrangian function and add a quadratic regularization, yielding a subproblem that is easy to solve and whose solution is the next primal iterate. The update of the dual multipliers is also based on the linearization of the functional constraints. Under a novel dynamic regularization parameter choice, we prove boundedness and global asymptotic convergence of the iterates to a first-order solution of the problem. We also derive convergence guarantees for the iterates of our method to an $\epsilon$-first-order solution in $\mathcal{O}(\sqrt{\rho}\,\epsilon^{-2})$ Jacobian evaluations, where $\rho$ is the penalty parameter. Moreover, when the problem exhibits a benign nonconvex property, we derive improved convergence results to an $\epsilon$-second-order solution. Finally, we validate the performance of the proposed algorithm by numerically comparing it with existing methods and software from the literature.
In this paper we develop a higher-order method for solving composite (non)convex minimization problems with smooth (non)convex functional constraints. At each iteration our method approximates the smooth part of the objective function and of the constraints by higher-order Taylor approximations, leading to a moving Taylor approximation method (MTA). We present convergence guarantees for the MTA algorithm for both nonconvex and convex problems. In particular, when the objective and the constraints are nonconvex functions, we prove that the sequence generated by the MTA algorithm converges globally to a KKT point. Moreover, we derive convergence rates in the iterates when the problem data satisfy the Kurdyka-Lojasiewicz (KL) property. Further, when the objective function is convex (or uniformly convex) and the constraints are also convex, we provide sublinear (respectively, linear/superlinear) convergence rates for our algorithm. Finally, we present an efficient implementation of the proposed algorithm and compare it with existing methods from the literature.
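For intuition, a standard p-th order Taylor model with higher-order regularization, of the kind such methods minimize at each step; the paper's exact subproblem, which also incorporates the constraints, may differ.

```latex
% Generic p-th order Taylor model with (p+1)-st order regularization:
\[
  T_p^f(y; x) = f(x) + \sum_{i=1}^{p} \frac{1}{i!}\,\nabla^i f(x)[y - x]^i,
  \qquad
  x^{k+1} \in \arg\min_y \; T_p^f(y; x^k)
    + \frac{M}{(p+1)!}\,\|y - x^k\|^{p+1}.
\]
```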
In this paper, we propose a randomized accelerated method for the minimization of a strongly convex function under linear constraints. The method is of Kaczmarz type, i.e., it uses only a single linear equation in each iteration. To obtain acceleration, we build on the fact that the Kaczmarz method is dual to a coordinate descent method. We use a recently proposed acceleration method for randomized coordinate descent and transfer it to the primal space. The resulting method inherits many of the attractive features of the accelerated coordinate descent method, including its worst-case convergence rates. A theoretical analysis of the convergence of the proposed method is given. Numerical experiments show that the proposed method is more efficient and faster than the existing methods for solving the same problem.
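For context, the basic randomized Kaczmarz update that the accelerated method builds on; the momentum sequences transferred from accelerated coordinate descent, which provide the acceleration, are omitted here for brevity.

```python
# Basic randomized Kaczmarz for Ax = b: each iteration uses one row of A.
import numpy as np

def randomized_kaczmarz(A, b, iters=10_000, rng=None):
    rng = rng or np.random.default_rng()
    m, n = A.shape
    p = np.linalg.norm(A, axis=1) ** 2
    p /= p.sum()                                  # sample rows by squared norm
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=p)
        a = A[i]
        x = x + ((b[i] - a @ x) / (a @ a)) * a    # project onto a'x = b_i
    return x
```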
We study the worst-case behavior of Block Coordinate Descent (BCD) type algorithms for unconstrained minimization of coordinate-wise smooth convex functions. This behavior is not completely understood, and the practical success of these algorithms is not fully explained by current convergence analyses. We extend the recently proposed Performance Estimation Problem (PEP) approach to convex coordinate-wise smooth functions by proposing necessary interpolation conditions. We then exploit this to obtain improved numerical upper bounds on the worst-case convergence rate of three different BCD algorithms, namely Cyclic Coordinate Descent (CCD), Alternating Minimization (AM), and a Cyclic version of the Random Accelerated Coordinate Descent introduced in Fercoq and Richtárik (2015) (CACD), substantially outperforming the best current bounds in some situations. In addition, we show the convergence of the CCD algorithm under assumptions that are more natural in the context of convex optimization than those typically made in the literature. Our methodology uncovers a number of phenomena, some of which can be formally established. These include a scale-invariance property of the worst case of CCD with respect to the coordinate-wise smoothness constants, and a lower bound on the worst-case performance of CCD equal to the number of blocks times the worst case of full gradient descent over the class of smooth convex functions. We also adapt our framework to the analysis of random BCD algorithms, and present numerical results showing that the standard acceleration scheme of Fercoq and Richtárik (2015) appears to be inefficient for deterministic algorithms.
Deep neural networks have revolutionized many fields, including image processing, inverse problems, and text mining, and more recently have shown very promising results in systems and control. Neural networks with hidden layers have a strong potential as an approximation framework for predictive control laws, as they usually yield better approximation quality and smaller memory requirements than existing explicit (multi-parametric) approaches. In this paper, we first show that neural networks with HardTanh activation functions can exactly represent predictive control laws of linear time-invariant systems. We derive theoretical bounds on the minimum number of hidden layers and neurons that a HardTanh neural network should have to exactly represent a given predictive control law. HardTanh deep neural networks are particularly suited for linear predictive control laws, as they usually require fewer hidden layers and neurons than deep neural networks with ReLU units to exactly represent continuous piecewise affine (or, equivalently, min-max) maps. In the second part of the paper we bring the physics of the model and standard optimization techniques into the architecture design, in order to eliminate the disadvantages of black-box HardTanh learning. More specifically, we design trainable unfolded HardTanh deep architectures for learning linear predictive control laws based on two standard iterative optimization algorithms, i.e., projected gradient descent and accelerated projected gradient descent. We also study the performance of the proposed HardTanh-type deep neural networks on a linear model predictive control application.
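A toy illustration of the unfolding idea from the second part of the paper: each projected gradient step for a box-constrained QP (an MPC-style subproblem) becomes one network layer, and the projection onto the box is a shifted/scaled HardTanh. In the trainable architectures described above the weights and step sizes would be learned; here they are fixed.

```python
# Unfolded projected gradient for  min 0.5 u'Qu + q'u  s.t.  lo <= u <= hi.
import numpy as np

def unfolded_pgd(Q, q, lo, hi, n_layers=10):
    L = np.linalg.norm(Q, 2)              # gradient Lipschitz constant
    def forward(u):
        for _ in range(n_layers):         # one "layer" per PGD iteration
            u = u - (Q @ u + q) / L       # affine (gradient) layer
            u = np.clip(u, lo, hi)        # HardTanh-type activation = projection
        return u
    return forward
```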
In this paper we consider finite sum composite convex optimization problems with many functional constraints. The objective function is expressed as a finite sum of two terms, one of which admits easy computation of (sub)gradients while the other is amenable to proximal evaluations. We assume a generalized bounded gradient condition on the objective which allows us to simultaneously tackle both smooth and nonsmooth problems. We also consider cases both with and without a strong convexity property. Further, we assume that each constraint set is given as the level set of a convex but not necessarily differentiable function. We reformulate the constrained finite sum problem into a stochastic optimization problem for which the stochastic subgradient projection method from [17] specializes to a collection of mini-batch variants, with different mini-batch sizes for the objective function and the functional constraints, respectively. More specifically, at each iteration, our algorithm takes a mini-batch stochastic proximal subgradient step aimed at minimizing the objective function, and then a subsequent mini-batch subgradient projection step minimizing the feasibility violation. By specializing different mini-batching strategies, we derive exact expressions for the stepsizes as a function of the mini-batch size, and in some cases we also derive insightful stepsize-switching rules which describe when one should switch from a constant to a decreasing stepsize regime. We also prove sublinear convergence rates for the mini-batch subgradient projection algorithm which depend explicitly on the mini-batch sizes and on the properties of the objective function. Numerical results also show a better performance of our mini-batch scheme over its single-batch counterpart.
Hyperspectral imaging produces massive volumes of data, leading to significant challenges in data processing, storage, and transmission. Compressive sensing has been used in hyperspectral imaging as a technique to compress this large amount of data. This work addresses the recovery of hyperspectral images compressed at a ratio of 2.5x. A comparative study of the accuracy and performance of the convex FISTA/ADMM algorithms, as well as the greedy gOMP/BIHT/CoSaMP recovery algorithms, is presented. The results indicate that all algorithms successfully recover the compressed data, yet gOMP achieves superior accuracy and faster recovery than the other algorithms, at the expense of a strong dependence on the unknown sparsity level of the data to be recovered.
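As one example of the compared algorithms, a minimal FISTA sketch for the underlying sparse recovery problem; the l1 formulation and parameter values are illustrative, not the paper's exact setup.

```python
# FISTA for sparse recovery:  min_x 0.5*||y - Phi @ x||^2 + lam*||x||_1,
# with Phi the compressive sensing measurement matrix.
import numpy as np

def fista(Phi, y, lam=0.1, iters=200):
    L = np.linalg.norm(Phi, 2) ** 2                  # gradient Lipschitz const.
    x = z = np.zeros(Phi.shape[1])
    t = 1.0
    for _ in range(iters):
        g = z - Phi.T @ (Phi @ z - y) / L            # gradient step
        x_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = x_new + ((t - 1) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x
```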
In this paper we analyze several new methods for solving nonconvex optimization problems with the objective function formed as a sum of two terms: one is nonconvex and smooth, while the other is convex but simple, with known structure. Further, we consider both cases: unconstrained and linearly constrained nonconvex problems. For optimization problems of the above structure, we propose random coordinate descent algorithms and analyze their convergence properties. For the general case, when the objective function is nonconvex and composite, we prove asymptotic convergence of the sequences generated by our algorithms to stationary points and a sublinear rate of convergence in expectation for some optimality measure. Additionally, if the objective function satisfies an error bound condition, we derive a local linear rate of convergence for the expected values of the objective function. We also present extensive numerical experiments evaluating the performance of our algorithms in comparison with state-of-the-art methods.
We develop a method of driving a Markov process through a continuous flow. In particular, at the level of the transition functions we investigate an approach of adding a first-order operator to the generator of a Markov process, when the two generators commute. A relevant example is a measure-valued superprocess having a continuous flow as spatial motion and a branching mechanism which does not depend on the spatial variable. We prove that any flow is actually continuous in a convenient topology, and we show that a Markovian multiplicative semigroup on an $L^p$ space is generated by a continuous flow, completing the answer to the question of whether a measurable structure, like a $C_0$-semigroup of Markovian contractions on an $L^p$ space with no fixed topology, is enough to ensure the existence of a right Markov process associated to the given semigroup. We also extend the weak generator (in the sense of Dynkin) and the corresponding martingale problem from bounded to unbounded functions.
In this paper we consider large-scale composite optimization problems having the objective function formed as a sum of two terms (possibly nonconvex): one has (block) coordinate-wise Lipschitz continuous gradient and the other is differentiable but nonseparable. Under these general settings we derive and analyze two new coordinate descent methods. The first algorithm, referred to as the coordinate proximal gradient method, exploits the composite form of the objective function, while the other algorithm disregards the composite form and uses the partial gradient of the full objective, yielding a coordinate gradient descent scheme with novel adaptive stepsize rules. We prove that these new stepsize rules make the coordinate gradient scheme a descent method, provided that additional assumptions hold for the second term in the objective function. We present a complete worst-case complexity analysis for these two new methods in both convex and nonconvex settings, provided that the (block) coordinates are chosen randomly or cyclically. Preliminary numerical results also confirm the efficiency of our two algorithms on practical problems.
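A rough sketch of a single step of the first method (coordinate proximal gradient), with fixed block-wise constants in place of the paper's stepsize analysis; the callable signatures are assumptions.

```python
# One block update of a coordinate proximal gradient method for f(x) + h(x).
import numpy as np

def coord_prox_grad_step(x, i, blocks, grad_f_block, prox_h_block, L):
    """blocks[i]: index array of block i; grad_f_block: partial gradient of f
    w.r.t. that block; L[i]: its block Lipschitz constant."""
    idx = blocks[i]
    x = x.copy()
    step = 1.0 / L[i]
    x[idx] = prox_h_block(x[idx] - step * grad_f_block(x, idx), step)
    return x
```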
This paper deals with composite optimization problems having the objective function formed as the sum of two terms: one has Lipschitz continuous gradient along random subspaces and may be nonconvex, while the second term is simple and differentiable, but possibly nonconvex and nonseparable. Under these settings we design a stochastic coordinate proximal gradient method which takes into account the nonseparable composite form of the objective function. This algorithm achieves scalability by constructing at each iteration a local approximation model of the whole nonseparable objective function along a random subspace with user-determined dimension. We outline efficient techniques for selecting the random subspace, yielding an implementation that has low per-iteration cost while also achieving fast convergence rates. We present a probabilistic worst-case complexity analysis for our stochastic coordinate proximal gradient method in convex and nonconvex settings; in particular, we prove high-probability bounds on the number of iterations before a given optimality accuracy is achieved. Extensive numerical results also confirm the efficiency of our algorithm.
In this paper we consider large-scale composite nonconvex optimization problems having the objective function formed as a sum of three terms: the first has block coordinate-wise Lipschitz continuous gradient, the second is twice differentiable but nonseparable, and the third is the indicator function of some separable closed convex set. Under these general settings we derive and analyze a new cyclic coordinate descent method, which uses the partial gradient of the differentiable part of the objective, yielding a coordinate gradient descent scheme with a novel adaptive stepsize rule. We prove that this stepsize rule makes the coordinate gradient scheme a descent method, provided that additional assumptions hold for the second term in the objective function. We also present a worst-case complexity analysis for this new method in the nonconvex setting. Numerical results on an orthogonal nonnegative matrix factorization problem also confirm the efficiency of our algorithm.
In this paper, we consider a modified projected Gauss-Newton method for solving constrained nonlinear least-squares problems. We assume that the functional constraints are smooth and that the other constraints are represented by a simple closed convex set. We formulate the nonlinear least-squares problem as an optimization problem using the Euclidean norm as a merit function. In our method, at each iteration we linearize the functional constraints inside the merit function at the current point and add a quadratic regularization, yielding a strongly convex subproblem that is easy to solve, whose solution is the next iterate. We present global convergence guarantees for the proposed method under mild assumptions. In particular, we prove stationary point convergence guarantees and, under the Kurdyka-Lojasiewicz (KL) property for the objective function, we derive convergence rates depending on the KL parameter. Finally, we show the efficiency of this method on the power flow analysis problem using several IEEE bus test cases.
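One illustrative iteration of such a scheme, reconstructed from the description above; the regularization weight mu and the inner projected gradient solver are assumptions, not the paper's exact algorithm.

```python
# Projected Gauss-Newton step for  min_x 0.5*||F(x)||^2  s.t.  x in X:
# linearize F at the current point inside the merit function, add quadratic
# regularization, and solve the strongly convex subproblem
#   min_{y in X} 0.5*||F(x) + J(x)(y - x)||^2 + (mu/2)*||y - x||^2.
import numpy as np

def pgn_step(x, F, J, project, mu=1.0, inner_iters=50):
    Fx, Jx = F(x), J(x)                       # residual and Jacobian at x
    H = Jx.T @ Jx + mu * np.eye(x.size)       # regularized model Hessian
    g = Jx.T @ Fx                             # model gradient at y = x
    L = np.linalg.norm(H, 2)
    y = x.copy()
    for _ in range(inner_iters):              # projected gradient inner loop
        y = project(y - (g + H @ (y - x)) / L)
    return y
```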