This study presents a systematic, extensive real-world evaluation of all
configurations of the YOLOv8, YOLOv9, YOLOv10, YOLO11 (or YOLOv11), and
YOLOv12 object detection algorithms for immature green apple (fruitlet)
detection in commercial orchards, in terms of precision, recall, mean Average
Precision at 50\% Intersection over Union (mAP@50), and computational speed,
including pre-processing, inference, and post-processing times.
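The YOLO versions evaluated here are commonly run through the Ultralytics
Python API, which reports exactly these detection metrics; as a minimal
sketch (the weight file and dataset YAML below are hypothetical
placeholders, not artifacts of this study):

    from ultralytics import YOLO

    # Load a trained checkpoint; "fruitlet_best.pt" is a hypothetical weight file.
    model = YOLO("fruitlet_best.pt")

    # Validate on a held-out split; "fruitlet_data.yaml" is a placeholder dataset config.
    metrics = model.val(data="fruitlet_data.yaml")

    print(f"precision: {metrics.box.mp:.3f}")    # mean precision over classes
    print(f"recall:    {metrics.box.mr:.3f}")    # mean recall over classes
    print(f"mAP@50:    {metrics.box.map50:.3f}")  # mAP at IoU threshold 0.50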
Additionally, this research performed and validated in-field counting of the
fruitlets using an iPhone and machine vision sensors. Among all
configurations, YOLOv12l recorded the highest recall, 0.90. YOLOv10x achieved
the highest precision, 0.908, followed by YOLOv9 Gelan-c at 0.903.
Analysis of mAP@50 revealed that YOLOv9 Gelan-base and YOLOv9 Gelan-e reached
peak scores of 0.935, with YOLO11s and YOLOv12l following closely at 0.933 and
0.931, respectively. For counting validation using images captured with an
iPhone 14 Pro, the YOLO11n configuration demonstrated outstanding accuracy,
recording RMSE values of 4.51 for Honeycrisp, 4.59 for Cosmic Crisp, 4.83 for
Scilate, and 4.96 for Scifresh; corresponding MAE values were 4.07, 3.98, 7.73,
and 3.85. Similar performance trends were observed with RGB-D sensor data.
Moreover, sensor-specific training on Intel RealSense data significantly
enhanced model performance. YOLO11n achieved the fastest inference time, 2.4
ms per image, outperforming YOLOv8n (4.1 ms), YOLOv9 Gelan-s (11.5 ms),
YOLOv10n (5.5 ms), and YOLOv12n (4.6 ms), underscoring its suitability for
real-time object detection applications.
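The per-image timing breakdown reported above (pre-processing, inference,
post-processing) is exposed directly by the Ultralytics API; a minimal
sketch, assuming the official nano checkpoint and a placeholder test image:

    from ultralytics import YOLO

    # "yolo11n.pt" is the official YOLO11 nano checkpoint; "orchard.jpg" is a placeholder.
    model = YOLO("yolo11n.pt")
    results = model.predict("orchard.jpg")

    # results[0].speed is a dict of per-image times in milliseconds.
    speed = results[0].speed
    print(f"pre-process:  {speed['preprocess']:.1f} ms")
    print(f"inference:    {speed['inference']:.1f} ms")
    print(f"post-process: {speed['postprocess']:.1f} ms")

    # Fruitlet count for this image = number of detected boxes.
    print(f"detections: {len(results[0].boxes)}")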
object detection, YOLOv11 object detecion, YOLOv12 segmentation)