Washington State University
A robust Deep AUC Maximization (DAM) method employing a novel AUC min-max-margin loss was developed, securing 1st place in the CheXpert competition and a top 1% rank in the Kaggle 2020 Melanoma classification, demonstrating superior performance over conventional losses and prior AUC maximization techniques on large-scale, imbalanced medical image datasets.
Teacher Assistant Knowledge Distillation (TAKD) is proposed to improve knowledge transfer when a large teacher model distills knowledge to a much smaller student network. It introduces intermediate-sized Teacher Assistant networks, often yielding better student accuracy by bridging the capacity gap and leading to flatter loss landscapes.
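The multi-step transfer described above can be sketched as follows. This is a minimal illustration, assuming the standard temperature-softened KL distillation loss; the function names, the temperature value, and the model sizes are hypothetical and not taken from the TAKD paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: this is the commonly used distillation loss, used here
    only as an illustration.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

def distillation_chain(sizes):
    """Order models from largest to smallest and pair each with the next one."""
    ordered = sorted(sizes, reverse=True)
    return list(zip(ordered[:-1], ordered[1:]))

# Hypothetical layer counts: teacher (110), teacher assistant (20), student (8).
for source, target in distillation_chain([8, 110, 20]):
    print(f"distill {source}-layer model into {target}-layer model")
```

Each pair in the chain would be trained with the loss above, so the student only ever learns from a model closer to its own capacity.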
A survey distinguishes between standalone Large Language Models (LLMs) and LLM-based agents in software engineering (SE), analyzing their applications across six domains including code generation and security. It details how agents extend LLMs with autonomy, tool integration, and complex decision-making, while also identifying prevalent models, benchmarks, and evaluation metrics used in the field.
The paper introduces Bayesian Optimization with Dictionaries (BODi), a method leveraging Hamming Embedding via Dictionaries (HED), a dictionary-based embedding, to enable standard Gaussian Process models for efficient optimization over high-dimensional combinatorial and mixed input spaces. This approach enhances the accuracy of surrogate models and improves sample efficiency by transforming discrete inputs into an ordinal feature space.
This paper studies the black-box optimization task, which aims to find the maxima of a black-box function using a static set of its observed input-output pairs. This is often achieved by learning and optimizing a surrogate function with that offline data. Alternatively, it can be framed as an inverse modeling task that maps a desired performance to potential input candidates that achieve it. Both approaches are constrained by the limited amount of offline data. To mitigate this limitation, we introduce a new perspective that casts offline optimization as a distributional translation task. This is formulated as learning a probabilistic bridge transforming an implicit distribution of low-value inputs (i.e., offline data) into another distribution of high-value inputs (i.e., solution candidates). Such a probabilistic bridge can be learned using low- and high-value inputs sampled from synthetic functions that resemble the target function. These synthetic functions are constructed as the posterior means of multiple Gaussian processes fitted with different parameterizations on the offline data, alleviating the data bottleneck. The proposed approach is evaluated on an extensive benchmark comprising the most recent methods, demonstrating significant improvement and establishing a new state-of-the-art performance. Our code is publicly available at this https URL.
Researchers from UT Austin, Oxford, Leeds, Tufts, and Alberta established a unified conceptual framework for curriculum learning in reinforcement learning (RL), systematically classifying existing methods and identifying key research gaps. The work defines curriculum learning through task generation, sequencing, and transfer learning components, providing a structured overview to address challenges in RL sample efficiency and complex task acquisition.
Single neutral atoms in optical tweezer arrays offer a promising platform for high-fidelity quantum computing at local nodes. Nonetheless, creating entanglement between remote nodes in a distributed quantum network remains challenging due to inherently weak atom-light coupling. Here, we design a distributed quantum network architecture in which cold atomic ensembles with strong atom-light interactions act as quantum antennas, interfacing single-atom qubits with flying photons to enable high-efficiency atom-photon entanglement generation -- analogous to the role of antennas in classical communication. Using realistic experimental parameters, we estimate an efficiency of $\eta \simeq 0.548$ for generating atom-photon entanglement, a probability of $P_{E} \simeq 6\%$ for generating atom-atom entanglement, and a remote entanglement generation rate of 16.6 kHz. This performance not only surpasses that of state-of-the-art cavity-based or high-numerical-aperture-lens-based architectures but also offers notable advantages in simplicity, tunability, and experimental accessibility. Our scheme also integrates a long-lived quantum memory, providing a storage advantage for quantum repeater design. By leveraging the complementary strengths of single-atom qubits for local operations and cold atomic ensembles for networking, this approach paves the way for scalable distributed quantum computing and sensing.
Timely and effective vulnerability patching is essential for cybersecurity defense. Various approaches have been proposed, yet they still struggle to generate valid and correct patches for real-world vulnerabilities. In this paper, we leverage the power and merits of pre-trained large language models (LLMs) to enable automated vulnerability patching using no test input/exploit evidence and without model training/fine-tuning. To elicit LLMs to effectively reason about vulnerable code behaviors, which is essential for quality patch generation, we introduce vulnerability semantics reasoning and adaptive prompting on LLMs and instantiate the methodology as APPATCH, an automated LLM-based patching system. Our evaluation of APPATCH on 97 zero-day vulnerabilities and 20 existing vulnerabilities demonstrates its superior performance to both existing prompting methods and state-of-the-art non-LLM-based techniques (by up to 28.33% in F1 and 182.26% in recall over the best baseline). Through APPATCH, we demonstrate what helps LLM-based patching and how, and discuss what is still lacking and why.
Balancing helpfulness and safety (harmlessness) is a critical challenge in aligning large language models (LLMs). Current approaches often decouple these two objectives, training separate preference models for helpfulness and safety, while framing safety as a constraint within a constrained Markov Decision Process (CMDP) framework. This paper identifies a potential issue when using the widely adopted expected safety constraints for LLM safety alignment, termed "safety compensation", where the constraints are satisfied in expectation, but individual prompts may trade off safety, resulting in some responses being overly restrictive while others remain unsafe. To address this issue, we propose Rectified Policy Optimization (RePO), which replaces the expected safety constraint with critical safety constraints imposed on every prompt. At the core of RePO is a policy update mechanism driven by rectified policy gradients, which penalizes the strict safety violation of every prompt, thereby enhancing safety across nearly all prompts. Our experiments demonstrate that RePO outperforms strong baseline methods and significantly enhances LLM safety alignment.
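The "safety compensation" effect can be illustrated with a small numeric sketch. It assumes a per-prompt safety cost where positive values mean unsafe and a threshold of zero; the function names and the cost values are hypothetical, and the rectified term below only mirrors the idea of penalizing each prompt's violation, not RePO's actual update rule.

```python
def expected_constraint_violation(costs, threshold=0.0):
    """Violation of the expectation-level constraint: max(0, mean(cost) - threshold)."""
    mean_cost = sum(costs) / len(costs)
    return max(0.0, mean_cost - threshold)

def rectified_penalty(costs, threshold=0.0):
    """Average of per-prompt rectified violations: mean(max(0, cost_i - threshold))."""
    return sum(max(0.0, c - threshold) for c in costs) / len(costs)

# Hypothetical costs for four prompts: two overly cautious, two unsafe.
costs = [-2.0, -1.5, 1.0, 1.5]

print(expected_constraint_violation(costs))  # very safe prompts mask the unsafe ones
print(rectified_penalty(costs))              # unsafe prompts are still penalized
```

With these numbers the mean cost is negative, so the expected constraint reports no violation, while the rectified penalty stays positive because two prompts individually exceed the threshold.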
Researchers from Microsoft and Washington State University developed an LLM-based agent, leveraging the ReAct framework, to automate Root Cause Analysis in cloud incident management. This agent achieved competitive correctness rates for RCA while demonstrating a significantly lower hallucination rate of 7.27% compared to baseline LLM approaches.
Prediction of stock price movements presents a formidable challenge in financial analytics due to the inherent volatility, non-stationarity, and nonlinear characteristics of market data. This paper introduces SPH-Net (Stock Price Prediction Hybrid Neural Network), an innovative deep learning framework designed to enhance the accuracy of time series forecasting in financial markets. The proposed architecture employs a novel co-attention mechanism that initially processes temporal patterns through a Vision Transformer, followed by refined feature extraction via an attention mechanism, thereby capturing both global and local dependencies in market data. To rigorously evaluate the model's performance, we conduct comprehensive experiments on eight diverse stock datasets: AMD, Ebay, Facebook, FirstService Corp, Tesla, Google, Mondi ADR, and Matador Resources. Each dataset is standardized using six fundamental market indicators: Open, High, Low, Close, Adjusted Close, and Volume, representing a complete set of features for comprehensive market analysis. Experimental results demonstrate that SPH-Net consistently outperforms existing stock prediction models across all evaluation metrics. The model's superior performance stems from its ability to effectively capture complex temporal patterns while maintaining robustness against market noise. By significantly improving prediction accuracy in financial time series analysis, SPH-Net provides valuable decision-support capabilities for investors and financial analysts, potentially enabling more informed investment strategies and risk assessment in volatile market conditions.
This study systematically performed an extensive real-world evaluation of all configurations of the YOLOv8, YOLOv9, YOLOv10, YOLO11 (or YOLOv11), and YOLOv12 object detection algorithms in terms of precision, recall, mean Average Precision at 50% Intersection over Union (mAP@50), and computational speed (pre-processing, inference, and post-processing times) for immature green apple (or fruitlet) detection in commercial orchards. Additionally, this research performed and validated in-field counting of the fruitlets using an iPhone and machine vision sensors. Among the configurations, YOLOv12l recorded the highest recall at 0.90 of all YOLO configurations tested. Likewise, YOLOv10x achieved the highest precision score of 0.908, while YOLOv9 Gelan-c attained a precision of 0.903. Analysis of mAP@50 revealed that YOLOv9 Gelan-base and YOLOv9 Gelan-e reached peak scores of 0.935, with YOLO11s and YOLOv12l following closely at 0.933 and 0.931, respectively. For counting validation using images captured with an iPhone 14 Pro, the YOLO11n configuration demonstrated outstanding accuracy, recording RMSE values of 4.51 for Honeycrisp, 4.59 for Cosmic Crisp, 4.83 for Scilate, and 4.96 for Scifresh; corresponding MAE values were 4.07, 3.98, 7.73, and 3.85. Similar performance trends were observed with RGB-D sensor data. Moreover, sensor-specific training on Intel Realsense data significantly enhanced model performance. YOLOv11n achieved the fastest inference time of 2.4 ms, outperforming YOLOv8n (4.1 ms), YOLOv9 Gelan-s (11.5 ms), YOLOv10n (5.5 ms), and YOLOv12n (4.6 ms), underscoring its suitability for real-time object detection applications. (YOLOv12 architecture, YOLOv11 Architecture, YOLOv12 object detection, YOLOv11 object detection, YOLOv12 segmentation)
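The counting-validation metrics quoted above (RMSE and MAE) can be computed as in the following sketch. The per-image counts are made-up illustrative values, not data from the study, and the function names are hypothetical.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between predicted and reference counts."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)

def rmse(predicted, actual):
    """Root mean squared error between predicted and reference counts."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical fruitlet counts per image: detector output vs. manual count.
predicted_counts = [10, 12, 9, 15]
manual_counts = [12, 12, 8, 11]

print(f"MAE  = {mae(predicted_counts, manual_counts):.2f}")
print(f"RMSE = {rmse(predicted_counts, manual_counts):.2f}")
```

RMSE weights large miscounts more heavily than MAE, and on any given set of errors MAE is never larger than RMSE, which makes the pair a useful sanity check when reporting both.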
Conformal prediction (CP) is an emerging uncertainty quantification framework that allows us to construct a prediction set to cover the true label with a pre-specified marginal or conditional probability. Although the valid coverage guarantee has been extensively studied for classification problems, CP often produces large prediction sets which may not be practically useful. This issue is exacerbated for the setting of class-conditional coverage on imbalanced classification tasks with many and/or imbalanced classes. This paper proposes the Rank Calibrated Class-conditional CP (RC3P) algorithm to reduce the prediction set sizes to achieve class-conditional coverage, where the valid coverage holds for each class. In contrast to the standard class-conditional CP (CCP) method that uniformly thresholds the class-wise conformity score for each class, the augmented label rank calibration step allows RC3P to selectively iterate this class-wise thresholding subroutine only for a subset of classes whose class-wise top-k error is small. We prove that agnostic to the classifier and data distribution, RC3P achieves class-wise coverage. We also show that RC3P reduces the size of prediction sets compared to the CCP method. Comprehensive experiments on multiple real-world datasets demonstrate that RC3P achieves class-wise coverage and 26.25% reduction in prediction set sizes on average.
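For orientation, the baseline class-conditional CP (CCP) procedure that RC3P is compared against can be sketched as below. This is a simplified illustration, not the RC3P algorithm: it assumes the nonconformity score is one minus the predicted probability of a class, and all function names and calibration values are hypothetical.

```python
import math

def classwise_thresholds(cal_scores, cal_labels, alpha=0.1):
    """Per-class conformal quantile of the true-label nonconformity scores."""
    by_class = {}
    for score, label in zip(cal_scores, cal_labels):
        by_class.setdefault(label, []).append(score)
    thresholds = {}
    for label, scores in by_class.items():
        scores.sort()
        n = len(scores)
        rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed conformal rank
        # With too few calibration points the threshold is infinite (always include).
        thresholds[label] = scores[rank - 1] if rank <= n else float("inf")
    return thresholds

def prediction_set(probs, thresholds):
    """Include class y when its score 1 - p_y does not exceed that class's threshold."""
    return [y for y, p in enumerate(probs)
            if 1.0 - p <= thresholds.get(y, float("inf"))]

# Hypothetical calibration data: score = 1 - probability assigned to the true class.
cal_scores = [0.1, 0.3, 0.2, 0.5, 0.6, 0.4]
cal_labels = [0, 0, 0, 1, 1, 1]
thresholds = classwise_thresholds(cal_scores, cal_labels, alpha=0.25)
print(thresholds)
print(prediction_set([0.75, 0.25], thresholds))
```

Because every class is thresholded independently, rare or hard classes tend to receive loose thresholds and inflate the sets, which is the inefficiency the rank calibration step in RC3P is designed to reduce.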
This study evaluated the performance of the YOLOv12 object detection model and compared it against the performances of YOLOv11 and YOLOv10 for apple detection in commercial orchards, with model training completed entirely on synthetic images generated by Large Language Models (LLMs). The YOLOv12n configuration achieved the highest precision at 0.916, the highest recall at 0.969, and the highest mean Average Precision (mAP@50) at 0.978. In comparison, the YOLOv11 series was led by YOLO11x, which achieved the highest precision at 0.857, recall at 0.85, and mAP@50 at 0.91. For the YOLOv10 series, YOLOv10b and YOLOv10l both achieved the highest precision at 0.85, with YOLOv10n achieving the highest recall at 0.8 and mAP@50 at 0.89. These findings demonstrated that YOLOv12, when trained on realistic LLM-generated datasets, surpassed its predecessors in key performance metrics. The technique also offered a cost-effective solution by reducing the need for extensive manual data collection in the agricultural field. In addition, this study compared the computational efficiency of all versions of YOLOv12, v11 and v10, where YOLOv11n reported the lowest inference time at 4.7 ms, compared to YOLOv12n's 5.6 ms and YOLOv10n's 5.9 ms. Although YOLOv12 is newer and more accurate than YOLOv11 and YOLOv10, YOLO11n remains the fastest model among the YOLOv10, YOLOv11 and YOLOv12 series. (Index: YOLOv12, YOLOv11, YOLOv10, YOLOv13, YOLOv14, YOLOv15, YOLOE, YOLO Object detection)
VulScribeR, a framework developed by researchers from the University of Manitoba, Washington State University, and the University at Buffalo, leverages Large Language Models with Retrieval-Augmented Generation (RAG) to generate diverse vulnerable code samples for Deep Learning-based Vulnerability Detection (DLVD) models. Its Injection and Extension strategies significantly boost DLVD performance, showing an average 30.80% F1-score improvement over no augmentation and achieving up to a 53.84% F1-score gain at scale with a cost of US$1.88 per 1,000 samples.
The ever-increasing number of detections of gravitational waves (GWs) from compact binaries by the Advanced LIGO and Advanced Virgo detectors allows us to perform ever-more sensitive tests of general relativity (GR) in the dynamical and strong-field regime of gravity. We perform a suite of tests of GR using the compact binary signals observed during the second half of the third observing run of those detectors. We restrict our analysis to the 15 confident signals that have false alarm rates $\leq 10^{-3}\,{\rm yr}^{-1}$. In addition to signals consistent with binary black hole (BH) mergers, the new events include GW200115_042309, a signal consistent with a neutron star--BH merger. We find the residual power, after subtracting the best fit waveform from the data for each event, to be consistent with the detector noise. Additionally, we find all the post-Newtonian deformation coefficients to be consistent with the predictions from GR, with an improvement by a factor of ~2 in the -1PN parameter. We also find that the spin-induced quadrupole moments of the binary BH constituents are consistent with those of Kerr BHs in GR. We find no evidence for dispersion of GWs, non-GR modes of polarization, or post-merger echoes in the events that were analyzed. We update the bound on the mass of the graviton, at 90% credibility, to $m_g \leq 2.42 \times 10^{-23}\,\mathrm{eV}/c^2$. The final mass and final spin as inferred from the pre-merger and post-merger parts of the waveform are consistent with each other. The studies of the properties of the remnant BHs, including deviations of the quasi-normal mode frequencies and damping times, show consistency with the predictions of GR. In addition to considering signals individually, we also combine results from the catalog of GW signals to calculate more precise population constraints. We find no evidence in support of physics beyond GR.
Following the recent evidence for a gravitational wave (GW) background found by pulsar timing array (PTA) experiments, the next major science milestone is resolving individual supermassive black hole binaries (SMBHBs). The detection of these systems could arise via searches using a power-based GW anisotropy model or a deterministic template model. In Schult et al. 2025, we compared the efficacy of these models in constraining the GW signal from a single SMBHB using realistic, near-future PTA datasets, and found that the full-signal deterministic continuous wave (CW) search may achieve detection and characterization first. Here, we continue our analyses using only the CW model given its better performance, focusing now on characterization milestones. We examine the order in which CW parameters are constrained as PTA data are accumulated and the signal-to-noise ratio (S/N) grows. We also study how these parameter constraints vary across sources of different sky locations and GW frequencies. We find that the GW frequency and strain are generally constrained at the same time (or S/N), closely followed by the sky location, and later the chirp mass (if the source is highly evolving) and inclination angle. At fixed S/N, sources at higher frequencies generally achieve better precision on the GW frequency, chirp mass, and sky location. The time (and S/N) at which the signal becomes constrained is dependent on the sky location and frequency of the source, with the effects of pulsar terms and PTA geometry playing crucial roles in source detection and localization.
Security vulnerabilities are increasingly prevalent in modern software and they are widely consequential to our society. Various approaches to defending against these vulnerabilities have been proposed, among which those leveraging deep learning (DL) avoid major barriers faced by other techniques and have hence attracted more attention in recent years. However, DL-based approaches face critical challenges, including the lack of sizable and quality-labeled task-specific datasets and their inability to generalize well to unseen, real-world scenarios. Lately, large language models (LLMs) have demonstrated impressive potential in various domains by overcoming those challenges, especially through chain-of-thought (CoT) prompting. In this paper, we explore how to leverage LLMs and CoT to address three key software vulnerability analysis tasks: identifying vulnerabilities of a given type, discovering vulnerabilities of any type, and patching detected vulnerabilities. We instantiate the general CoT methodology in the context of these tasks through VSP, our unified, vulnerability-semantics-guided prompting approach, and conduct extensive experiments assessing VSP versus five baselines for the three tasks against three LLMs and two datasets. Results show substantial superiority of our CoT-inspired prompting (553.3%, 36.5%, and 30.8% higher F1 accuracy for vulnerability identification, discovery, and patching, respectively, on CVE datasets) over the baselines. Through in-depth case studies analyzing VSP failures, we also reveal current gaps in LLM/CoT for challenging vulnerability cases, while proposing and validating respective improvements.
Training large models ranging from millions to billions of parameters is highly resource-intensive, requiring significant time, compute, and memory. It is observed that most of the learning (larger changes in weights) takes place in the earlier stage of the training loop. These changes stabilize as training continues, enabling them to be captured by matrices of a low intrinsic rank. Therefore, we propose an approach to identify such states of partial convergence and dynamically switch from full parameter training to Low-Rank Adaptation (LoRA) on the ViT-Large model. We introduce a flexible approach that leverages user-defined hyperparameters to determine the switching point and assign a rank specific to each module layer based on its level of convergence. Experimental results show that this approach preserves model accuracy while reducing the number of trainable parameters to 10% of its original size, resulting in a 3x improvement in throughput and a 1.5x reduction in average training time per epoch, while also reducing GPU memory consumption by 20%.
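The idea of switching once weight changes stabilize can be sketched as follows. This is only an illustration under stated assumptions: the relative-change criterion, the `threshold` and `patience` hyperparameters, and the rank of 16 are hypothetical stand-ins for the paper's user-defined settings. The 1024 x 1024 shape matches the hidden width of ViT-Large.

```python
import math

def relative_change(previous, current):
    """Norm of the weight update divided by the norm of the previous weights."""
    update = math.sqrt(sum((c - p) ** 2 for p, c in zip(previous, current)))
    base = math.sqrt(sum(p ** 2 for p in previous))
    return update / base

def ready_to_switch(changes, threshold=0.05, patience=2):
    """True once the last `patience` relative changes all fall below `threshold`."""
    recent = changes[-patience:]
    return len(recent) == patience and all(c < threshold for c in recent)

def full_params(d_in, d_out):
    """Trainable parameters of a dense weight matrix."""
    return d_in * d_out

def lora_params(d_in, d_out, rank):
    """LoRA trains two factors, A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

# Hypothetical per-epoch relative weight changes for one layer.
history = [0.5, 0.2, 0.04, 0.03]
print(ready_to_switch(history))

# Fraction of parameters that stay trainable for a 1024 x 1024 projection at rank 16.
print(lora_params(1024, 1024, 16) / full_params(1024, 1024))
```

In this sketch a layer whose updates have stayed small for two consecutive epochs is handed over to a low-rank adapter, which trains roughly 3% of that matrix's parameters at rank 16.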
This review systematically examines the progression of the You Only Look Once (YOLO) object detection algorithms from YOLOv1 to the recently unveiled YOLOv12. Employing a reverse chronological analysis, this study examines the advancements introduced by YOLO algorithms, beginning with YOLOv12 and progressing through YOLO11 (or YOLOv11), YOLOv10, YOLOv9, YOLOv8, and earlier versions to explore each version's contributions to enhancing speed, detection accuracy, and computational efficiency in real-time object detection. Additionally, this study reviews the alternative versions derived from YOLO architectural advancements: YOLO-NAS, YOLO-X, YOLO-R, DAMO-YOLO, and Gold-YOLO. Moreover, the study highlights the transformative impact of YOLO models across five critical application areas: autonomous vehicles and traffic safety, healthcare and medical imaging, industrial manufacturing, surveillance and security, and agriculture. By detailing the incremental technological advancements in subsequent YOLO versions, this review chronicles the evolution of YOLO and discusses the challenges and limitations in each of the earlier versions. The evolution signifies a path towards integrating YOLO with multimodal, context-aware, and Artificial General Intelligence (AGI) systems for the next YOLO decade, promising significant implications for future developments in AI-driven applications. YOLO Review, YOLO Advances, YOLOv13, YOLOv14, YOLOv15, YOLOv16, YOLOv17, YOLOv18, YOLOv19, YOLOv20, YOLO review, YOLO Object Detection