A comprehensive survey examines Knowledge Distillation (KD) and Dataset Distillation (DD) techniques for Large Language Models, analyzing methodologies across multiple domains while revealing key challenges in preserving emergent abilities and handling architectural heterogeneity during the compression of large-scale language models.
This extensive survey provides a structured overview of alignment and safety in Large Language Models (LLMs), analyzing training paradigms, safety mechanisms, and emerging challenges. It synthesizes current research, identifies industry practices, and outlines open problems to ensure LLMs align with human values and intentions.
This report systematically reviews the current landscape of Multimodal Large Language Models (MLLMs), detailing their core architectures, diverse applications across various modalities, and inherent limitations. It highlights the rapid evolution of the field and its potential for more natural human-AI interaction by enabling AI systems to process and understand text, images, audio, and video simultaneously.
A medical vision language model named MIMO was developed, alongside the large-scale multimodal MIMOSeg dataset, to enable visual referring multimodal input and pixel grounding multimodal output for medical image analysis. MIMO achieved superior performance on medical segmentation tasks, with mIoU up to 0.665 on visual prompt-perceiving tasks, and led in medical VQA benchmarks, reaching 58.8% accuracy on VQA-RAD.
Large Language Models (LLMs) have shown promise for financial applications, yet their suitability for this high-stakes domain remains largely unproven due to inadequacies in existing benchmarks. Existing benchmarks solely rely on score-level evaluation, summarizing performance with a single score that obscures the nuanced understanding of what models truly know and their precise limitations. They also rely on datasets that cover only a narrow subset of financial concepts, while overlooking other essentials for real-world applications. To address these gaps, we introduce FinCDM, the first cognitive diagnosis evaluation framework tailored for financial LLMs, enabling the evaluation of LLMs at the knowledge-skill level, identifying what financial skills and knowledge they have or lack based on their response patterns across skill-tagged tasks, rather than a single aggregated number. We construct CPA-KQA, the first cognitively informed financial evaluation dataset derived from the Certified Public Accountant (CPA) examination, with comprehensive coverage of real-world accounting and financial skills. It is rigorously annotated by domain experts, who author, validate, and annotate questions with high inter-annotator agreement and fine-grained knowledge labels. Our extensive experiments on 30 proprietary, open-source, and domain-specific LLMs show that FinCDM reveals hidden knowledge gaps, identifies under-tested areas such as tax and regulatory reasoning overlooked by traditional benchmarks, and uncovers behavioral clusters among models. FinCDM introduces a new paradigm for financial LLM evaluation by enabling interpretable, skill-aware diagnosis that supports more trustworthy and targeted model development, and all datasets and evaluation scripts will be publicly released to support further research.
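A minimal sketch of the knowledge-skill-level diagnosis idea described above (not FinCDM's actual implementation): given skill-tagged questions and a model's per-question correctness, aggregate a per-skill mastery profile instead of a single score. The skill labels and data below are illustrative.

```python
from collections import defaultdict

def skill_profile(items, responses):
    """Aggregate per-skill mastery from skill-tagged items.

    items: list of dicts, each with an 'id' and the list of 'skills' it tests.
    responses: dict mapping item id -> 1 (correct) or 0 (incorrect).
    Returns: dict mapping skill -> fraction of tagged items answered correctly.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        score = responses[item["id"]]
        for skill in item["skills"]:
            correct[skill] += score
            total[skill] += 1
    return {s: correct[s] / total[s] for s in total}

# Illustrative items tagged with CPA-style skills (hypothetical labels).
items = [
    {"id": "q1", "skills": ["revenue_recognition"]},
    {"id": "q2", "skills": ["tax_reasoning", "regulatory_compliance"]},
    {"id": "q3", "skills": ["tax_reasoning"]},
]
responses = {"q1": 1, "q2": 0, "q3": 1}
print(skill_profile(items, responses))
# {'revenue_recognition': 1.0, 'tax_reasoning': 0.5, 'regulatory_compliance': 0.0}
```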
Text-to-Time Series generation holds significant potential to address challenges such as data sparsity, imbalance, and limited availability of multimodal time series datasets across domains. While diffusion models have achieved remarkable success in Text-to-X (e.g., vision and audio data) generation, their use in time series generation remains in its nascent stages. Existing approaches face two critical limitations: (1) the lack of systematic exploration of general-purpose time series captions: existing captions are often domain-specific and struggle to generalize; and (2) the inability to generate time series of arbitrary lengths, limiting their applicability to real-world scenarios. In this work, we first categorize time series captions into three levels: point-level, fragment-level, and instance-level. Additionally, we introduce a new fragment-level dataset containing over 600,000 high-resolution time series-text pairs. Second, we propose Text-to-Series (T2S), a diffusion-based framework that bridges the gap between natural language and time series in a domain-agnostic manner. T2S employs a length-adaptive variational autoencoder to encode time series of varying lengths into consistent latent embeddings. On top of that, T2S effectively aligns textual representations with latent embeddings by utilizing Flow Matching and employing Diffusion Transformer as the denoiser. We train T2S in an interleaved paradigm across multiple lengths, allowing it to generate sequences of any desired length. Extensive evaluations demonstrate that T2S achieves state-of-the-art performance across 13 datasets spanning 12 domains.
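A minimal NumPy sketch of the flow-matching objective that T2S-style training builds on, assuming linear interpolation paths between noise and the VAE latent of a series; the length-adaptive encoder, text embedding, and Diffusion Transformer denoiser are replaced by placeholders here.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_target(z1, t):
    """Linear-interpolation flow matching: sample a point on the path from
    noise z0 to the data latent z1 at time t, and return the velocity target."""
    z0 = rng.standard_normal(z1.shape)   # noise endpoint
    zt = (1.0 - t) * z0 + t * z1         # point on the straight path
    velocity = z1 - z0                   # constant target velocity along the path
    return zt, velocity

# Placeholder for a length-adaptive VAE latent of one time series (hypothetical size).
z1 = rng.standard_normal((64,))
t = rng.uniform()
zt, v_target = flow_matching_target(z1, t)

# A real denoiser (e.g. a Diffusion Transformer conditioned on the text embedding)
# would be trained to regress v_target from (zt, t, text); here we only show the loss.
v_pred = np.zeros_like(v_target)         # stand-in prediction
loss = np.mean((v_pred - v_target) ** 2)
print(f"t={t:.2f}, flow-matching MSE={loss:.3f}")
```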
Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models like GPT-4o achieve only 46.01% overall, stronger on vision and audio but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.
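A sketch of the difficulty-aware selection idea described above (illustrative thresholds, not the paper's exact procedure): keep tasks whose frontier-model scores are neither saturated nor uniformly failed.

```python
def select_tasks(task_scores, low=0.2, high=0.85):
    """task_scores: dict mapping task name -> list of frontier-model accuracies.
    Keeps tasks whose mean accuracy lies in (low, high), i.e. still challenging
    but not impossible. Thresholds are illustrative, not the benchmark's."""
    selected = []
    for task, scores in task_scores.items():
        mean = sum(scores) / len(scores)
        if low < mean < high:
            selected.append(task)
    return selected

scores = {
    "sentiment_en": [0.93, 0.95, 0.91],             # saturated -> dropped
    "multilingual_reasoning": [0.41, 0.38, 0.46],
    "financial_ocr": [0.55, 0.62, 0.49],
}
print(select_tasks(scores))   # ['multilingual_reasoning', 'financial_ocr']
```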
The surge in interest and application of large language models (LLMs) has sparked a drive to fine-tune these models to suit specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, allowing decentralized fine-tuning without exposing raw data to central servers. Motivated by this, we investigate how data privacy can be ensured in LLM fine-tuning through practical federated learning approaches, enabling secure contributions from multiple parties to enhance LLMs. Yet, challenges arise: 1) despite avoiding raw data exposure, there is a risk of inferring sensitive information from model outputs, and 2) federated learning for LLMs incurs notable communication overhead. To address these challenges, this article introduces DP-LoRA, a novel federated learning algorithm tailored for LLMs. DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise in weight updates, maintaining individual data privacy while facilitating collaborative model training. Moreover, DP-LoRA optimizes communication efficiency via low-rank adaptation, minimizing the transmission of updated weights during distributed training. The experimental results across medical, financial, and general datasets using various LLMs demonstrate that DP-LoRA effectively ensures strict privacy constraints while minimizing communication overhead.
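A minimal NumPy sketch of the two ideas DP-LoRA combines, under illustrative hyperparameters: each client communicates only low-rank LoRA factors, clipped and perturbed with Gaussian noise (the Gaussian mechanism), and the server averages them. This is a sketch of the general recipe, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 256, 64, 8            # layer dimensions and LoRA rank (illustrative)

def privatize(update, clip=1.0, sigma=0.5):
    """Clip the update to L2 norm <= clip, then add Gaussian noise (Gaussian mechanism)."""
    norm = np.linalg.norm(update)
    update = update * min(1.0, clip / (norm + 1e-12))
    return update + rng.normal(0.0, sigma * clip, size=update.shape)

# Each client fine-tunes only low-rank factors A (d x r) and B (r x k),
# so the communicated payload is r*(d+k) numbers instead of d*k.
clients = [(rng.standard_normal((d, r)) * 0.01,
            rng.standard_normal((r, k)) * 0.01) for _ in range(4)]

noisy_A = [privatize(A) for A, _ in clients]
noisy_B = [privatize(B) for _, B in clients]

# Server aggregates the privatized low-rank updates.
A_agg = np.mean(noisy_A, axis=0)
B_agg = np.mean(noisy_B, axis=0)
delta_W = A_agg @ B_agg          # effective update applied to the frozen base layer
print(delta_W.shape, f"payload per client: {r*(d+k)} vs full {d*k}")
```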
Researchers developed a framework to directly couple sub-groups of artificial neurons within Large Language Models (LLMs) to established human functional brain networks. The study found that more advanced LLMs exhibit a progressively more compact and specialized functional organization, balancing computational diversity with consistent functional specializations.
Programming benefits from a clear separation between pure, mathematical computation and impure, effectful interaction with the world. Existing approaches to enforce this separation include monads, type-and-effect systems, and capability systems. All share a tension between precision and usability, and each one has non-obvious strengths and weaknesses. This paper aims to raise the bar in assessing such systems. First, we propose a semantic definition of purity, inspired by contextual equivalence, as a baseline independent of any specific typing discipline. Second, we propose that expressiveness should be measured by the degree of completeness, i.e., how many semantically pure terms can be typed as pure. Using this measure, we focus on minimal meaningful effect and capability systems and show that they are incomparable, i.e., neither subsumes the other in terms of expressiveness. Based on this result, we propose a synthesis and show that type, ability, and effect systems combine their respective strengths while avoiding their weaknesses. As part of our formal model, we provide a logical relation to facilitate proofs of purity and other properties for a variety of effect typing disciplines.
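The completeness measure is easiest to see with a concrete term that is semantically pure yet may be rejected by a coarse effect or capability discipline. A small Python analogue (illustrative only; the paper's formal setting is a typed calculus with a contextual-equivalence-based definition of purity):

```python
def sum_squares(xs):
    """Semantically pure: the same input always yields the same output and no
    interaction with the outside world is observable. A coarse effect or
    capability discipline may still flag it as impure, because it allocates
    and mutates local state internally."""
    acc = []                      # locally allocated mutable state
    for x in xs:
        acc.append(x * x)         # mutation that never escapes the call
    return sum(acc)

assert sum_squares([1, 2, 3]) == 14   # observationally indistinguishable from a pure fold
```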
Side-channel attacks on memory (SCAM) exploit unintended data leaks from memory subsystems to infer sensitive information, posing significant threats to system security. These attacks leverage vulnerabilities in memory access patterns, cache behaviors, and other microarchitectural features to bypass traditional security measures. The purpose of this research is to examine SCAM, classify various attack techniques, and evaluate existing defense mechanisms, guiding researchers and industry professionals in improving memory security and mitigating emerging threats. We begin by identifying the major vulnerabilities in the memory system that are frequently exploited in SCAM, such as cache timing, speculative execution, Rowhammer, and other sophisticated approaches. Next, we outline a comprehensive taxonomy that systematically classifies these attacks based on their types, target systems, attack vectors, and the adversarial capabilities required to execute them. In addition, we review the current landscape of mitigation strategies, emphasizing their strengths and limitations. Overall, this work provides a comprehensive overview of memory-based side-channel attacks, offering insights that help researchers and practitioners better understand, detect, and mitigate SCAM risks.
SolverLLM introduces a training-free framework that leverages an LLM-guided Monte Carlo Tree Search to automate the formulation and solution of optimization problems. This method achieved over 10% higher solving accuracy compared to prompt-based LLM baselines and matched or surpassed learning-based methods on challenging benchmarks without requiring any task-specific training.
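A compact skeleton of LLM-guided Monte Carlo Tree Search in the spirit of the description above (not SolverLLM's code): the LLM proposal and reward functions are stubs, and the UCT constant and iteration budget are illustrative.

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(child, parent, c=1.4):
    """Upper-confidence bound used to pick which formulation branch to explore."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def propose_formulations(state):
    """Stub for the LLM: propose candidate refinements of the current (partial)
    optimization formulation. Hypothetical placeholder."""
    return [state + f"+step{i}" for i in range(2)]

def reward(state):
    """Stub: a SolverLLM-style pipeline would translate the formulation into
    solver code, run it, and score feasibility/accuracy. Placeholder score here."""
    return random.random()

def mcts(root_state, iterations=50):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        while node.children:                            # selection
            node = max(node.children, key=lambda ch: uct(ch, node))
        for s in propose_formulations(node.state):      # expansion via the LLM
            node.children.append(Node(s, parent=node))
        leaf = random.choice(node.children)
        r = reward(leaf.state)                          # evaluation
        while leaf is not None:                         # backpropagation
            leaf.visits += 1
            leaf.value += r
            leaf = leaf.parent
    return max(root.children, key=lambda ch: ch.visits).state

print(mcts("formulation"))
```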
This paper presents a method for predicting future brain activity patterns from functional magnetic resonance imaging (fMRI) data by adapting the Transformer architecture. The model accurately predicts subsequent brain states from short input sequences while preserving the underlying functional organization of the brain, offering a foundation for reducing fMRI scan times.
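A small NumPy sketch of the forecasting setup described (not the paper's model): slide a window over an fMRI time series of per-region signals and pair each short input sequence with the next time point, which a Transformer would then be trained to predict.

```python
import numpy as np

def make_windows(series, context=20):
    """series: array of shape (T, regions), e.g. parcellated BOLD signals.
    Returns (inputs, targets) where each input is `context` consecutive
    time points and the target is the following brain state."""
    X, y = [], []
    for t in range(len(series) - context):
        X.append(series[t:t + context])
        y.append(series[t + context])
    return np.stack(X), np.stack(y)

# Synthetic stand-in for a parcellated fMRI run: 300 TRs x 100 regions.
rng = np.random.default_rng(0)
run = rng.standard_normal((300, 100))
X, y = make_windows(run, context=20)
print(X.shape, y.shape)   # (280, 20, 100) (280, 100)
```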
Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information, raising extensive concerns about the security of Deep Neural Networks (DNNs). Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks due to their powerful ability to generate realistic images with high fidelity and appropriate semantics. However, previous MI attacks have solely disclosed private information in the latent space of GAN priors, limiting their semantic extraction and transferability across multiple target models and datasets. To address this challenge, we propose a novel method, Intermediate Features enhanced Generative Model Inversion (IF-GMI), which disassembles the GAN structure and exploits features between intermediate blocks. This allows us to extend the optimization space from the latent code to intermediate features with enhanced expressive capabilities. To prevent GAN priors from generating unrealistic images, we apply an L1 ball constraint to the optimization process. Experiments on multiple benchmarks demonstrate that our method significantly outperforms previous approaches and achieves state-of-the-art results under various settings, especially in the out-of-distribution (OOD) scenario. Our code is available at: this https URL
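A NumPy sketch of the L1 ball constraint mentioned above, applied to the deviation of an intermediate feature from its starting value. The projection is the standard sorting-based algorithm of Duchi et al. (2008); the radius and feature sizes are illustrative, not the paper's settings.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of vector v onto the L1 ball of the given radius."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (cssv - radius))[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

rng = np.random.default_rng(0)
feat_init = rng.standard_normal(512)            # intermediate GAN feature at the start
feat = feat_init + rng.standard_normal(512)     # feature after some optimization steps

# Keep the optimized feature inside an L1 ball around its initial value,
# which discourages drifting toward unrealistic images.
delta = project_l1_ball(feat - feat_init, radius=5.0)
feat_constrained = feat_init + delta
print(np.abs(delta).sum())                      # <= 5.0
```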
This paper presents a comprehensive overview of the applications of artificial intelligence (AI) in mathematical research, highlighting the transformative role AI has begun to play in this domain. Traditionally, AI advancements have heavily relied on theoretical foundations provided by mathematics and statistics. However, recent developments in AI, particularly in reinforcement learning (RL) and large language models (LLMs), have demonstrated the potential for AI to contribute back to mathematics by offering flexible algorithmic frameworks and powerful inductive reasoning capabilities that support various aspects of mathematical research. This survey aims to establish a bridge between AI and mathematics, providing insights into the mutual benefits and fostering deeper interdisciplinary understanding. In particular, we argue that while current AI and LLMs may struggle with complex deductive reasoning, their "inherent creativity", the ability to generate outputs at high throughput based on recognition of shallow patterns, holds significant potential to support and inspire mathematical research. This creative capability, often overlooked, could be the key to unlocking new perspectives and methodologies in mathematics. Furthermore, we address the lack of cross-disciplinary communication: mathematicians may not fully comprehend the latest advances in AI, while AI researchers frequently prioritize benchmark performance over real-world applications in frontier mathematical research. This paper seeks to close that gap, offering a detailed exploration of AI fundamentals, its strengths, and its emerging applications in the mathematical sciences.
Although recent advances in quantum machine learning (QML) offer significant potential for enhancing generative models, particularly in molecular design, many classical approaches still face challenges in achieving high fidelity and validity. In particular, the integration of QML with sequence-based tasks, such as Simplified Molecular Input Line Entry System (SMILES) string reconstruction, remains underexplored and usually suffers from fidelity degradation. In this work, we propose a hybrid quantum-classical architecture for SMILES reconstruction that integrates quantum encoding with classical sequence modeling to improve quantum fidelity and classical similarity. Our approach achieves a quantum fidelity of approximately 84% and a classical reconstruction similarity of 60%, surpassing existing quantum baselines. Our work lays a promising foundation for future QML applications, striking a balance between expressive quantum representations and classical sequence models and catalyzing broader research on quantum-aware sequence models for molecular and drug discovery.
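A toy NumPy illustration of the two metrics quoted above (not the paper's architecture): angle-encode each character of a SMILES string into a single-qubit state, compute the quantum state fidelity between the original and reconstructed encodings, and use a simple character-level match rate as a stand-in for classical similarity. The encoding and similarity measures are assumptions for illustration only.

```python
import numpy as np

def angle_encode(smiles):
    """Map each character to a single-qubit state |psi> = RY(theta)|0>
    with theta proportional to its code point (toy feature map)."""
    thetas = np.array([ord(c) / 128.0 * np.pi for c in smiles])
    return np.stack([np.cos(thetas / 2), np.sin(thetas / 2)], axis=1)

def product_state_fidelity(states_a, states_b):
    """Fidelity of two product states = product of per-qubit |<a|b>|^2."""
    overlaps = np.sum(states_a * states_b, axis=1)
    return float(np.prod(overlaps ** 2))

def char_similarity(a, b):
    """Fraction of matching positions (toy stand-in for reconstruction similarity)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

original = "CCO"         # ethanol
reconstructed = "CCN"    # imperfect reconstruction
fid = product_state_fidelity(angle_encode(original), angle_encode(reconstructed))
print(f"quantum fidelity ~ {fid:.3f}, classical similarity ~ {char_similarity(original, reconstructed):.2f}")
```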
Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep, with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea is to work at an operator-level view of DNNs while expanding fusion opportunities through a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages the high-level analysis and accurate lightweight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8x more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with a 9.3x speedup. The memory requirement reduction and speedups enable the execution of many of the target models on mobile devices and even make them part of a real-time application.
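A toy sketch of the operator-level view described above (greatly simplified relative to DNNFusion): classify operators by their mapping type and fuse chains of one-to-one (elementwise) operators in a linear operator sequence. The classification table and model graph are illustrative.

```python
# Illustrative mapping-type classification of operators (one-to-one = elementwise).
OP_CLASS = {
    "conv": "many-to-many",
    "add": "one-to-one",
    "relu": "one-to-one",
    "batch_norm": "one-to-one",
    "softmax": "many-to-one",
}

def fuse_elementwise_chains(ops):
    """Greedily group consecutive one-to-one operators in a linear operator
    sequence into single fused kernels."""
    fused, current = [], []
    for op in ops:
        if OP_CLASS.get(op) == "one-to-one":
            current.append(op)
        else:
            if current:
                fused.append(current)
                current = []
            fused.append([op])
    if current:
        fused.append(current)
    return fused

model = ["conv", "batch_norm", "relu", "add", "conv", "relu", "softmax"]
print(fuse_elementwise_chains(model))
# [['conv'], ['batch_norm', 'relu', 'add'], ['conv'], ['relu'], ['softmax']]
```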
This study systematically assesses the mathematical reasoning capabilities of eight leading Large Language Models across three diverse benchmarks: MATH, GSM8K, and the mathematical subsets of MMLU. The research identifies performance variations, highlights efficiency-accuracy trade-offs, and provides detailed insights into model strengths and weaknesses in arithmetic, algebra, geometry, and formal logic.