Worcester Polytechnic Institute

01 Aug 2025

computer-science artificial-intelligence computation-and-language

A Survey on Post-training of Large Language Models

Michigan State University, University of Illinois at Urbana-Champaign, University of Georgia, Lehigh University, The University of Hong Kong, Huazhong University of Science and Technology, Salesforce Research, University of Illinois at Chicago, Duke University, Jilin University, Southern University of Science and Technology, Worcester Polytechnic Institute, LinkedIn Corporation, Squirrel Ai Learning

Qin Chen

This survey offers the first comprehensive review of Post-training Language Models (PoLMs), systematically classifying methods, datasets, and applications within a novel intellectual framework. It traces the evolution of LLMs across five core paradigms—Fine-tuning, Alignment, Reasoning, Efficiency, and Integration & Adaptation—and identifies critical future research directions.

31 Dec 2024

attention-mechanisms computer-science conversational-ai

MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation

University of Utah, Texas A&M University, University of Houston, Worcester Polytechnic Institute, Visa Research

MAIN-RAG is a training-free, multi-agent LLM framework designed to filter noisy documents in Retrieval-Augmented Generation (RAG) systems. It consistently outperforms training-free baselines and achieves competitive performance with training-based RAG models by using an adaptive filtering mechanism that quantifies document relevance based on LLM judgments.
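The adaptive filtering idea can be illustrated with a minimal sketch. It assumes each retrieved document already carries a numeric relevance score from an LLM judge (for example, a log-probability margin of "Yes" over "No") and uses the mean score, shifted down by a hypothetical multiple `n` of the standard deviation, as a per-query cutoff. The scoring rule and the `n` parameter are assumptions for illustration, not MAIN-RAG's published implementation.

```python
from statistics import mean, pstdev

def adaptive_filter(docs, scores, n=0.0):
    """Keep documents whose judge score meets a per-query adaptive threshold.

    Assumption: threshold = mean(scores) - n * population std of the scores,
    so the cutoff moves with each query's score distribution.
    """
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    if not docs:
        return []
    threshold = mean(scores) - n * pstdev(scores)
    kept = [(s, d) for d, s in zip(docs, scores) if s >= threshold]
    # Return survivors ordered from most to least relevant.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in kept]
```

With `n=0` only above-average documents survive; raising `n` loosens the filter for queries whose scores are widely spread.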

05 Apr 2025

computer-science artificial-intelligence machine-learning

Foundation Models for Time Series: A Survey

Worcester Polytechnic Institute, Dell Technologies, University of Massachusetts Lowell

This survey provides a comprehensive analysis and taxonomy of foundation models developed for time series analysis, primarily focusing on transformer-based architectures. It details their architectural adaptations, training paradigms, and applications across forecasting, anomaly detection, classification, and imputation tasks, highlighting their ability to learn generalizable representations from large datasets.

23 Oct 2025

computer-science sound audio-and-speech-processing

Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs

Worcester Polytechnic Institute, Amazon AGI

Unified architectures in multimodal large language models (MLLMs) have shown promise in handling diverse tasks within a single framework. For text-to-speech (TTS), current MLLM-based approaches rely on discrete token representations, which disregard the inherently continuous nature of speech and can lose fine-grained acoustic information. In this work, we investigate TTS within the MLLM paradigm using continuous speech representations. We design a dual-head architecture and implement two complementary training strategies for a robust model. (1) A frame-level, strictly autoregressive diffusion head that generates continuous speech representations is added to the MLLM. (2) The original language model head is retained to preserve multitask capability and to control the start and end of speech synthesis. (3) Masked training is employed to address exposure bias in autoregressive decoding. (4) To stabilize optimization, we propose a two-stage scheme in which the LM is frozen in the second stage, so that the diffusion head learns from a fixed input distribution. Evaluations on LibriSpeech (PC) test-clean show that our approach achieves state-of-the-art performance among autoregressive models, with a WER of 1.95%, speaker similarity of 0.54, and UTMOS of 4.00. The two-stage training yields a 46% relative WER reduction over the one-stage training baseline. These results highlight the effectiveness of combining autoregressive modeling with continuous-token diffusion, supported by a two-stage training procedure.
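The 46% figure is a relative reduction: it is measured against the one-stage baseline's WER, not in absolute percentage points. A small helper makes the definition concrete; the WER values in the usage line are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error metric such as WER, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline

# Hypothetical numbers: going from 4.0% to 2.0% WER is a 50% relative
# reduction, even though the absolute drop is only 2 percentage points.
print(f"{relative_reduction(4.0, 2.0):.0%}")  # 50%
```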

11 Aug 2025

computer-science computer-vision-and-pattern-recognition inference-optimization

CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning

University of Southern California, University of Bristol, Rutgers University, Brown University, MIT-IBM Watson AI Lab, Worcester Polytechnic Institute

Zhennan Shen

Modern large vision-language models (LVLMs) convert each input image into a large set of tokens that far outnumber the text tokens. Although this improves visual perception, it introduces severe image token redundancy. Because image tokens carry sparse information, many add little to reasoning yet greatly increase inference cost. Emerging image token pruning methods tackle this issue by identifying the most important tokens and discarding the rest; they can raise efficiency with only modest performance loss. However, most of them consider only single-image tasks and overlook multimodal in-context learning (ICL), where redundancy is greater and efficiency is more critical. Redundant tokens weaken the advantage of multimodal ICL for rapid domain adaptation and cause unstable performance. Applying existing pruning methods in this setting leads to large accuracy drops, exposing a clear gap and the need for new techniques. We therefore propose Contextually Adaptive Token Pruning (CATP), a training-free pruning method targeted at multimodal ICL. CATP consists of two stages that perform progressive pruning to fully account for the complex cross-modal interactions in the input sequence. After removing 77.8% of the image tokens, CATP produces an average performance gain of 0.6% over the vanilla model on four LVLMs and eight benchmarks, clearly exceeding all baselines. Meanwhile, it improves efficiency, reducing inference latency by 10.78% on average. CATP enhances the practical value of multimodal ICL and lays the groundwork for future progress in interleaved image-text scenarios.
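For context, the single-stage importance-based pruning that CATP improves upon can be sketched in a few lines: score every image token, keep the top fraction, and preserve the original token order. This is a hedged illustration of generic top-k pruning, not CATP's two-stage, cross-modal procedure; the importance scores (for example, attention received from text tokens) are an assumed input. Keeping 22.2% of the tokens corresponds to the 77.8% pruning ratio reported above.

```python
import math

def prune_image_tokens(tokens, importance, keep_ratio=0.222):
    """Keep the highest-importance image tokens, preserving sequence order.

    tokens:     list of token ids or embeddings
    importance: one score per token (assumed to be given)
    keep_ratio: fraction of tokens to retain (at least one is kept)
    """
    if len(tokens) != len(importance):
        raise ValueError("tokens and importance must align")
    if not tokens:
        return []
    k = max(1, math.ceil(keep_ratio * len(tokens)))
    # Indices of the k largest scores, re-sorted into their original positions.
    top = sorted(range(len(tokens)), key=lambda i: importance[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]
```

Re-sorting the surviving indices keeps positional information intact, which matters when image and text tokens are interleaved.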

29 Aug 2025

computer-science artificial-intelligence cryptography-and-security

RevPRAG: Revealing Poisoning Attacks in Retrieval-Augmented Generation through LLM Activation Analysis

Fudan University, Worcester Polytechnic Institute

RevPRAG introduces an automated pipeline that analyzes Large Language Model activations to detect poisoning attacks in Retrieval-Augmented Generation systems. This method consistently achieves high detection accuracy (over 97% true positive rate) with low false positive rates, effectively identifying compromised responses across diverse RAG configurations and attack types.
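The reported detection quality rests on two standard rates. As a reminder of how they are computed, here is a small sketch over binary labels in which 1 marks a poisoned response; the label encoding is an assumption, and the function is not part of RevPRAG.

```python
def detection_rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels where 1 = poisoned, 0 = clean."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # share of poisoned responses caught
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of clean responses wrongly flagged
    return tpr, fpr
```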

29 Jun 2025

ai-for-health computer-science computation-and-language

Agentic Medical Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge

University of Toronto, Vector Institute, Worcester Polytechnic Institute

An agentic framework dynamically constructs and updates Medical Knowledge Graphs using LLMs and external search tools to enhance medical question answering. This approach, called AMG-RAG, leverages structured knowledge and Chain-of-Thought reasoning to achieve an F1 score of 74.1% on MEDQA, outperforming significantly larger models and demonstrating superior performance in rapidly evolving medical subfields.

13 May 2025

computer-science information-retrieval

FlippedRAG: Black-Box Opinion Manipulation Adversarial Attacks to Retrieval-Augmented Generation Models

Wuhan University, Worcester Polytechnic Institute

Researchers from Wuhan University and Worcester Polytechnic Institute developed FlippedRAG, a black-box adversarial attack that manipulates opinions in Retrieval-Augmented Generation (RAG) models on controversial topics. The method achieves this by first transparentizing the black-box retriever and then injecting a limited number of adversarially poisoned documents, successfully shifting the opinion polarity of RAG-generated responses and influencing user perceptions while bypassing current defenses.

14 Jan 2025

bayesian-optimization computer-science computation-and-language

Personalized LLM Response Generation with Parameterized Memory Injection

Alibaba Group, Worcester Polytechnic Institute

The MiLP framework enhances personalized Large Language Model response generation by directly injecting parameterized user memory into the LLM's Feed Forward Layers via LoRA modules, with their optimal configurations discovered through Bayesian Optimization. This method consistently yields improved ROUGE-L and Persona F1 scores compared to current personalization techniques, demonstrated across various datasets and LLM architectures.
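The injection mechanism builds on LoRA, which adds a trainable low-rank update to a frozen weight matrix. The NumPy sketch below shows that update on a single feed-forward projection; the shapes, rank `r`, and scaling `alpha` are illustrative assumptions, and MiLP's Bayesian-optimization search over which layers and configurations to use is not modeled.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=4, alpha=8.0, seed=0):
        d_out, d_in = W.shape
        rng = np.random.default_rng(seed)
        self.W = W                                       # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, r))                    # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out); base output plus low-rank correction
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

Because `B` starts at zero, the layer initially reproduces the frozen model exactly, and only the small `A` and `B` matrices need to be stored per user.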

01 Nov 2024

computer-science machine-learning combinatorics

PatternBoost: Constructions in Mathematics with a Little Help from AI

Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models

Wuhan University, Worcester Polytechnic Institute, Indiana University Bloomington

Retrieval-Augmented Generation (RAG) systems based on Large Language Models (LLMs) have become essential for tasks such as question answering and content generation. However, their increasing impact on public opinion and information dissemination has made them a critical focus for security research due to inherent vulnerabilities. Previous studies have predominantly addressed attacks targeting factual or single-query manipulations. In this paper, we address a more practical scenario: topic-oriented adversarial opinion manipulation attacks on RAG models, where LLMs are required to reason and synthesize multiple perspectives, rendering them particularly susceptible to systematic knowledge poisoning. Specifically, we propose Topic-FlipRAG, a two-stage manipulation attack pipeline that strategically crafts adversarial perturbations to influence opinions across related queries. This approach combines traditional adversarial ranking attack techniques and leverages the extensive internal relevant knowledge and reasoning capabilities of LLMs to execute semantic-level perturbations. Experiments show that the proposed attacks effectively shift the opinion of the model's outputs on specific topics, significantly impacting user information perception. Current mitigation methods cannot effectively defend against such attacks, highlighting the necessity for enhanced safeguards for RAG systems, and offering crucial insights for LLM security research.

12 Oct 2025

agent-based-systems computer-science conversational-ai

AssoMem: Scalable Memory QA with Multi-Signal Associative Retrieval

Meta Reality Labs, Worcester Polytechnic Institute

Accurate recall from large-scale memories remains a core challenge for memory-augmented AI assistants performing question answering (QA), especially in similarity-dense scenarios where existing methods rely mainly on semantic distance to the query for retrieval. Inspired by how humans link information associatively, we propose AssoMem, a novel framework that constructs an associative memory graph anchoring dialogue utterances to automatically extracted clues. This structure provides a rich organizational view of the conversational context and facilitates importance-aware ranking. Further, AssoMem integrates multi-dimensional retrieval signals (relevance, importance, and temporal alignment) using an adaptive mutual information (MI)-driven fusion strategy. Extensive experiments across three benchmarks and a newly introduced dataset, MeetingQA, demonstrate that AssoMem consistently outperforms SOTA baselines, verifying its superiority in context-aware memory recall.
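To make the idea of fusing several retrieval signals concrete, the sketch below min-max normalizes relevance, importance, and temporal-alignment scores and combines them with fixed weights. The fixed weights are a simplifying assumption: AssoMem derives its weighting adaptively from mutual information, which this sketch does not reproduce.

```python
def fuse_scores(signals, weights):
    """Weighted sum of min-max-normalized signals, one fused score per candidate.

    signals: dict of signal name -> list of raw scores (one per candidate)
    weights: dict of signal name -> non-negative weight (assumed fixed here)
    """
    if not signals:
        return []
    n = len(next(iter(signals.values())))
    fused = [0.0] * n
    for name, vals in signals.items():
        if len(vals) != n:
            raise ValueError("all signals must score the same candidates")
        lo, hi = min(vals), max(vals)
        span = hi - lo
        for i, v in enumerate(vals):
            # A constant signal carries no ranking information, so it contributes 0.
            norm = (v - lo) / span if span else 0.0
            fused[i] += weights.get(name, 0.0) * norm
    return fused
```

Normalizing first keeps a signal with a large numeric range (such as raw importance counts) from dominating the others regardless of its weight.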

26 Mar 2025

chain-of-thought computer-science computation-and-language

IHEval: Evaluating Language Models on Following the Instruction Hierarchy

University of Notre Dame, Amazon, Worcester Polytechnic Institute

The instruction hierarchy, which establishes a priority order from system messages to user messages, conversation history, and tool outputs, is essential for ensuring consistent and safe behavior in language models (LMs). Despite its importance, the topic has received limited attention, and comprehensive benchmarks for evaluating models' ability to follow the instruction hierarchy are lacking. We bridge this gap by introducing IHEval, a novel benchmark comprising 3,538 examples across nine tasks, covering cases where instructions of different priorities either align or conflict. Our evaluation of popular LMs highlights their struggle to recognize instruction priorities. All evaluated models experience a sharp performance decline when facing conflicting instructions, compared to their original instruction-following performance. Moreover, the most competitive open-source model achieves only 48% accuracy in resolving such conflicts. Our results underscore the need for targeted optimization in the future development of LMs.
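The priority order described here (system messages over user messages over conversation history over tool outputs) can be expressed as a simple lookup. The sketch below resolves a set of mutually conflicting instructions by keeping the one from the highest-priority source; the `Instruction` type and the source labels are hypothetical, and a real model has to infer both the source and the conflict from natural language rather than from explicit tags.

```python
from dataclasses import dataclass

# Lower number = higher priority, following the order described above.
PRIORITY = {"system": 0, "user": 1, "history": 2, "tool": 3}

@dataclass
class Instruction:
    source: str  # one of the PRIORITY keys (hypothetical labels)
    text: str

def resolve_conflict(instructions):
    """Return the instruction that should win among mutually conflicting ones."""
    if not instructions:
        raise ValueError("no instructions to resolve")
    return min(instructions, key=lambda ins: PRIORITY[ins.source])
```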

27 Feb 2025

computer-science computation-and-language distributed-learning

Privacy-preserved LLM Cascade via CoT-enhanced Policy Learning

Worcester Polytechnic Institute, Google AIR

Large Language Models (LLMs) have gained significant attention in on-device applications due to their remarkable performance across real-world tasks. However, on-device LLMs often suffer from suboptimal performance due to hardware limitations. A promising solution to this challenge is cascading a weaker local (on-device) LLM with a more powerful server LLM. While existing research on LLM cascades primarily optimizes the performance-cost trade-off, real-world applications impose additional requirements, such as privacy preservation, which remain largely unaddressed. In this work, we move beyond existing confidence- and logit-based LLM cascade methods and propose P³Defer, a novel Chain-of-Thought (CoT)-enhanced policy learning framework for privacy-preserved deferral decision-making (the name abbreviates Policy learning for Privacy-Preserved Deferral). Our approach effectively improves cascade efficiency while mitigating privacy risks. Extensive experiments on three benchmark datasets demonstrate the effectiveness and superiority of P³Defer over existing methods.
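For contrast, the confidence-based cascade that P³Defer moves beyond fits in a few lines: answer locally when the on-device model is confident, otherwise defer to the server. The threshold value and the plain-callable model interfaces are hypothetical. This baseline ignores privacy entirely, which is the gap the learned deferral policy targets.

```python
from typing import Callable, Tuple

def cascade(query: str,
            local_model: Callable[[str], Tuple[str, float]],
            server_model: Callable[[str], str],
            threshold: float = 0.8) -> Tuple[str, bool]:
    """Confidence-threshold cascade; returns (answer, deferred)."""
    answer, confidence = local_model(query)
    if confidence >= threshold:
        return answer, False          # answered on-device
    return server_model(query), True  # deferred: the raw query leaves the device
```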

22 May 2025

computer-science machine-learning domain-adaptation

UrbanMind: Urban Dynamics Prediction with Multifaceted Spatial-Temporal Large Language Models

University of Electronic Science and Technology of China, Worcester Polytechnic Institute, Binghamton University, San Diego State University, R&D Centre Logistics and Supply Chain MultiTech

Understanding and predicting urban dynamics is crucial for managing transportation systems, optimizing urban planning, and enhancing public services. While neural network-based approaches have achieved success, they often rely on task-specific architectures and large volumes of data, limiting their ability to generalize across diverse urban scenarios. Meanwhile, Large Language Models (LLMs) offer strong reasoning and generalization capabilities, yet their application to spatial-temporal urban dynamics remains underexplored. Existing LLM-based methods struggle to integrate multifaceted spatial-temporal data effectively and fail to address distributional shifts between training and testing data, limiting their predictive reliability in real-world applications. To bridge this gap, we propose UrbanMind, a novel spatial-temporal LLM framework for multifaceted urban dynamics prediction that ensures both accurate forecasting and robust generalization. At its core, UrbanMind introduces Muffin-MAE, a multifaceted fusion masked autoencoder with specialized masking strategies that capture intricate spatial-temporal dependencies and intercorrelations among multifaceted urban dynamics. Additionally, we design a semantic-aware prompting and fine-tuning strategy that encodes spatial-temporal contextual details into prompts, enhancing LLMs' ability to reason over spatial-temporal patterns. To further improve generalization, we introduce a test-time adaptation mechanism with a test data reconstructor, enabling UrbanMind to dynamically adjust to unseen test data by reconstructing LLM-generated embeddings. Extensive experiments on real-world urban datasets across multiple cities demonstrate that UrbanMind consistently outperforms state-of-the-art baselines, achieving high accuracy and robust generalization, even in zero-shot settings.

07 Jul 2025

computer-science computers-and-society

Stop treating 'AGI' as the north-star goal of AI research

University of Illinois at Urbana-Champaign, Google, University of Chicago, Hugging Face, Rochester Institute of Technology, Tufts University, University of Connecticut, Worcester Polytechnic Institute, Conservatoire National des Arts et Métiers, Eberhard-Karls-Universität Tübingen, AI Risk and Vulnerability Alliance, Data & Society Research Institute, Vijil, École Polytechnique

Researchers from a diverse group of academic institutions and industry labs critically examine the pervasive influence of Artificial General Intelligence (AGI) as a guiding principle in AI research. Their analysis reveals how an AGI focus exacerbates issues such as a lack of scientific rigor, masked values, and exclusion of diverse stakeholders. The paper advocates for alternative, more effective goal-setting strategies that prioritize specificity, pluralism, and inclusion, ultimately aiming to re-center AI development around supporting and benefiting human beings.

08 Oct 2025

computer-science cryptography-and-security

EMPalm: Exfiltrating Palm Biometric Data via Electromagnetic Side-Channels

Worcester Polytechnic Institute, Florida International University

Palm recognition has emerged as a dominant biometric authentication technology in critical infrastructure. These systems operate in either single-modal form, using palmprint or palmvein individually, or dual-modal form, fusing the two modalities. Despite this diversity, they share similar hardware architectures that inadvertently emit electromagnetic (EM) signals during operation. Our research reveals that these EM emissions leak palm biometric information, motivating us to develop EMPalm--an attack framework that covertly recovers both palmprint and palmvein images from eavesdropped EM signals. Specifically, we first separate the interleaved transmissions of the two modalities, identify and combine their informative frequency bands, and reconstruct the images. To further enhance fidelity, we employ a diffusion model to restore fine-grained biometric features unique to each domain. Evaluations on seven prototype and two commercial palm acquisition devices show that EMPalm can recover palm biometric information with high visual fidelity, achieving SSIM scores up to 0.79, PSNR up to 29.88 dB, and FID scores as low as 6.82 across all tested devices, metrics that collectively demonstrate strong structural similarity, high signal quality, and low perceptual discrepancy. To assess the practical implications of the attack, we further evaluate it against four state-of-the-art palm recognition models, achieving a model-wise average spoofing success rate of 65.30% over 6,000 samples from 100 distinct users.
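Of the reported image-quality metrics, PSNR is the simplest to compute: it compares the mean squared error between a reconstruction and its reference with the maximum possible pixel value. The sketch below assumes 8-bit grayscale images supplied as equal-length flat lists of pixel values; it is the generic definition, not the paper's evaluation code.

```python
import math

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size images."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher is better: every halving of the MSE adds about 3 dB.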

23 Oct 2025

ai-for-health computer-science conversational-ai

LeCoDe: A Benchmark Dataset for Interactive Legal Consultation Dialogue Evaluation

Alibaba Group, Zhejiang University, Worcester Polytechnic Institute

Legal consultation is essential for safeguarding individual rights and ensuring access to justice, yet it remains costly and inaccessible to many individuals due to a shortage of professionals. While recent advances in Large Language Models (LLMs) offer a promising path toward scalable, low-cost legal assistance, current systems fall short in handling the interactive and knowledge-intensive nature of real-world consultations. To address these challenges, we introduce LeCoDe, a real-world multi-turn benchmark dataset comprising 3,696 legal consultation dialogues with 110,008 dialogue turns, designed to evaluate and improve LLMs' legal consultation capability. To build LeCoDe, we collect live-streamed consultations from short-video platforms, a novel source of authentic multi-turn legal consultation dialogues. Rigorous annotation by legal experts further enriches the dataset with professional insight. Furthermore, we propose a comprehensive evaluation framework that assesses LLMs' consultation capabilities in terms of (1) clarification capability and (2) professional advice quality. This unified framework incorporates 12 metrics across these two dimensions. Through extensive experiments on various general and domain-specific LLMs, our results reveal significant challenges in this task: even state-of-the-art models such as GPT-4 achieve only 39.8% recall for clarification and an overall advice-quality score of 59%, highlighting the complexity of professional consultation scenarios. Based on these findings, we further explore several strategies to enhance LLMs' legal consultation abilities. Our benchmark contributes to advancing research on legal-domain dialogue systems, particularly in simulating more realistic user-expert interactions.

27 Feb 2025

computer-science computation-and-language

KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model

Yale University, University of California, Davis, Worcester Polytechnic Institute, The University of Texas Health Science Center at Houston

Drug discovery is a critical task in biomedical natural language processing (NLP), yet explainable drug discovery remains underexplored. Meanwhile, large language models (LLMs) have shown remarkable abilities in natural language understanding and generation. Leveraging LLMs for explainable drug discovery has the potential to improve downstream tasks and real-world applications. In this study, we use open-source drug knowledge graphs, clinical trial data, and PubMed publications to construct a comprehensive dataset for the explainable drug discovery task, named expRxRec. Furthermore, we introduce KEDRec-LM, an instruction-tuned LLM that distills knowledge from a rich medical knowledge corpus for drug recommendation and rationale generation. To encourage further research in this area, we will publicly release both the dataset and KEDRec-LM (a copy is attached with this submission).

03 Aug 2025

computer-science machine-learning model-interpretation

KANMixer: Can KAN Serve as a New Modeling Core for Long-term Time Series Forecasting?

Tohoku University, University of Michigan, Texas A&M University, Worcester Polytechnic Institute, San Diego State University

In recent years, multilayer perceptron (MLP)-based deep learning models have demonstrated remarkable success in long-term time series forecasting (LTSF). Existing approaches typically augment MLP backbones with hand-crafted external modules to address the inherent limitations of their flat architectures. Despite their success, these augmented methods neglect the hierarchical locality and sequential inductive biases essential for time-series modeling, and recent studies indicate diminishing performance improvements. To overcome these limitations, we explore Kolmogorov-Arnold Networks (KAN), a recently proposed model featuring adaptive basis functions capable of granular, local modulation of nonlinearities. This raises a fundamental question: can KAN serve as a new modeling core for LTSF? To answer it, we introduce KANMixer, a concise architecture integrating a multi-scale mixing backbone that fully leverages KAN's adaptive capabilities. Extensive evaluation demonstrates that KANMixer achieves state-of-the-art performance in 16 out of 28 experiments across seven benchmark datasets. To uncover the reasons behind this strong performance, we systematically analyze the strengths and limitations of KANMixer in comparison with traditional MLP architectures. Our findings reveal that the adaptive flexibility of KAN's learnable basis functions significantly changes how network structural priors influence forecasting performance. Furthermore, we identify critical design factors affecting forecasting accuracy and offer practical insights for effectively using KAN in LTSF. Together, these insights constitute the first empirically grounded guidelines for leveraging KAN in LTSF. Code is available in the supplementary file.
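The "adaptive basis functions" that distinguish KAN from an MLP can be illustrated with a toy layer in which every input-output edge applies its own learnable univariate function, here a weighted sum of fixed Gaussian radial basis functions. This is a simplified sketch: the original KAN formulation uses B-splines plus a residual base function, and nothing here reflects KANMixer's multi-scale mixing architecture. The grid range and basis count are assumptions.

```python
import numpy as np

class ToyKANLayer:
    """y_j = sum_i phi_ij(x_i), where each phi_ij is a learnable mix of Gaussian RBFs."""

    def __init__(self, d_in, d_out, n_basis=8, x_range=(-2.0, 2.0), seed=0):
        if n_basis < 2:
            raise ValueError("need at least two basis functions")
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(x_range[0], x_range[1], n_basis)  # fixed grid
        self.width = (x_range[1] - x_range[0]) / (n_basis - 1)
        # One learnable coefficient vector per (input, output) edge.
        self.coef = rng.normal(0.0, 0.1, size=(d_in, d_out, n_basis))

    def basis(self, x):
        # x: (batch, d_in) -> (batch, d_in, n_basis)
        return np.exp(-((x[..., None] - self.centers) / self.width) ** 2)

    def __call__(self, x):
        # Evaluate every edge function and sum over inputs: (batch, d_out)
        return np.einsum("bik,iok->bo", self.basis(x), self.coef)
```

Unlike an MLP, which learns only linear weights around a fixed activation, each edge here learns the shape of its own nonlinearity through `coef`.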
