alphaXiv

History

Papers Benchmarks

Rensselaer Polytechnic Institute

2,029

05 Dec 2024

mathematics optimization-and-control

Sensor-Driven Predictive Vehicle Maintenance and Routing Problem with Time Windows

Rensselaer Polytechnic Institute Wayne State University

Advancements in sensor technology offer significant insights into vehicle conditions, unlocking new venues to enhance fleet operations. While current vehicle health management models provide accurate predictions of vehicle failures, they often fail to integrate these forecasts into operational decision-making, limiting their practical impact. This paper addresses this gap by incorporating sensor-driven failure predictions into a single-vehicle routing problem with time windows. A maintenance cost function is introduced to balance two critical trade-offs: premature maintenance, which leads to underutilization of remaining useful life, and delayed maintenance, which increases the likelihood of breakdowns. Routing problems with time windows are inherently challenging, and integrating maintenance considerations adds significantly to its computational complexity. To address this, we develop a new solution method, called the Iterative Alignment Method (IAM), building on the structural properties of the problem. IAM generates high-quality solutions even in large-size instances where Gurobi cannot find any solutions. Moreover, compared to the traditional periodic maintenance strategy, our sensor-driven approach to maintenance decisions shows improvements in operational and maintenance costs as well as in overall vehicle reliability.

2,019

16 Jun 2024

computer-science conversational-ai artificial-intelligence

Large Language Models for Automatic Milestone Detection in Group Discussions

Rensselaer Polytechnic Institute

Northeastern University

Researchers from Rensselaer Polytechnic Institute and Northeastern University leveraged Large Language Models, specifically GPT-4, to automatically identify milestone achievements in transcripts of group discussions. GPT-4 significantly enhanced detection accuracy for specific milestones, such as increasing "dual" from 25% to 90% and "quadruple" from 65% to 93.5% compared to BERT-based semantic similarity, yet faced challenges with non-deterministic outputs and formatting inconsistencies.

1,508

12 Jun 2023

computer-science artificial-intelligence computation-and-language

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

ETH Zurich

KAIST

University of Washington Rensselaer Polytechnic Institute

Google DeepMind

University of Amsterdam

University of Illinois at Urbana-Champaign

University of Cambridge Heidelberg University

University of Waterloo Facebook

Carnegie Mellon University

University of Southern California

Google

New York University University of Stuttgart

UC Berkeley

National University of Singapore

University College London

University of Oxford LMU Munich

Shanghai Jiao Tong University

University of California, Irvine

Tsinghua University

Stanford University

University of Michigan

University of Copenhagen

The Chinese University of Hong Kong University of Melbourne

Meta University of Edinburgh

OpenAI

The University of Texas at Austin

Cornell University

University of California, San Diego Yonsei University

McGill University

Boston University University of Bamberg

Nanyang Technological University

Microsoft

KU Leuven

Columbia University UC Santa Barbara

Allen Institute for AI German Research Center for Artificial Intelligence (DFKI)

University of Pennsylvania

Johns Hopkins University

Arizona State University

University of Maryland

University of Tokyo University of North Carolina at Chapel Hill Hebrew University of Jerusalem Amazon Tilburg University University of Massachusetts Amherst University of Rochester University of Duisburg-Essen Sapienza University of Rome University of Sheffield

Princeton University

HKUST University of Tübingen TU Berlin Saarland University Technical University of Darmstadt University of Haifa University of Trento University of Montreal Bilkent University University of Cape Town Bar Ilan University IBM University of Mannheim

ServiceNow Potsdam University Polish-Japanese Academy of Information Technology Salesforce ASAPP AI21 Labs Valencia Polytechnic University University of Trento, Italy

Allen Nie

Jos Rozen

+13

A large-scale and diverse benchmark, BIG-bench, was introduced to rigorously evaluate the capabilities and limitations of large language models across 204 tasks. The evaluation revealed that even state-of-the-art models currently achieve aggregate scores below 20 (on a 0-100 normalized scale), indicating significantly lower performance compared to human experts.

1,363

28 Dec 2024

computer-science artificial-intelligence computation-and-language

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness

Rensselaer Polytechnic Institute

University of Pennsylvania

The Pennsylvania State University Amazon UTHealth Houston

Suhang Wang

Fali Wang

A comprehensive survey outlines the landscape of Small Language Models (SLMs) in the context of Large Language Models (LLMs), defining SLMs based on task capability and resource constraints. It details their acquisition, enhancement techniques, diverse applications, synergistic collaboration with LLMs, and critical trustworthiness considerations to enable efficient and privacy-preserving AI.

204

185

28 Sep 2025

computer-science machine-learning multiagent-systems

Code2MCP: Transforming Code Repositories into MCP Services

Rensselaer Polytechnic Institute

Sun Yat-Sen University Southeast University

Code2MCP, an agent-based framework, automatically converts GitHub repositories into standardized Model Context Protocol (MCP) services through a multi-agent workflow and a self-correction loop. This system significantly accelerates the creation of agent-ready tools, enabling AI agents to access complex functionalities from existing codebases for tasks like scientific computing.

1,203

29 Sep 2025

computer-science artificial-intelligence computation-and-language

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching

Rensselaer Polytechnic Institute

University of Waterloo

Ali Falahati

SentenceKV introduces a method for efficient Large Language Model inference by using sentence-level semantic KV caching, significantly reducing GPU memory usage and maintaining stable, low latency while preserving accuracy for contexts up to 256K tokens. It demonstrated 97.5% retrieval accuracy on the Needle In A Haystack benchmark with a 128-token budget.

712

07 Jun 2025

computer-science computational-engineering-finance-and-science computation-and-language

Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Rensselaer Polytechnic Institute Wuhan University Sichuan University

New York University

Nanjing University The Chinese University of Hong Kong, Shenzhen

Yale University

NVIDIA

Columbia University

University of Florida

Stony Brook University The University of Manchester Stevens Institute of Technology University of Montreal

Steven zhang

Toby YANG

Financial LLMs hold promise for advancing financial tasks and domain-specific applications. However, they are limited by scarce corpora, weak multimodal capabilities, and narrow evaluations, making them less suited for real-world application. To address this, we introduce \textit{Open-FinLLMs}, the first open-source multimodal financial LLMs designed to handle diverse tasks across text, tabular, time-series, and chart data, excelling in zero-shot, few-shot, and fine-tuning settings. The suite includes FinLLaMA, pre-trained on a comprehensive 52-billion-token corpus; FinLLaMA-Instruct, fine-tuned with 573K financial instructions; and FinLLaVA, enhanced with 1.43M multimodal tuning pairs for strong cross-modal reasoning. We comprehensively evaluate Open-FinLLMs across 14 financial tasks, 30 datasets, and 4 multimodal tasks in zero-shot, few-shot, and supervised fine-tuning settings, introducing two new multimodal evaluation datasets. Our results show that Open-FinLLMs outperforms afvanced financial and general LLMs such as GPT-4, across financial NLP, decision-making, and multi-modal tasks, highlighting their potential to tackle real-world challenges. To foster innovation and collaboration across academia and industry, we release all codes (this https URL) and models under OSI-approved licenses.

299

03 Nov 2025

chain-of-thought computer-science computation-and-language

Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking

Rensselaer Polytechnic Institute

University of Toronto Pennsylvania State University Griffith University AT&T-Chief Data Office

Researchers from Rensselaer Polytechnic Institute and collaborators audit existing Knowledge Graph Question Answering (KGQA) datasets, revealing an average factual correctness of only 57%. They introduce KGQAGen, an LLM-guided framework for creating high-quality, verifiable benchmarks, and use it to construct KGQAGen-10k, which achieves 96.3% factual accuracy and highlights retrieval as a primary bottleneck for state-of-the-art KG-RAG models.

431

11 Sep 2021

computer-science artificial-intelligence databases

Knowledge Graphs

Rensselaer Polytechnic Institute University of Bonn

University of Southampton Universität Stuttgart

Rutgers University Sapienza University of Rome Universidad de Chile Linköping University Vrije Universiteit Universidad de Oviedo University of Bari Universität Paderborn WU Vienna Universität Koblenz–Landau data.world École des mines de Saint-Étienne University of Milano Bicocca

This collaborative tutorial from 18 leading researchers provides a comprehensive, unifying summary of knowledge graphs, consolidating fragmented knowledge from diverse fields. It defines core concepts, surveys techniques across data models, knowledge representation, and AI, and outlines lifecycle management and governance for knowledge graphs.

2,832

385

14 May 2024

computer-science computation-and-language computer-vision-and-pattern-recognition

BioCLIP: A Vision Foundation Model for the Tree of Life

Rensselaer Polytechnic Institute

University of California, Irvine

The Ohio State University

Yu Su

Li Dong

BIOCLIP is a vision foundation model that significantly improves fine-grained biological image classification and generalization to unseen taxa by leveraging a novel 10-million image dataset (TREEOFLIFE-10M) and adapting CLIP's contrastive learning to incorporate the hierarchical structure of biological taxonomy. It achieves an average absolute improvement of 17.5% over original CLIP in zero-shot classification across 10 diverse biology tasks and 38.0% zero-shot accuracy on a novel RARE SPECIES dataset of unseen taxa.

14,413

08 May 2025

agent-based-systems computer-science artificial-intelligence

Foam-Agent: Towards Automated Intelligent CFD Workflows

Rensselaer Polytechnic Institute

University of California, San Diego

A multi-agent framework called Foam-Agent automates OpenFOAM-based CFD simulations through natural language inputs, achieving an 83.6% success rate with Claude 3.5 Sonnet through specialized agents handling architecture planning, configuration generation, simulation execution, and error correction while maintaining consistency across interdependent files.

463

05 Mar 2025

agents ai-for-genomics ai-for-health

DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration

Rensselaer Polytechnic Institute

Carnegie Mellon University

University of Southern California

DrugAgent is a multi-agent LLM framework that automates machine learning programming for drug discovery tasks, integrating deep domain knowledge and intelligent planning. The system consistently outperforms general-purpose LLM agents and often matches human expert baselines across ADMET, HTS, and DTI prediction tasks, exhibiting zero domain-specific errors due to its specialized architecture.

1,397

25 May 2025

computer-science machine-learning multi-task-learning

When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

Rensselaer Polytechnic Institute

Michigan State University IBM Research New Jersey Institute of Technology

Researchers from Rensselaer Polytechnic Institute, Michigan State University, New Jersey Institute of Technology, and IBM Research present the first theoretical generalization analysis of task arithmetic on nonlinear Transformer models. The work explains why task vectors are effective for model editing across multi-task learning, unlearning, and out-of-domain generalization, while also proving their inherent low-rank and sparse properties.

02 Oct 2025

computer-science information-retrieval

Contrastive Retrieval Heads Improve Attention-Based Re-Ranking

Rensselaer Polytechnic Institute IBM Research

Researchers at Rensselaer Polytechnic Institute and IBM Research introduce "Contrastive Retrieval heads" (CoRe heads) for attention-based re-ranking, a method that selectively aggregates attention signals from a small, highly discriminative subset of transformer heads to enhance the accuracy and efficiency of Large Language Model (LLM) re-rankers. This approach consistently achieved state-of-the-art zero-shot performance across diverse benchmarks, including multilingual datasets, while reducing GPU memory usage by 40% and inference latency by 20% through strategic layer pruning.

30 Sep 2025

agentic-frameworks agents computer-science

Foam-Agent 2.0: An End-to-End Composable Multi-Agent Framework for Automating CFD Simulation in OpenFOAM

Rensselaer Polytechnic Institute

University of California, San Diego

Computational Fluid Dynamics (CFD) is an essential simulation tool in engineering, yet its steep learning curve and complex manual setup create significant barriers. To address these challenges, we introduce Foam-Agent, a multi-agent framework that automates the entire end-to-end OpenFOAM workflow from a single natural language prompt. Our key innovations address critical gaps in existing systems: 1. An Comprehensive End-to-End Simulation Automation: Foam-Agent is the first system to manage the full simulation pipeline, including advanced pre-processing with a versatile Meshing Agent capable of handling external mesh files and generating new geometries via Gmsh, automatic generation of HPC submission scripts, and post-simulation visualization via ParaView. 2. Composable Service Architecture: Going beyond a monolithic agent, the framework uses Model Context Protocol (MCP) to expose its core functions as discrete, callable tools. This allows for flexible integration and use by other agentic systems, such as Claude-code, for more exploratory workflows. 3. High-Fidelity Configuration Generation: We achieve superior accuracy through a Hierarchical Multi-Index RAG for precise context retrieval and a dependency-aware generation process that ensures configuration consistency. Evaluated on a benchmark of 110 simulation tasks, Foam-Agent achieves an 88.2% success rate with Claude 3.5 Sonnet, significantly outperforming existing frameworks (55.5% for MetaOpenFOAM). Foam-Agent dramatically lowers the expertise barrier for CFD, demonstrating how specialized multi-agent systems can democratize complex scientific computing. The code is public at this https URL.

216

07 Apr 2025

attention-mechanisms computer-science computer-vision-security

Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis

Rensselaer Polytechnic Institute

Virginia Tech

Princeton University

The Ohio State University University of Tsukuba

Yu Su

PROMPT-CAM introduces class-specific prompts into frozen Vision Transformers to accurately localize fine-grained visual traits that differentiate visually similar categories. The method demonstrates superior interpretability, achieving higher faithfulness scores and enabling human experts to recognize 60.49% of identified traits, even with a slight trade-off in classification accuracy.

367

07 Apr 2025

agent-based-systems ai-for-health computer-science

DrugAgent: Multi-Agent Large Language Model-Based Reasoning for Drug-Target Interaction Prediction

Rensselaer Polytechnic Institute

University of Minnesota

Advancements in large language models (LLMs) allow them to address diverse questions using human-like interfaces. Still, limitations in their training prevent them from answering accurately in scenarios that could benefit from multiple perspectives. Multi-agent systems allow the resolution of questions to enhance result consistency and reliability. While drug-target interaction (DTI) prediction is important for drug discovery, existing approaches face challenges due to complex biological systems and the lack of interpretability needed for clinical applications. DrugAgent is a multi-agent LLM system for DTI prediction that combines multiple specialized perspectives with transparent reasoning. Our system adapts and extends existing multi-agent frameworks by (1) applying coordinator-based architecture to the DTI domain, (2) integrating domain-specific data sources, including ML predictions, knowledge graphs, and literature evidence, and (3) incorporating Chain-of-Thought (CoT) and ReAct (Reason+Act) frameworks for transparent DTI reasoning. We conducted comprehensive experiments using a kinase inhibitor dataset, where our multi-agent LLM method outperformed the non-reasoning multi-agent model (GPT-4o mini) by 45% in F1 score (0.514 vs 0.355). Through ablation studies, we demonstrated the contributions of each agent, with the AI agent being the most impactful, followed by the KG agent and search agent. Most importantly, our approach provides detailed, human-interpretable reasoning for each prediction by combining evidence from multiple sources - a critical feature for biomedical applications where understanding the rationale behind predictions is essential for clinical decision-making and regulatory compliance. Code is available at this https URL

731

19 Feb 2025

agentic-frameworks agents computer-science

FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading

Rensselaer Polytechnic Institute

Harvard University

University of Manchester

NVIDIA

Columbia University

University of Minnesota Stevens Institute of Technology TheFinAI

FLAG-TRADER introduces a framework integrating Large Language Models with gradient-based Reinforcement Learning for financial trading, where the LLM acts as the policy network. The framework enables small-scale, open-source LLMs to achieve superior performance in single-asset trading tasks compared to larger, proprietary models by effectively leveraging RL-based fine-tuning for sequential decision-making.

10 Oct 2025

agents computer-science artificial-intelligence

CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics

Rensselaer Polytechnic Institute

University of California, San Diego Pacific Northwest National Laboratory Indian Institute of Science National Renewable Energy Laboratory

Large Language Models (LLMs) have demonstrated strong performance across general NLP tasks, but their utility in automating numerical experiments of complex physical system -- a critical and labor-intensive component -- remains underexplored. As the major workhorse of computational science over the past decades, Computational Fluid Dynamics (CFD) offers a uniquely challenging testbed for evaluating the scientific capabilities of LLMs. We introduce CFDLLMBench, a benchmark suite comprising three complementary components -- CFDQuery, CFDCodeBench, and FoamBench -- designed to holistically evaluate LLM performance across three key competencies: graduate-level CFD knowledge, numerical and physical reasoning of CFD, and context-dependent implementation of CFD workflows. Grounded in real-world CFD practices, our benchmark combines a detailed task taxonomy with a rigorous evaluation framework to deliver reproducible results and quantify LLM performance across code executability, solution accuracy, and numerical convergence behavior. CFDLLMBench establishes a solid foundation for the development and evaluation of LLM-driven automation of numerical experiments for complex physical systems. Code and data are available at this https URL.

172

11 Oct 2024

adversarial-robustness computer-science artificial-intelligence

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

Rensselaer Polytechnic Institute IBM Research

Fine-tuning on task-specific data to boost downstream performance is a crucial step for leveraging Large Language Models (LLMs). However, previous studies have demonstrated that fine-tuning the models on several adversarial samples or even benign data can greatly comprise the model's pre-equipped alignment and safety capabilities. In this work, we propose SEAL, a novel framework to enhance safety in LLM fine-tuning. SEAL learns a data ranker based on the bilevel optimization to up rank the safe and high-quality fine-tuning data and down rank the unsafe or low-quality ones. Models trained with SEAL demonstrate superior quality over multiple baselines, with 8.5% and 9.7% win rate increase compared to random selection respectively on Llama-3-8b-Instruct and Merlinite-7b models. Our code is available on github this https URL.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Sensor-Driven Predictive Vehicle Maintenance and Routing Problem with Time Windows

Large Language Models for Automatic Milestone Detection in Group Discussions

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness

Code2MCP: Transforming Code Repositories into MCP Services

SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching

Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking

Knowledge Graphs

BioCLIP: A Vision Foundation Model for the Tree of Life

Foam-Agent: Towards Automated Intelligent CFD Workflows

DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration

When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

Contrastive Retrieval Heads Improve Attention-Based Re-Ranking

Foam-Agent 2.0: An End-to-End Composable Multi-Agent Framework for Automating CFD Simulation in OpenFOAM

Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis

DrugAgent: Multi-Agent Large Language Model-Based Reasoning for Drug-Target Interaction Prediction

FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading

CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

Events

AI for Law

Personalize Your Feed