Virginia Tech

Enabling Efficient Attack Investigation via Human-in-the-Loop Security Analysis

03 Dec 2024

Huazhong University of Science and Technology, Virginia Tech

Raptor, developed by researchers from Virginia Tech and the University of California, Berkeley, provides a system for efficient cybersecurity attack investigation by leveraging a Domain-Specific Language (ProvQL) and an optimized execution engine. This system allows security analysts to precisely query large-scale system provenance data, achieving an average graph reduction of 58,991x and an F1-score of 0.8766 in revealing attack sequences, while also speeding up query execution by up to 56 times compared to general-purpose query languages.

#computer-science #computer-vision-security #computation-and-language

Commonsense for Zero-Shot Natural Language Video Localization

01 Feb 2024

University of Illinois at Urbana-Champaign, Virginia Tech

Zero-shot Natural Language-Video Localization (NLVL) methods have exhibited promising results in training NLVL models exclusively with raw video data by dynamically generating video segments and pseudo-query annotations. However, existing pseudo-queries often lack grounding in the source video, resulting in unstructured and disjointed content. In this paper, we investigate the effectiveness of commonsense reasoning in zero-shot NLVL. Specifically, we present CORONET, a zero-shot NLVL framework that leverages commonsense to bridge the gap between videos and generated pseudo-queries via a commonsense enhancement module. CORONET employs Graph Convolution Networks (GCN) to encode commonsense information extracted from a knowledge graph, conditioned on the video, and cross-attention mechanisms to enhance the encoded video and pseudo-query representations prior to localization. Through empirical evaluations on two benchmark datasets, we demonstrate that CORONET surpasses both zero-shot and weakly supervised baselines, achieving improvements up to 32.13% across various recall thresholds and up to 6.33% in mIoU. These results underscore the significance of leveraging commonsense reasoning for zero-shot NLVL.

#computer-science #artificial-intelligence #computation-and-language

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

14 May 2025

Tianyi Zhou

University of Washington, New York University

Salesforce Research and collaborators introduce BLIP3-o, a family of unified multimodal models excelling in both image understanding and generation, by systematically investigating hybrid autoregressive-diffusion architectures. The models achieve superior performance on image understanding benchmarks, such as 83.1 on VQAv2 and 83.5 on MMBench, and demonstrate high visual quality and prompt alignment in human evaluations for image generation.

#computer-science #contrastive-learning #artificial-intelligence

Resources 1,497

BLIP3o-NEXT: Next Frontier of Native Image Generation

17 Oct 2025

New York University, UC Davis

Salesforce Research, in collaboration with academic institutions, introduced BLIP3o-NEXT, an open-source autoregressive and diffusion model that unifies text-to-image generation and image editing. The model achieved superior performance on GenEval benchmarks and enhanced image editing consistency, notably improving text rendering and instruction following through reinforcement learning on discrete visual tokens.

#computer-science #computer-vision-and-pattern-recognition #data-curation

Resources 1,529

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

18 Oct 2025

Zhongxing Xu

South China University of Technology

California Institute of Technology

A comprehensive survey by researchers from Shanghai AI Lab and various global institutions outlines the intricate relationship between scientific large language models (Sci-LLMs) and their data foundations, tracing their evolution towards autonomous agents for scientific discovery. The paper establishes a taxonomy for scientific data and knowledge, meticulously reviews over 270 datasets and 190 benchmarks, and identifies critical data challenges alongside future paradigms.

#agentic-frameworks #agents #computer-science

Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls

02 Oct 2025

Virginia Tech, FAIR at Meta

FAIR at Meta researchers systematically investigated synthetic data in LLM pre-training, discovering that mixing approximately 30% high-quality rephrased synthetic data with natural web text can accelerate pre-training convergence by 5-10x to reach the same validation loss. This approach also projects a lower irreducible loss compared to training solely on natural data, offering practical guidelines for data mixture ratios and generator model selection.

#computer-science #artificial-intelligence #computation-and-language

LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

20 Mar 2025

Parshin Shojaee

Carnegie Mellon University, Allen Institute for AI

Researchers from Virginia Tech, Carnegie Mellon University, and the Allen Institute for AI developed LLM-SR, a framework integrating large language models with evolutionary search and numerical optimization to discover scientific equations as Python programs. This approach achieves superior accuracy and out-of-domain generalization compared to state-of-the-art symbolic regression baselines, while requiring substantially fewer iterations on novel scientific benchmarks.

#computer-science #artificial-intelligence #computation-and-language

StructCoder: Structure-Aware Transformer for Code Generation

30 Jan 2024

StructCoder introduces a Transformer model for code generation that uniquely integrates Abstract Syntax Tree (AST) paths and Data Flow Graph (DFG) predictions directly into the decoder to guide the generation of structured code. This approach consistently achieves state-of-the-art performance on code translation and text-to-code generation benchmarks, including higher strict accuracy on APPS Python problems, by ensuring the generated code is both syntactically correct and semantically sound.

#computer-science #machine-learning #software-engineering

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

03 Dec 2019

Ram Prasaath Selvaraju

Grad-CAM, developed at Georgia Tech and FAIR, introduces a technique for generating visual explanations from Convolutional Neural Networks by leveraging gradients flowing into the final convolutional layer. This method provides class-discriminative localization maps without requiring architectural changes or re-training of the network, making it broadly applicable to various CNN models and tasks.

#computer-science #computer-vision-security #artificial-intelligence

SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models

11 Jun 2025

Yu-Min Tseng

Virginia Tech researchers introduced SEALQA, a challenging benchmark for search-augmented language models, designed to evaluate reasoning and robustness in scenarios with conflicting or unhelpful search results. Evaluations on SEALQA reveal that even frontier models struggle significantly with real-world information retrieval challenges, often performing poorly or being misled by noisy data.

#agents #computer-science #artificial-intelligence

TrustLLM: Trustworthiness in Large Language Models

30 Sep 2024

Meng Jiang

Quanxin Mei

Tianyi Zhou

Michigan State University

University of Illinois at Urbana-Champaign

The TRUSTLLM framework and benchmark offer a comprehensive system for evaluating the trustworthiness of large language models across six key dimensions. This work reveals that while proprietary models generally exhibit higher trustworthiness, open-source models can also achieve strong performance in specific areas, highlighting challenges like 'over-alignment' and data leakage.

#computer-science #computation-and-language

Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout

25 Nov 2025

Virginia Tech, fal

Researchers from Virginia Tech and fal developed Infinity-RoPE, a training-free inference framework that transforms short-horizon autoregressive Diffusion Transformers into models capable of infinite-length, action-controllable, and multi-cut video generation. This framework achieves superior temporal coherence and control in long videos, demonstrating state-of-the-art performance on benchmarks and in user studies.

#computer-science #computer-vision-and-pattern-recognition #efficient-transformers

LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

07 Jun 2025

Khoa Doan

Parshin Shojaee

Carnegie Mellon University, VinUniversity

LLM-SRBench is a new benchmark for scientific equation discovery with Large Language Models, meticulously designed to prevent memorization and rigorously assess true data-driven discovery and reasoning. The benchmark's evaluation, using four state-of-the-art LLM-based methods, reveals that current systems achieve a maximum of 31.5% symbolic accuracy, highlighting substantial room for improvement in LLMs' scientific reasoning and generalization capabilities.

#computer-science #artificial-intelligence #computation-and-language

ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features

01 Jul 2025

Tuna Meral

IBM Research, Georgia Tech

ConceptAttention introduces a training-free method to interpret multi-modal Diffusion Transformers by leveraging the attention output space to generate high-quality saliency maps for textual concepts. This approach sets new state-of-the-art results for zero-shot image segmentation on benchmarks like ImageNet-Segmentation (e.g., 83.07% Accuracy) and PascalVOC, and generalizes to video generation.

#computer-science #computer-vision-and-pattern-recognition #machine-learning

Judging with Confidence: Calibrating Autoraters to Preference Distributions

30 Sep 2025

Google DeepMind, Vanderbilt University

The alignment of large language models (LLMs) with human values increasingly relies on using other LLMs as automated judges, or "autoraters". However, their reliability is limited by a foundational issue: they are trained on discrete preference labels, forcing a single ground truth onto tasks that are often subjective, ambiguous, or nuanced. We argue that a reliable autorater must learn to model the full distribution of preferences defined by a target population. In this paper, we propose a general framework for calibrating probabilistic autoraters to any given preference distribution. We formalize the problem and present two learning methods tailored to different data conditions: 1) direct supervised fine-tuning for dense, probabilistic labels, and 2) a reinforcement learning approach for sparse, binary labels. Our empirical results show that fine-tuning autoraters with a distribution-matching objective leads to verbalized probability predictions that are better aligned with the target preference distribution, with improved calibration and significantly lower positional bias, all while preserving performance on objective tasks.

#agents #computer-science #computation-and-language

Agentic Web: Weaving the Next Web with AI Agents

28 Jul 2025

UC Berkeley, University College London

Researchers from a consortium of global institutions articulate a vision for the 'Agentic Web,' an internet ecosystem driven by autonomous AI agents that persistently plan, coordinate, and execute goal-directed tasks. They present a comprehensive framework for this emerging paradigm, detailing necessary algorithmic and systemic transitions, potential applications, and critical considerations for safety, security, and governance.

#agentic-frameworks #agents #computer-science

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

05 Oct 2023

Stanford University, IBM Research

Researchers at Princeton, Virginia Tech, IBM Research, and Stanford University reveal that fine-tuning Large Language Models (LLMs) such as Llama-2-7b-Chat and GPT-3.5 Turbo, even with benign datasets, can significantly degrade their safety mechanisms. Their findings show that harmfulness rates can increase from near-zero to up to 90% with malicious fine-tuning and to 10-35% with commonly used benign datasets.

#adversarial-attacks #computer-science #artificial-intelligence

Data Shapley in One Training Run

07 Jun 2025

Virginia Tech, Princeton University

This paper presents In-Run Data Shapley, a framework for efficiently quantifying data contributions to a specific machine learning model during its training process. It makes Data Shapley practical for large foundation models by integrating attribution calculations directly into the training loop with minimal overhead, revealing insights into data quality, training dynamics, and the nature of data influence on generative AI outputs.

#computer-science #computation-and-language #machine-learning

A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models

22 Feb 2025

University of Illinois at Urbana-Champaign, Sun Yat-Sen University

This survey provides a comprehensive review of mechanistic interpretability methods for Multimodal Foundation Models (MMFMs), presenting a new taxonomy to organize current research. The work highlights that while some interpretability techniques from LLMs can be adapted, novel methods are required to understand unique multimodal processing, and identifies key research gaps in areas such as unified benchmarks and scalable causal understanding.

#computer-science #artificial-intelligence #machine-learning

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

09 Jan 2025

Jianwei Yang

University of Pennsylvania

This paper introduces REFOCUS, a framework that enhances multimodal LLMs' structured image understanding through iterative visual editing and reasoning.

#attention-mechanisms #computer-science #computer-vision-security
