alphaXiv

History

Papers Benchmarks

University of Houston

2,007

01 Feb 2024

computer-science computation-and-language cryptography-and-security

Domain-Independent Deception: A New Taxonomy and Linguistic Analysis

Tel Aviv University University of Houston University of California Berkeley Instabase

Xuting Liu

Verma et al. introduce a new computational definition and a comprehensive taxonomy for deception, offering a foundational framework for domain-independent detection. Their work provides empirical evidence through linguistic analysis and deep learning experiments that generalizable linguistic cues for deception exist and can be transferred across diverse domains, challenging previous skepticism in the field.

865

08 Jul 2025

materials-science physics

Strong Temperature Dependence of Thermal Conductivity in High-Purity Cubic Boron Arsenide

University of Illinois at Urbana-Champaign

UCLA University of Houston University of California Riverside

Materials with high thermal conductivity are needed to conduct heat away from hot spots in high power electronics and optoelectronic devices. Cubic boron arsenide (c-BAs) has a high thermal conductivity due to its special phonon dispersion relation. Previous experimental studies of c-BAs report a room-temperature thermal conductivity between 1000 and 1300 W m-1 K-1. We synthesized high purity isotopically enriched c-BAs single crystals with room-temperature thermal conductivity of around 1500 W m-1 K-1. Using time-domain thermoreflectance (TDTR), we measured thermal conductivity and found a 1/T2 temperature dependence between 300 K and 600 K - slightly stronger than predictions from state-of-the-art theoretical models. Brillouin and Raman scattering revealed minimal changes in phonon frequencies over the same temperature range, suggesting that the observed 1/T2 dependence is not caused by temperature dependent changes in phonon dispersion. To probe defect densities in the BAs crystals we studied, we conducted transient reflectivity microscopy (TRM) measurements of absorption at sub-bandgap photon energies. We observe a correlation between TRM signal intensity and thermal conductivity. Notably, samples with thermal conductivity near 1500 W m-1 K-1 still exhibited nonzero TRM signals, suggesting the presence of defects despite the high thermal conductivity.

3,756

18 Apr 2025

computer-science artificial-intelligence machine-learning

A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications

University of Houston Sichuan Agricultural University

This work provides an exhaustive, up-to-date survey of Mixture-of-Experts (MoE) models, covering their fundamental designs, algorithmic integrations across various machine learning paradigms, theoretical foundations, and diverse applications in computer vision and natural language processing. It consolidates recent advancements to serve as a comprehensive reference for the rapidly evolving field.

171,929

17 Mar 2025

computer-science machine-learning

Multi-modal Time Series Analysis: A Tutorial and Survey

University of Houston NEC Laboratories America University of Connecticut Morgan Stanley

A comprehensive survey and tutorial from researchers at UConn, University of Houston, Morgan Stanley, and NEC Labs introduces a unified framework for multi-modal time series analysis, categorizing methods based on fusion, alignment, and transference while providing a systematic review of datasets, techniques, and applications across different domains.

905

31 Dec 2024

attention-mechanisms computer-science conversational-ai

MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation

University of Utah

Texas A&M University University of Houston Worcester Polytechnic Institute Visa Research

MAIN-RAG is a training-free, multi-agent LLM framework designed to filter noisy documents in Retrieval-Augmented Generation (RAG) systems. It consistently outperforms training-free baselines and achieves competitive performance with training-based RAG models by using an adaptive filtering mechanism that quantifies document relevance based on LLM judgments.

143

21 May 2025

computer-science contrastive-learning computer-vision-and-pattern-recognition

Image-to-Image Translation with Diffusion Transformers and CLIP-Based Image Conditioning

Northeastern University

Cornell University University of Houston Santa Clara University

This research develops a CLIP-conditioned Diffusion Transformer (DiT) for paired image-to-image translation, leveraging CLIP image embeddings for semantic guidance. The framework generates images with superior quality, sharpness, and detail preservation compared to GAN-based methods, particularly on larger datasets.

250

19 Feb 2025

computer-science continual-learning artificial-intelligence

Theory on Mixture-of-Experts in Continual Learning

Singapore University of Technology and Design University of Houston

The Ohio State University

This paper provides the first explicit theoretical analysis of Mixture-of-Experts (MoE) models in continual learning (CL), demonstrating how MoE mitigates catastrophic forgetting through expert specialization and identifying that early termination of gating network updates is crucial for stable performance.

31 Oct 2025

computer-science artificial-intelligence machine-learning

Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting

University of Houston NEC Laboratories America University of Connecticut

ChengAo Shen

The DMMV framework transforms numerical time series into multi-modal views (numerical and visual) and integrates them via an adaptive decomposition strategy to improve long-term forecasting. This method specifically accounts for the inductive biases of Large Vision Models (LVMs) and sets a new state-of-the-art by achieving the best Mean Squared Error on 6 of 8 benchmark datasets.

138

01 Jul 2025

computer-science artificial-intelligence computer-vision-and-pattern-recognition

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation

University of Washington

Allen Institute for AI University of Houston

ROBOEVAL presents a diagnostic benchmark for bimanual robotic manipulation, featuring 8 tiered tasks with over 3,000 human demonstrations and a multi-dimensional evaluation framework. This approach reveals that fine-grained behavioral and outcome metrics provide a substantially deeper understanding of policy limitations and failure modes than traditional binary success rates.

13,952

19 Feb 2025

computer-science networking-and-internet-architecture

Service Function Chain Dynamic Scheduling in Space-Air-Ground Integrated Networks

Nanjing University of Aeronautics and Astronautics Xidian University China University of Mining and Technology

Nanyang Technological University Kyung Hee University University of Houston

As an important component of the sixth generation communication technologies, the space-air-ground integrated network (SAGIN) attracts increasing attentions in recent years. However, due to the mobility and heterogeneity of the components such as satellites and unmanned aerial vehicles in multi-layer SAGIN, the challenges of inefficient resource allocation and management complexity are aggregated. To this end, the network function virtualization technology is introduced and can be implemented via service function chains (SFCs) deployment. However, urgent unexpected tasks may bring conflicts and resource competition during SFC deployment, and how to schedule the SFCs of multiple tasks in SAGIN is a key issue. In this paper, we address the dynamic and complexity of SAGIN by presenting a reconfigurable time extension graph and further propose the dynamic SFC scheduling model. Then, we formulate the SFC scheduling problem to maximize the number of successful deployed SFCs within limited resources and time horizons. Since the problem is in the form of integer linear programming and intractable to solve, we propose the algorithm by incorporating deep reinforcement learning. Finally, simulation results show that the proposed algorithm has better convergence and performance compared to other benchmark algorithms.

210

30 Aug 2025

computer-science artificial-intelligence computer-vision-and-pattern-recognition

Harnessing Vision Models for Time Series Analysis: A Survey

University of Illinois at Urbana-Champaign University of Houston NEC Laboratories America University of Connecticut Florida International University

ChengAo Shen

Time series analysis has witnessed the inspiring development from traditional autoregressive models, deep learning models, to recent Transformers and Large Language Models (LLMs). Efforts in leveraging vision models for time series analysis have also been made along the way but are less visible to the community due to the predominant research on sequence modeling in this domain. However, the discrepancy between continuous time series and the discrete token space of LLMs, and the challenges in explicitly modeling the correlations of variates in multivariate time series have shifted some research attentions to the equally successful Large Vision Models (LVMs) and Vision Language Models (VLMs). To fill the blank in the existing literature, this survey discusses the advantages of vision models over LLMs in time series analysis. It provides a comprehensive and in-depth overview of the existing methods, with dual views of detailed taxonomy that answer the key research questions including how to encode time series as images and how to model the imaged time series for various tasks. Additionally, we address the challenges in the pre- and post-processing steps involved in this framework and outline future directions to further advance time series analysis with vision models.

13 Aug 2025

computer-science computer-vision-and-pattern-recognition

ViewDelta: Scaling Scene Change Detection through Text-Conditioning

University of Houston

We introduce a generalized framework for Scene Change Detection (SCD) that addresses the core ambiguity of distinguishing "relevant" from "nuisance" changes, enabling effective joint training of a single model across diverse domains and applications. Existing methods struggle to generalize due to differences in dataset labeling, where changes such as vegetation growth or lane marking alterations may be labeled as relevant in one dataset and irrelevant in another. To resolve this ambiguity, we propose ViewDelta, a text conditioned change detection framework that uses natural language prompts to define relevant changes precisely, such as a single attribute, a specific set of classes, or all observable differences. To facilitate training in this paradigm, we release the Conditional Change Segmentation dataset (CSeg), the first large-scale synthetic dataset for text conditioned SCD, consisting of over 500,000 image pairs with more than 300,000 unique textual prompts describing relevant changes. Experiments demonstrate that a single ViewDelta model trained jointly on CSeg, SYSU-CD, PSCD, VL-CMU-CD, and their unaligned variants achieves performance competitive with or superior to dataset specific models, highlighting text conditioning as a powerful approach for generalizable SCD. Our code and dataset are available at this https URL.

30 May 2025

agentic-frameworks agents computer-science

ComposeRAG: A Modular and Composable RAG for Corpus-Grounded Multi-Hop Question Answering

Seoul National University University of Houston Snowflake AI Research

Retrieval-Augmented Generation (RAG) systems are increasingly diverse, yet many suffer from monolithic designs that tightly couple core functions like query reformulation, retrieval, reasoning, and verification. This limits their interpretability, systematic evaluation, and targeted improvement, especially for complex multi-hop question answering. We introduce ComposeRAG, a novel modular abstraction that decomposes RAG pipelines into atomic, composable modules. Each module, such as Question Decomposition, Query Rewriting, Retrieval Decision, and Answer Verification, acts as a parameterized transformation on structured inputs/outputs, allowing independent implementation, upgrade, and analysis. To enhance robustness against errors in multi-step reasoning, ComposeRAG incorporates a self-reflection mechanism that iteratively revisits and refines earlier steps upon verification failure. Evaluated on four challenging multi-hop QA benchmarks, ComposeRAG consistently outperforms strong baselines in both accuracy and grounding fidelity. Specifically, it achieves up to a 15% accuracy improvement over fine-tuning-based methods and up to a 5% gain over reasoning-specialized pipelines under identical retrieval conditions. Crucially, ComposeRAG significantly enhances grounding: its verification-first design reduces ungrounded answers by over 10% in low-quality retrieval settings, and by approximately 3% even with strong corpora. Comprehensive ablation studies validate the modular architecture, demonstrating distinct and additive contributions from each component. These findings underscore ComposeRAG's capacity to deliver flexible, transparent, scalable, and high-performing multi-hop reasoning with improved grounding and interpretability.

331

16 Jun 2023

computer-science artificial-intelligence distributed-parallel-and-cluster-computing

ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

Microsoft University of Houston University of Nevada, Reno

Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of large language models on massive GPUs clusters due to its ease of use, efficiency, and good scalability. However, when training on low-bandwidth clusters, or at scale which forces batch size per GPU to be small, ZeRO's effective throughput is limited because of high communication volume from gathering weights in forward pass, backward pass, and averaging gradients. This paper introduces three communication volume reduction techniques, which we collectively refer to as ZeRO++, targeting each of the communication collectives in ZeRO. First is block-quantization based all-gather. Second is data remapping that trades-off communication for more memory. Third is a novel all-to-all based quantized gradient averaging paradigm as replacement of reduce-scatter collective, which preserves accuracy despite communicating low precision data. Collectively, ZeRO++ reduces communication volume of ZeRO by 4x, enabling up to 2.16x better throughput at 384 GPU scale.

24 Jun 2025

computer-science artificial-intelligence distributed-parallel-and-cluster-computing

Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems

Northeastern University

Columbia University University of Houston Fuzhou University University of Nebraska-Lincoln

With the rapid adoption of large language models (LLMs) in recommendation systems, the computational and communication bottlenecks caused by their massive parameter sizes and large data volumes have become increasingly prominent. This paper systematically investigates two classes of optimization methods-model parallelism and data parallelism-for distributed training of LLMs in recommendation scenarios. For model parallelism, we implement both tensor parallelism and pipeline parallelism, and introduce an adaptive load-balancing mechanism to reduce cross-device communication overhead. For data parallelism, we compare synchronous and asynchronous modes, combining gradient compression and sparsification techniques with an efficient aggregation communication framework to significantly improve bandwidth utilization. Experiments conducted on a real-world recommendation dataset in a simulated service environment demonstrate that our proposed hybrid parallelism scheme increases training throughput by over 30% and improves resource utilization by approximately 20% compared to traditional single-mode parallelism, while maintaining strong scalability and robustness. Finally, we discuss trade-offs among different parallel strategies in online deployment and outline future directions involving heterogeneous hardware integration and automated scheduling technologies.

122

13 Aug 2025

agents autonomous-vehicles computer-science

Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges

University of Waterloo Sungkyunkwan University Xidian University Guangdong University of Technology

Nanyang Technological University University of Houston Agency for Science, Technology and Research Institute for Infocomm Research State Key Laboratory of Integrated Services Networks

Edge General Intelligence (EGI) represents a transformative evolution of edge computing, where distributed agents possess the capability to perceive, reason, and act autonomously across diverse, dynamic environments. Central to this vision are world models, which act as proactive internal simulators that not only predict but also actively imagine future trajectories, reason under uncertainty, and plan multi-step actions with foresight. This proactive nature allows agents to anticipate potential outcomes and optimize decisions ahead of real-world interactions. While prior works in robotics and gaming have showcased the potential of world models, their integration into the wireless edge for EGI remains underexplored. This survey bridges this gap by offering a comprehensive analysis of how world models can empower agentic artificial intelligence (AI) systems at the edge. We first examine the architectural foundations of world models, including latent representation learning, dynamics modeling, and imagination-based planning. Building on these core capabilities, we illustrate their proactive applications across EGI scenarios such as vehicular networks, unmanned aerial vehicle (UAV) networks, the Internet of Things (IoT) systems, and network functions virtualization, thereby highlighting how they can enhance optimization under latency, energy, and privacy constraints. We then explore their synergy with foundation models and digital twins, positioning world models as the cognitive backbone of EGI. Finally, we highlight open challenges, such as safety guarantees, efficient training, and constrained deployment, and outline future research directions. This survey provides both a conceptual foundation and a practical roadmap for realizing the next generation of intelligent, autonomous edge systems.

2,036

07 Feb 2017

computer-science machine-learning information-extraction

Gated Multimodal Units for Information Fusion

University of Houston Instituto Nacional de Astrofísica, Óptica y Electrónica Universidad Nacional de Colombia

John Arevalo

Researchers from Universidad Nacional de Colombia, University of Houston, and INAOE developed Gated Multimodal Units (GMUs) for integrating heterogeneous data, achieving a macro f-score of 0.541 on a new MM-IMDb dataset for movie genre classification, outperforming baseline fusion methods. The GMU demonstrated an ability to adaptively weigh modality contributions based on input, improving predictions for 16 out of 23 genres.

197

04 Apr 2025

computer-science artificial-intelligence computer-vision-and-pattern-recognition

AdaCM $^2$ : On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction

University of Georgia University of Houston UT Dallas UT Arlington

AdaCM^2 introduces an adaptive cross-modality memory reduction framework for understanding extremely long-term videos. It processes unbounded video lengths with a bounded memory footprint by dynamically pruning visual tokens based on their relevance to textual queries, reducing GPU memory consumption by up to 65% and achieving improved accuracy on long-term video understanding and captioning benchmarks.

161

31 Jul 2025

adversarial-attacks ai-for-cybersecurity computer-science

Robust and Fine-Grained Detection of AI Generated Texts

UCLA

Stanford University

University of Maryland, College Park University of Houston IISc, Bangalore University of South Florida IIT Roorkee Cohere for AI Community Traversaal.ai Vantager Pulchowk Campus M2ai.in

An ideal detection system for machine generated content is supposed to work well on any generator as many more advanced LLMs come into existence day by day. Existing systems often struggle with accurately identifying AI-generated content over shorter texts. Further, not all texts might be entirely authored by a human or LLM, hence we focused more over partial cases i.e human-LLM co-authored texts. Our paper introduces a set of models built for the task of token classification which are trained on an extensive collection of human-machine co-authored texts, which performed well over texts of unseen domains, unseen generators, texts by non-native speakers and those with adversarial inputs. We also introduce a new dataset of over 2.4M such texts mostly co-authored by several popular proprietary LLMs over 23 languages. We also present findings of our models' performance over each texts of each domain and generator. Additional findings include comparison of performance against each adversarial method, length of input texts and characteristics of generated texts compared to the original human authored texts.

15 Jul 2025

ai-for-health computer-science artificial-intelligence

LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP

Columbia University University of Houston University of California Riverside

Timely identification and accurate risk stratification of cardiovascular disease (CVD) remain essential for reducing global mortality. While existing prediction models primarily leverage structured data, unstructured clinical notes contain valuable early indicators. This study introduces a novel LLM-augmented clinical NLP pipeline that employs domain-adapted large language models for symptom extraction, contextual reasoning, and correlation from free-text reports. Our approach integrates cardiovascular-specific fine-tuning, prompt-based inference, and entity-aware reasoning. Evaluations on MIMIC-III and CARDIO-NLP datasets demonstrate improved performance in precision, recall, F1-score, and AUROC, with high clinical relevance (kappa = 0.82) assessed by cardiologists. Challenges such as contextual hallucination, which occurs when plausible information contracts with provided source, and temporal ambiguity, which is related with models struggling with chronological ordering of events are addressed using prompt engineering and hybrid rule-based verification. This work underscores the potential of LLMs in clinical decision support systems (CDSS), advancing early warning systems and enhancing the translation of patient narratives into actionable risk assessments.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Domain-Independent Deception: A New Taxonomy and Linguistic Analysis

Strong Temperature Dependence of Thermal Conductivity in High-Purity Cubic Boron Arsenide

A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications

Multi-modal Time Series Analysis: A Tutorial and Survey

MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation

Image-to-Image Translation with Diffusion Transformers and CLIP-Based Image Conditioning

Theory on Mixture-of-Experts in Continual Learning

Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation

Service Function Chain Dynamic Scheduling in Space-Air-Ground Integrated Networks

Harnessing Vision Models for Time Series Analysis: A Survey

ViewDelta: Scaling Scene Change Detection through Text-Conditioning

ComposeRAG: A Modular and Composable RAG for Corpus-Grounded Multi-Hop Question Answering

ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems

Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges

Gated Multimodal Units for Information Fusion

AdaCM $^2$ : On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction

Robust and Fine-Grained Detection of AI Generated Texts

LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP

Events

AI for Law

Personalize Your Feed

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Domain-Independent Deception: A New Taxonomy and Linguistic Analysis

Strong Temperature Dependence of Thermal Conductivity in High-Purity Cubic Boron Arsenide

A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications

Multi-modal Time Series Analysis: A Tutorial and Survey

MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation

Image-to-Image Translation with Diffusion Transformers and CLIP-Based Image Conditioning

Theory on Mixture-of-Experts in Continual Learning

Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation

Service Function Chain Dynamic Scheduling in Space-Air-Ground Integrated Networks

Harnessing Vision Models for Time Series Analysis: A Survey

ViewDelta: Scaling Scene Change Detection through Text-Conditioning

ComposeRAG: A Modular and Composable RAG for Corpus-Grounded Multi-Hop Question Answering

ZeRO++: Extremely Efficient Collective Communication for Giant Model Training

Research on Model Parallelism and Data Parallelism Optimization Methods in Large Language Model-Based Recommendation Systems

Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges

Gated Multimodal Units for Information Fusion

AdaCM2^22: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction

Robust and Fine-Grained Detection of AI Generated Texts

LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP

Events

AI for Law

Personalize Your Feed

AdaCM $^2$ : On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction