alphaXiv

History

Papers Benchmarks

Wand AI

10,167

15 Jun 2025

computer-science artificial-intelligence computation-and-language

Layer by Layer: Uncovering Hidden Representations in Language Models

UCLA

Université de Montréal

New York University

Meta University of Montreal University of Kentucky Wand AI

Intermediate layers in large language models often provide superior representations for downstream tasks compared to final layers, yielding 2-16% performance improvements on average across 32 text embedding tasks. This work introduces a unified framework based on matrix-based entropy to explain this phenomenon and enable unsupervised identification of optimal layers.

7,737

21 Nov 2025

chain-of-thought computer-science computation-and-language

Concise Reasoning via Reinforcement Learning

Wand AI

A major drawback of reasoning models is their excessive token usage, inflating computational cost, resource demand, and latency. We show this verbosity stems not from deeper reasoning but from reinforcement learning loss minimization when models produce incorrect answers. With unsolvable problems dominating training, this effect compounds into a systematic tendency toward longer outputs. Through theoretical analysis of PPO and GRPO, we prove that incorrect answers inherently drive policies toward verbosity \textit{even when}

\gamma=1

, reframing response lengthening as an optimization artifact. We further uncover a consistent correlation between conciseness and correctness across reasoning and non-reasoning models. Building on these insights, we propose a two-phase RL procedure where a brief secondary stage, trained on a small set of solvable problems, significantly reduces response length while preserving or improving accuracy. Finally, we show that while GRPO shares properties with PPO, it exhibits collapse modes, limiting its reliability for concise reasoning. Our claims are supported by extensive experiments.

2,129

12 Dec 2024

computer-science computation-and-language machine-learning

Does Representation Matter? Exploring Intermediate Layers in Large Language Models

Université de Montréal

New York University

Meta University of Kentucky Wand AI

This research quantifies and characterizes representations in intermediate layers of Large Language Models, demonstrating that these layers often provide superior feature quality for downstream tasks compared to traditional final layers. The study highlights significant differences in information processing patterns between Transformer and State Space Model architectures, revealing a correlation between intermediate layer information compression and improved task performance in Transformers.

31 Oct 2024

computer-science artificial-intelligence computation-and-language

OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset

University of New South Wales

New York University

UC Berkeley

Stanford University

Texas A&M University University of Texas Wand AI Oracle Corporation Howard Community College

We introduce OpenDebateEvidence, a comprehensive dataset for argument mining and summarization sourced from the American Competitive Debate community. This dataset includes over 3.5 million documents with rich metadata, making it one of the most extensive collections of debate evidence. OpenDebateEvidence captures the complexity of arguments in high school and college debates, providing valuable resources for training and evaluation. Our extensive experiments demonstrate the efficacy of fine-tuning state-of-the-art large language models for argumentative abstractive summarization across various methods, models, and datasets. By providing this comprehensive resource, we aim to advance computational argumentation and support practical applications for debaters, educators, and researchers. OpenDebateEvidence is publicly available to support further research and innovation in computational argumentation. Access it here: this https URL

10 Oct 2024

computer-science information-theory machine-learning

Learning to Compress: Local Rank and Information Compression in Deep Neural Networks

UCLA

New York University Wand AI

Deep neural networks tend to exhibit a bias toward low-rank solutions during training, implicitly learning low-dimensional feature representations. This paper investigates how deep multilayer perceptrons (MLPs) encode these feature manifolds and connects this behavior to the Information Bottleneck (IB) theory. We introduce the concept of local rank as a measure of feature manifold dimensionality and demonstrate, both theoretically and empirically, that this rank decreases during the final phase of training. We argue that networks that reduce the rank of their learned representations also compress mutual information between inputs and intermediate layers. This work bridges the gap between feature manifold rank and information compression, offering new insights into the interplay between information bottlenecks and representation learning.

07 Jan 2025

computer-science conversational-ai computation-and-language

"Yeah Right!" -- Do LLMs Exhibit Multimodal Feature Transfer?

Georgia Tech Wand AI

Human communication is a multifaceted and multimodal skill. Communication requires an understanding of both the surface-level textual content and the connotative intent of a piece of communication. In humans, learning to go beyond the surface level starts by learning communicative intent in speech. Once humans acquire these skills in spoken communication, they transfer those skills to written communication. In this paper, we assess the ability of speech+text models and text models trained with special emphasis on human-to-human conversations to make this multimodal transfer of skill. We specifically test these models on their ability to detect covert deceptive communication. We find that with no special prompting speech+text LLMs have an advantage over unimodal LLMs in performing this task. Likewise, we find that human-to-human conversation-trained LLMs are also advantaged in this skill.

08 Oct 2025

computer-science artificial-intelligence machine-learning

NdLinear: Preserving Multi-Dimensional Structure for Parameter-Efficient Neural Networks

New York University

Columbia University Weizmann Institute of Science Wand AI Ensemble AI Thoughtworks

In deep learning, processing multidimensional inputs (e.g., images, medical scans, and time series) is an important task that often requires flattening the inputs. We introduce

\mathit{NdLinear}

, a drop-in replacement for linear layers that operates directly on tensors, requiring no flattening. By applying transformations separately along each dimension, NdLinear preserves native data structure while achieving dramatic parameter reductions, often by orders of magnitude, with minimal memory overhead. We prove NdLinear maintains expressivity through structured Tucker decomposition while preserving VC-dimension scaling. Extensive experiments demonstrate NdLinear's capacity to achieve significant parameter reductions with substantial wall-clock efficiency gains and minimal memory overhead. For instance, our

\mathit{NdLinear-LoRA}

matches or exceeds standard LoRA on language reasoning tasks using up to

9\times

fewer parameters. Experiments across CNNs, RNNs, Transformers, and MLPs on vision, language, time-series, and tabular tasks consistently demonstrate NdLinear's efficiency gains. While excelling at axis-separable tasks, NdLinear has limitations with entangled spatial interactions. By processing data in its original N-dimensional form, NdLinear provides a theoretically grounded, practical component for building more efficient neural architectures.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Layer by Layer: Uncovering Hidden Representations in Language Models

Concise Reasoning via Reinforcement Learning

Does Representation Matter? Exploring Intermediate Layers in Large Language Models

OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset

Learning to Compress: Local Rank and Information Compression in Deep Neural Networks

"Yeah Right!" -- Do LLMs Exhibit Multimodal Feature Transfer?

NdLinear: Preserving Multi-Dimensional Structure for Parameter-Efficient Neural Networks

Events

AI for Law

Personalize Your Feed