Intel Labs · Intel Corporation
We study the management of product transitions in a semiconductor manufacturing firm, where resource allocation decisions must be coordinated across multiple autonomous Product Divisions. We capture the hierarchical and decentralized nature of this decision process with a multi-follower bilevel model. Corporate Management, acting as the leader, seeks to maximize the firm's total profit over a finite horizon. The followers are the Product Divisions, which share manufacturing and engineering resources to develop, produce, and sell products in the market. Each Product Division needs engineering capacity to develop new products, and factory capacity both to produce products for sale and to produce the prototypes and samples required by the product development process. We model this interdependency between Product Divisions as a generalized Nash equilibrium problem at the lower level and propose a reformulation in which Corporate Management, as the leader, coordinates the resource allocation decisions. We then derive an equivalent single-level reformulation and develop a cut-and-column generation algorithm. Extensive computational experiments evaluate the performance of the algorithm and provide managerial insights into how key parameters and the distribution of decision authority affect system performance.
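The hierarchical structure described above can be sketched abstractly as follows; the notation is illustrative rather than the paper's exact formulation. The leader allocates shared engineering capacity E and factory capacity F across divisions, and each division's feasible set depends on the other divisions' decisions, which is what makes the lower level a generalized Nash equilibrium problem rather than a set of independent optimizations.

```latex
% Illustrative multi-follower bilevel structure (not the paper's exact model).
\begin{align*}
\max_{e,\;f}\quad
  & \sum_{d \in D} \pi_d\!\left(x_d^{*}\right)
  && \text{leader: total firm profit over the horizon} \\
\text{s.t.}\quad
  & \sum_{d \in D} e_d \le E, \qquad \sum_{d \in D} f_d \le F
  && \text{shared engineering and factory capacity} \\
  & x_d^{*} \in \arg\max_{x_d \,\in\, X_d\!\left(e_d,\, f_d,\, x_{-d}^{*}\right)} \pi_d(x_d)
    \quad \forall d \in D
  && \text{followers: generalized Nash equilibrium}
\end{align*}
```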
The Point Transformer adapts Transformer networks for 3D point clouds, achieving new state-of-the-art performance by introducing vector self-attention and a trainable relative position encoding. This architecture notably reached 70.4% mIoU on S3DIS semantic segmentation and 93.7% accuracy on ModelNet40 classification.
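As a rough sketch of the vector self-attention idea mentioned above (per-channel attention weights modulated by a learned relative position encoding), the following NumPy toy uses random linear maps and an identity map in place of the learned layers, and dense all-pairs attention in place of the paper's k-nearest-neighbor restriction; it is illustrative only.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def vector_self_attention(feats, pos, W_q, W_k, W_v, mlp_pos, mlp_attn):
    """Dense (all-pairs) vector self-attention over a small point set.

    feats: (N, C) point features, pos: (N, 3) coordinates.
    W_q, W_k, W_v: (C, C) linear maps; mlp_pos maps relative offsets to (C,)
    position encodings; mlp_attn maps per-pair difference vectors to
    per-channel attention logits. The real Point Transformer restricts
    attention to k-nearest neighbors and uses learned MLPs for both maps.
    """
    q, k, v = feats @ W_q, feats @ W_k, feats @ W_v
    delta = mlp_pos(pos[:, None, :] - pos[None, :, :])        # (N, N, C) relative position encoding
    logits = mlp_attn(q[:, None, :] - k[None, :, :] + delta)  # vector (per-channel) logits
    attn = softmax(logits, axis=1)                            # normalize over neighbors j
    return (attn * (v[None, :, :] + delta)).sum(axis=1)       # (N, C) aggregated features

# Toy usage: random weights stand in for the learned layers.
rng = np.random.default_rng(0)
N, C = 16, 8
feats, pos = rng.normal(size=(N, C)), rng.normal(size=(N, 3))
rand_w = lambda: rng.normal(size=(C, C)) / np.sqrt(C)
W_pos = rng.normal(size=(3, C))
out = vector_self_attention(feats, pos, rand_w(), rand_w(), rand_w(),
                            mlp_pos=lambda d: d @ W_pos,
                            mlp_attn=lambda t: t)             # identity in place of the attention MLP
print(out.shape)  # (16, 8)
```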
Multi-agent systems (MAS) and reinforcement learning (RL) are widely used to enhance the agentic capabilities of large language models (LLMs). MAS improves task performance through role-based orchestration, while RL methods such as GRPO-style optimization use environmental rewards to learn stronger policies. However, applying on-policy RL to MAS remains underexplored and presents unique challenges. Algorithmically, standard GRPO grouping assumptions break down because prompts vary by role and by turn. On the systems side, the training stack must support MAS-workflow rollouts and on-policy updates for both single-policy and multi-policy models. We propose AT-GRPO, which includes (i) an agent- and turn-wise grouped RL algorithm tailored to MAS and (ii) a training system that supports both single- and multi-policy regimes. Across game, planning, coding, and math tasks, AT-GRPO delivers substantial gains. On long-horizon planning, it raises accuracy from single-agent RL baselines of 14.0–47.0 percent to 96.0–99.5 percent. It also improves reasoning performance, with average gains of 3.87–7.62 percent on coding tasks and 9.0–17.93 percent on math. Code and environments are available at: this https URL.
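A minimal sketch of the agent- and turn-wise grouping idea, as we read it: instead of forming GRPO baseline groups from rollouts that share a single prompt, rollouts are grouped by (agent role, turn) before computing group-relative advantages. The field names and normalization details below are assumptions, not the paper's implementation.

```python
from collections import defaultdict
import statistics

def grouped_advantages(samples):
    """Group-relative advantages computed per (agent, turn) group.

    `samples` is a list of dicts with keys 'agent', 'turn', and 'reward'
    (field names are illustrative). Standard GRPO forms a baseline group from
    rollouts sharing one prompt; in a multi-agent, multi-turn workflow prompts
    differ by role and turn, so each (agent, turn) pair gets its own group.
    """
    groups = defaultdict(list)
    for s in samples:
        groups[(s["agent"], s["turn"])].append(s["reward"])

    advantages = []
    for s in samples:
        rewards = groups[(s["agent"], s["turn"])]
        mean = statistics.fmean(rewards)
        std = statistics.pstdev(rewards) if len(rewards) > 1 else 0.0
        advantages.append((s["reward"] - mean) / (std + 1e-6))
    return advantages

# Toy usage: two roles, several rollouts per (agent, turn) group.
rollouts = [
    {"agent": "planner", "turn": 0, "reward": 1.0},
    {"agent": "planner", "turn": 0, "reward": 0.0},
    {"agent": "coder",   "turn": 1, "reward": 0.5},
    {"agent": "coder",   "turn": 1, "reward": 0.7},
]
print(grouped_advantages(rollouts))
```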
Neural image representations have emerged as a promising approach for encoding and rendering visual data. Combined with learning-based workflows, they demonstrate impressive trade-offs between visual fidelity and memory footprint. Existing methods in this domain, however, often rely on fixed data structures that suboptimally allocate memory or compute-intensive implicit models, hindering their practicality for real-time graphics applications. Inspired by recent advancements in radiance field rendering, we introduce Image-GS, a content-adaptive image representation based on 2D Gaussians. Leveraging a custom differentiable renderer, Image-GS reconstructs images by adaptively allocating and progressively optimizing a group of anisotropic, colored 2D Gaussians. It achieves a favorable balance between visual fidelity and memory efficiency across a variety of stylized images frequently seen in graphics workflows, especially for those showing non-uniformly distributed features and in low-bitrate regimes. Moreover, it supports hardware-friendly rapid random access for real-time usage, requiring only 0.3K MACs to decode a pixel. Through error-guided progressive optimization, Image-GS naturally constructs a smooth level-of-detail hierarchy. We demonstrate its versatility with several applications, including texture compression, semantics-aware compression, and joint image compression and restoration.
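The following toy renderer illustrates the basic idea of reconstructing an image from a set of anisotropic, colored 2D Gaussians via a normalized blend; the actual Image-GS renderer is a custom differentiable implementation with progressive, error-guided optimization, which this sketch does not attempt to reproduce.

```python
import numpy as np

def render_gaussians(H, W, mu, inv_cov, color, weight):
    """Render an H x W RGB image from N colored anisotropic 2D Gaussians.

    mu: (N, 2) centers in pixel coordinates, inv_cov: (N, 2, 2) inverse
    covariances, color: (N, 3), weight: (N,). A plain normalized-blend sketch.
    """
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys], axis=-1).reshape(-1, 1, 2).astype(float)  # (H*W, 1, 2)
    d = pix - mu[None, :, :]                                           # (H*W, N, 2)
    # Mahalanobis distance -> anisotropic Gaussian falloff per splat.
    m = np.einsum("pni,nij,pnj->pn", d, inv_cov, d)
    alpha = weight[None, :] * np.exp(-0.5 * m)                         # (H*W, N)
    img = (alpha[..., None] * color[None, :, :]).sum(1) / (alpha.sum(1, keepdims=True) + 1e-8)
    return img.reshape(H, W, 3)

# Toy usage: three random splats on a 64x64 canvas.
rng = np.random.default_rng(0)
N = 3
A = rng.normal(size=(N, 2, 2))
cov = A @ A.transpose(0, 2, 1) * 20 + np.eye(2)[None] * 5              # SPD covariances
img = render_gaussians(64, 64,
                       mu=rng.uniform(0, 64, (N, 2)),
                       inv_cov=np.linalg.inv(cov),
                       color=rng.uniform(0, 1, (N, 3)),
                       weight=np.ones(N))
print(img.shape)  # (64, 64, 3)
```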
Deep Equilibrium Models propose a new paradigm for neural networks where the output is an equilibrium point of a repeated transformation, enabling networks of 'infinite depth.' This approach reduces training memory consumption to a constant, making it feasible to train very deep models that previously exceeded hardware limits.
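A minimal sketch of the forward pass: the layer output is defined implicitly as the fixed point z* = f(z*, x), found here by plain fixed-point iteration on a contractive toy map. The paper uses faster root-finding (Broyden's method) and implicit differentiation for the backward pass, which is what keeps training memory constant.

```python
import numpy as np

def deq_forward(f, x, z_init, tol=1e-6, max_iter=500):
    """Find the equilibrium z* = f(z*, x) by plain fixed-point iteration.

    A Deep Equilibrium Model defines its output implicitly as this fixed point
    and backpropagates through it with implicit differentiation, so only z*
    (not the iteration history) has to be stored, giving constant memory.
    """
    z = z_init
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Toy layer: a contractive map, so the fixed point exists and is unique.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
W *= 0.5 / np.linalg.norm(W, 2)          # spectral norm 0.5 -> contraction in z
U = rng.normal(size=(8, 8))
f = lambda z, x: np.tanh(W @ z + U @ x)
x = rng.normal(size=8)
z_star = deq_forward(f, x, z_init=np.zeros(8))
print(np.linalg.norm(z_star - f(z_star, x)))  # ~0: z_star is an equilibrium
```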
Researchers from Huazhong University of Science and Technology, in collaboration with Meta and Autel Robotics, developed MonSter++, a unified geometric foundation model for multi-view depth estimation that enhances accuracy by adaptively recovering per-pixel scale and shift for monocular depth priors. The model achieves top ranks on 8 leaderboards, including a 15.91% improvement in EPE on Scene Flow, and a real-time variant runs at over 20 FPS while maintaining state-of-the-art accuracy.
A generic Temporal Convolutional Network (TCN) architecture is empirically shown to consistently outperform canonical recurrent neural networks (RNNs) across a broad spectrum of sequence modeling tasks, demonstrating superior practical memory retention and faster convergence.
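A minimal PyTorch sketch of the generic TCN building blocks named above: dilated causal convolutions stacked with residual connections so the receptive field grows exponentially with depth. Layer widths and the omission of weight normalization and dropout are simplifications of the paper's generic architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Dilated causal 1-D convolution: the output at time t sees only inputs <= t."""
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                   # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.left_pad, 0)))      # left-pad only -> causality

class TinyTCN(nn.Module):
    """Minimal TCN trunk: stacked dilated causal convs with residual connections.

    Dilations double per layer, so the receptive field grows exponentially with
    depth, which is what gives TCNs their long practical memory.
    """
    def __init__(self, channels=32, kernel_size=3, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            [CausalConv1d(channels, kernel_size, dilation=2 ** i) for i in range(n_layers)]
        )

    def forward(self, x):
        for layer in self.layers:
            x = x + torch.relu(layer(x))                    # residual connection
        return x

seq = torch.randn(4, 32, 128)                               # (batch, channels, time steps)
print(TinyTCN()(seq).shape)                                 # torch.Size([4, 32, 128])
```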
Researchers from New York University and Intel Labs developed the PISA benchmark and applied Physics Supervised Fine-Tuning (PSFT) and Object Reward Optimization (ORO) to improve the physical accuracy of video diffusion models when simulating falling objects. Their post-training approach, using a relatively small simulated dataset, significantly reduced trajectory errors and produced more physically plausible motion compared to state-of-the-art baselines like Open-Sora.
MC-LLaVA introduces the first multi-concept personalization paradigm for Vision-Language Models, enabling them to understand and generate responses involving multiple user-defined concepts simultaneously. The model achieves superior recognition accuracy (0.845 for multi-concept) and competitive VQA performance (BLEU 0.658) on a newly contributed dataset, outperforming prior single-concept approaches.
LVLM-Interpret is an interactive tool for Large Vision-Language Models (LVLMs) that integrates attention visualization, relevancy mapping, and causal interpretation methods. It helps diagnose model issues like hallucination by illustrating how specific image patches and text tokens contribute to generated responses. A case study with LLaVA-v1.5-7b demonstrated the model's inconsistent grounding, revealing instances of both text-over-image reliance and robust visual understanding.
Large mini-batch training for deep learning models consistently converges to "sharp" minimizers of the loss function, leading to a degradation in generalization performance, while small mini-batch methods converge to "flat" minimizers with better generalization due to the inherent noise in their gradient estimations. The work provides numerical evidence and a conceptual framework explaining this generalization gap, demonstrating that large-batch solutions are orders of magnitude sharper.
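As a rough illustration of what "sharpness" means here, the sketch below probes how much the loss rises under small random parameter perturbations around a solution. The paper's actual sharpness measure maximizes the loss over a small box via an inner optimization, so this random-sampling proxy only lower-bounds it.

```python
import numpy as np

def sharpness_probe(loss_fn, theta, epsilon=1e-3, n_samples=100, rng=None):
    """Crude sharpness proxy: worst relative loss increase under small random
    parameter perturbations within a box scaled to the parameter magnitudes.
    """
    rng = rng or np.random.default_rng(0)
    base = loss_fn(theta)
    worst = 0.0
    for _ in range(n_samples):
        d = rng.uniform(-1, 1, size=theta.shape)
        d *= epsilon * (np.abs(theta) + 1.0)      # box half-width per coordinate
        worst = max(worst, loss_fn(theta + d) - base)
    return 100.0 * worst / (1.0 + base)

# Toy example: a narrow valley reads as "sharper" than a wide one at the same minimum.
theta = np.zeros(10)
print(sharpness_probe(lambda t: 100 * np.sum(t ** 2), theta))  # sharp minimizer
print(sharpness_probe(lambda t: 0.1 * np.sum(t ** 2), theta))  # flat minimizer
```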
Researchers from Cornell, University of Copenhagen, Apple, and Intel Labs developed LSeg, a model that allows semantic segmentation based on natural language descriptions, offering flexible, zero-shot recognition of categories. LSeg outperforms prior zero-shot methods on benchmarks such as PASCAL-5i and COCO-20i, and rivals few-shot techniques while maintaining competitive accuracy on traditional fixed-label tasks.
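The core mechanism is to embed every pixel and every candidate label word into a shared space and assign each pixel the best-matching label, which is what allows the label set to be arbitrary text at test time. The NumPy sketch below uses random embeddings in place of the image and text encoders and is illustrative only.

```python
import numpy as np

def language_driven_segmentation(pixel_emb, text_emb, temperature=0.07):
    """Assign each pixel the label whose text embedding it matches best.

    pixel_emb: (H, W, D) image-encoder features; text_emb: (K, D) embeddings
    of K arbitrary label words (e.g. from a CLIP-style text encoder). Both are
    L2-normalized and compared by dot product, so the label set can change at
    test time without retraining.
    """
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = np.einsum("hwd,kd->hwk", p, t) / temperature   # per-pixel, per-label scores
    return logits.argmax(-1)                                # (H, W) label map

# Toy usage with random embeddings standing in for real encoders.
rng = np.random.default_rng(0)
labels = rng.normal(size=(3, 512))                          # e.g. "dog", "grass", "sky"
seg = language_driven_segmentation(rng.normal(size=(64, 64, 512)), labels)
print(seg.shape, seg.min(), seg.max())
```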
Researchers from Intel Labs and ETH Zurich developed a robust training framework for monocular depth estimation that effectively combines diverse and incompatible depth datasets. Their methodology, featuring scale- and shift-invariant losses and multi-objective dataset mixing, achieved state-of-the-art performance in zero-shot cross-dataset transfer, enabling superior generalization across varied real-world environments.
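A small sketch of the scale- and shift-invariant idea: before computing the error, the prediction is aligned to the target by the closed-form least-squares scale and shift, so depth datasets with unknown or inconsistent scale can be mixed during training. The paper's full objective adds trimming and gradient-matching terms that are omitted here.

```python
import numpy as np

def scale_shift_invariant_loss(pred, target, mask):
    """Align the prediction to the target in scale and shift (least squares),
    then measure the remaining absolute error on valid pixels.
    """
    p, t = pred[mask], target[mask]
    # Solve min_{s,b} || s*p + b - t ||^2 via the 2x2 normal equations.
    A = np.array([[np.dot(p, p), p.sum()],
                  [p.sum(),      p.size]])
    rhs = np.array([np.dot(p, t), t.sum()])
    s, b = np.linalg.solve(A, rhs)
    return np.mean(np.abs(s * p + b - t))

# Toy check: a prediction differing only by scale and shift gives ~zero loss.
rng = np.random.default_rng(0)
gt = rng.uniform(1, 10, size=(48, 64))
pred = 0.5 * gt - 2.0
print(scale_shift_invariant_loss(pred, gt, mask=np.ones_like(gt, dtype=bool)))  # ~0.0
```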
OrcaLoca, an LLM agent framework developed by UCSD and Intel, enhances software issue localization through strategic code exploration, action decomposition, and distance-aware context pruning. It establishes new open-source benchmarks on SWE-bench Lite, achieving a 65.33% function match rate and increasing the overall issue resolved rate by 6.33 percentage points.
Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. Existing truncated sampling techniques, including temperature scaling, top-p (nucleus) sampling, and min-p sampling, aim to manage this trade-off. However, they exhibit limitations, particularly in how effectively they incorporate the model's confidence into the sampling strategy. For example, min-p sampling relies on a single top token as a heuristic for confidence, ultimately underutilizing the information in the probability distribution. To incorporate model confidence more effectively, in this paper we present **top-H** decoding. We first establish the theoretical foundation of the interplay between creativity and coherence in truncated sampling by formulating an **entropy-constrained minimum divergence** problem. We then prove this minimization problem to be equivalent to an **entropy-constrained mass maximization** (ECMM) problem, which is NP-hard. Finally, we present top-H decoding, a computationally efficient greedy algorithm that solves the ECMM problem. Extensive empirical evaluations demonstrate that top-H outperforms the state-of-the-art (SoTA) alternative, min-p sampling, by up to **25.63%** on creative writing benchmarks, while maintaining robustness on question-answering datasets such as GPQA, GSM8K, and MT-Bench. Additionally, an *LLM-as-judge* evaluation confirms that top-H produces coherent outputs even at higher temperatures, where creativity is especially critical. In summary, top-H advances the SoTA in open-ended text generation and can be *easily integrated* into creative writing applications. The code is available at this https URL.
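The sketch below shows one plausible reading of the greedy ECMM step: tokens are added in order of decreasing probability, maximizing kept mass, while the entropy of the renormalized kept set stays within a budget tied to the entropy of the full distribution. The specific constraint form and the alpha * H(p) threshold are assumptions, not the paper's exact rule.

```python
import numpy as np

def top_h_sample(probs, alpha=0.4, rng=None):
    """Greedy entropy-constrained truncation, then sampling (illustrative only).

    Tokens are added in decreasing-probability order as long as the entropy of
    the renormalized kept set stays below alpha * H(p), where H(p) is the
    entropy of the full distribution; the paper's exact constraint may differ.
    """
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]
    full_entropy = -np.sum(probs * np.log(probs + 1e-12))
    kept = []
    for idx in order:
        candidate = kept + [idx]
        q = probs[candidate] / probs[candidate].sum()
        h = -np.sum(q * np.log(q + 1e-12))
        if kept and h > alpha * full_entropy:
            break                                 # adding this token would break the budget
        kept.append(idx)
    q = probs[kept] / probs[kept].sum()
    return int(rng.choice(kept, p=q))

# Toy usage on a peaked 10-token distribution.
p = np.array([0.4, 0.2, 0.1, 0.08, 0.07, 0.05, 0.04, 0.03, 0.02, 0.01])
print(top_h_sample(p, alpha=0.4, rng=np.random.default_rng(0)))
```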
CARLA, an open-source urban driving simulator developed by Intel Labs, TRI, and CVC, provides a high-fidelity platform for autonomous driving research, demonstrating its utility by benchmarking different driving paradigms and revealing generalization challenges to novel environments.
This survey provides a comprehensive systematization of research on bias and fairness in Large Language Models, formalizing definitions and categorizing existing metrics, datasets, and mitigation techniques. It critically evaluates current methods, identifies their limitations, and outlines key open problems in the field.
Many important applications across science, data analytics, and AI workloads depend on distributed matrix multiplication. Prior work has developed a large array of algorithms suitable for different problem sizes and partitionings, including 1D, 2D, 1.5D, and 2.5D algorithms. However, each existing algorithm supports only a subset of partitionings, so multiple algorithm implementations are required to cover the full space of possible partitionings. If no implementation is available for a particular set of partitionings, one or more operands must be redistributed, increasing communication costs. This paper presents a universal one-sided algorithm for distributed matrix multiplication that supports all combinations of partitionings and replication factors. Our algorithm uses slicing (index arithmetic) to compute the sets of overlapping tiles that must be multiplied together. The resulting list of local matrix multiplies can then either be executed directly or reordered and lowered to an optimized IR to maximize overlap. We implement our algorithm using a high-level C++-based PGAS programming framework that performs direct GPU-to-GPU communication over intra-node interconnects. We evaluate performance for a wide variety of partitionings and replication factors, finding that our implementation is competitive with PyTorch DTensor, a highly optimized distributed tensor library targeting AI models.
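The slicing step can be illustrated with simple interval arithmetic over the contraction dimension: for two arbitrary partitionings of K, every pair of tiles whose K-extents intersect contributes one local multiply over the intersection. The sketch below enumerates that list; the tile layouts and names are illustrative, not the paper's implementation.

```python
def overlapping_tiles(a_k_bounds, b_k_bounds):
    """Index arithmetic ("slicing") over the contraction dimension K.

    a_k_bounds / b_k_bounds hold the [start, end) K-extents of each rank's
    local tile of A (its columns) and B (its rows) under two arbitrary
    partitionings. Every pair with a non-empty intersection contributes one
    local multiply of the corresponding sub-slices; the resulting list can be
    executed directly or reordered to maximize communication overlap.
    """
    multiplies = []
    for ai, (a_lo, a_hi) in enumerate(a_k_bounds):
        for bi, (b_lo, b_hi) in enumerate(b_k_bounds):
            lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
            if lo < hi:                          # the tiles overlap in K
                multiplies.append((ai, bi, (lo, hi)))
    return multiplies

# K = 12 split as 3 tiles of A's columns vs. 4 tiles of B's rows.
a_tiles = [(0, 4), (4, 8), (8, 12)]
b_tiles = [(0, 3), (3, 6), (6, 9), (9, 12)]
for a_idx, b_idx, k_range in overlapping_tiles(a_tiles, b_tiles):
    print(f"A tile {a_idx} x B tile {b_idx} over K slice {k_range}")
```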
Auto-Regressive (AR) models have recently gained prominence in image generation, often matching or even surpassing the performance of diffusion models. However, one major limitation of AR models is their sequential nature, which processes tokens one at a time, slowing down generation compared to models like GANs or diffusion-based methods that operate more efficiently. While speculative decoding has proven effective for accelerating LLMs by generating multiple tokens in a single forward pass, its application to visual AR models remains largely unexplored. In this work, we identify a challenge in this setting, which we term *token selection ambiguity*, wherein visual AR models frequently assign uniformly low probabilities to tokens, hampering the performance of speculative decoding. To overcome this challenge, we propose a relaxed acceptance condition, referred to as LANTERN, that leverages the interchangeability of tokens in latent space. This relaxation restores the effectiveness of speculative decoding in visual AR models by enabling more flexible use of candidate tokens that would otherwise be prematurely rejected. Furthermore, by incorporating a total variation distance bound, we ensure that these speed gains are achieved without significantly compromising image quality or semantic coherence. Experimental results demonstrate the efficacy of our method in providing a substantial speed-up over speculative decoding. Specifically, compared to a naïve application of state-of-the-art speculative decoding, LANTERN increases speed-ups by 1.75× and 1.82×, as compared to greedy decoding and random sampling, respectively, when applied to LlamaGen, a contemporary visual AR model. The code is publicly available at this https URL
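For context, standard speculative decoding accepts a drafted token with probability min(1, p_target/p_draft). The sketch below shows our reading of the relaxed acceptance idea: target probability is aggregated over tokens that are near-interchangeable in latent space before the test, so visually equivalent candidates are not rejected outright. The neighborhood construction is hypothetical, and the paper's total-variation bound is omitted.

```python
import numpy as np

def relaxed_accept(token, p_target, p_draft, neighbors, rng=None):
    """Speculative-decoding acceptance test with a latent-neighbor relaxation.

    Standard speculative decoding accepts drafted `token` with probability
    min(1, p_target[token] / p_draft[token]). Here the target probability is
    aggregated over `neighbors[token]`, the tokens treated as interchangeable
    in latent space, so near-equivalent candidates survive the test. The real
    method also bounds the induced distribution shift in total variation.
    """
    rng = rng or np.random.default_rng()
    p_agg = p_target[neighbors[token]].sum()          # mass of the token's latent neighborhood
    accept_prob = min(1.0, p_agg / p_draft[token])
    return rng.random() < accept_prob

# Toy usage over a 6-token vocabulary; each neighborhood includes the token itself.
rng = np.random.default_rng(0)
p_target = np.array([0.05, 0.05, 0.10, 0.10, 0.30, 0.40])
p_draft  = np.array([0.05, 0.05, 0.05, 0.15, 0.40, 0.30])
neighbors = {i: [i, (i + 1) % 6] for i in range(6)}   # hypothetical latent neighborhoods
print(relaxed_accept(4, p_target, p_draft, neighbors, rng))
```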
Large language models (LLMs) excel at capturing global token dependencies via self-attention but face prohibitive compute and memory costs on lengthy inputs. While sub-quadratic methods (e.g., linear attention) can reduce these costs, they often degrade accuracy by overemphasizing recent tokens. In this work, we first propose dual-state linear attention (DSLA), a novel design that maintains two specialized hidden states: one for preserving historical context and one for tracking recency, thereby mitigating the short-range bias typical of linear-attention architectures. To further balance efficiency and accuracy under dynamic workload conditions, we introduce DSLA-Serve, an online adaptive distillation framework that progressively replaces Transformer layers with DSLA layers at inference time, guided by a sensitivity-based layer ordering. DSLA-Serve uses a chained fine-tuning strategy to ensure that each newly converted DSLA layer remains consistent with previously replaced layers, preserving overall quality. Extensive evaluations on commonsense reasoning, long-context QA, and text summarization demonstrate that DSLA-Serve yields 2.3x faster inference than Llama2-7B and 3.0x faster inference than the hybrid Zamba-7B, while retaining comparable performance across downstream tasks. Our ablation studies show that DSLA's dual states capture both global and local dependencies, addressing the historical-token underrepresentation seen in prior linear attention methods. Code is available at this https URL.
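A recurrent sketch of the dual-state idea: linear attention maintains a running key-value state that is read with the query, and here one copy of that state is kept undecayed to preserve history while a second, exponentially decayed copy tracks recency. Averaging the two readouts is an assumption standing in for whatever learned combination the real layer uses.

```python
import numpy as np

def dual_state_linear_attention(q, k, v, decay=0.95):
    """Recurrent sketch of a dual-state linear attention layer.

    Linear attention keeps a running state S_t = sum_j phi(k_j) v_j^T read with
    phi(q_t), which is O(1) per token but tends to overweight recent tokens.
    Here two states are kept: an undecayed one that preserves all historical
    context and an exponentially decayed one that tracks recency; their
    readouts are mixed by a simple average (an assumption for illustration).
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-6            # positive feature map
    d_k, d_v = q.shape[1], v.shape[1]
    S_hist = np.zeros((d_k, d_v)); z_hist = np.zeros(d_k)
    S_rec  = np.zeros((d_k, d_v)); z_rec  = np.zeros(d_k)
    outputs = []
    for t in range(q.shape[0]):
        kt, vt, qt = phi(k[t]), v[t], phi(q[t])
        S_hist += np.outer(kt, vt); z_hist += kt         # history: no forgetting
        S_rec = decay * S_rec + np.outer(kt, vt)         # recency: exponential decay
        z_rec = decay * z_rec + kt
        y_hist = (qt @ S_hist) / (qt @ z_hist)
        y_rec  = (qt @ S_rec)  / (qt @ z_rec)
        outputs.append(0.5 * (y_hist + y_rec))
    return np.stack(outputs)

# Toy usage: 10 tokens, key dim 4, value dim 8.
rng = np.random.default_rng(0)
out = dual_state_linear_attention(rng.normal(size=(10, 4)),
                                  rng.normal(size=(10, 4)),
                                  rng.normal(size=(10, 8)))
print(out.shape)  # (10, 8)
```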