alphaXiv

126

07 Jul 2022

computer-science computer-vision-security computer-vision-and-pattern-recognition

DAiSEE: Towards User Engagement Recognition in the Wild

IIT Hyderabad Microsoft India R&D Pvt. Ltd.

We introduce DAiSEE, the first multi-label video classification dataset, comprising 9,068 video snippets captured from 112 users, for recognizing the user affective states of boredom, confusion, engagement, and frustration in the wild. The dataset has four levels of labels (very low, low, high, and very high) for each of the affective states, which are crowd annotated and correlated with a gold standard annotation created by a team of expert psychologists. We have also established benchmark results on this dataset using state-of-the-art video classification methods that are available today. We believe that DAiSEE will provide the research community with challenges in feature extraction, context-based inference, and development of suitable machine learning methods for related tasks, thus providing a springboard for further research. The dataset is available for download at this https URL.

80

26 Aug 2025

autonomous-vehicles computer-science computer-vision-and-pattern-recognition

DriveIndia: An Object Detection Dataset for Diverse Indian Traffic Scenes

IIT Hyderabad

We introduce DriveIndia, a large-scale object detection dataset purpose-built to capture the complexity and unpredictability of Indian traffic environments. The dataset contains 66,986 high-resolution images annotated in YOLO format across 24 traffic-relevant object categories, encompassing diverse conditions such as varied weather (fog, rain), illumination changes, heterogeneous road infrastructure, and dense, mixed traffic patterns. The images were collected over 120+ hours and cover 3,400+ kilometers across urban, rural, and highway routes. DriveIndia offers a comprehensive benchmark for real-world autonomous driving challenges. We provide baseline results using state-of-the-art YOLO family models, with the top-performing variant achieving a mAP50 of 78.7%. Designed to support research in robust, generalizable object detection under uncertain road conditions, DriveIndia will be publicly available via the TiHAN-IIT Hyderabad dataset repository this https URL (Terrestrial Datasets -> Camera Dataset).
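
For readers unfamiliar with the YOLO annotation format mentioned above, the following is a minimal sketch of how one label line is usually interpreted: a class index followed by a box center, width, and height, all normalized to the image size. The function name, the sample line, and the class index `3` are illustrative assumptions, not taken from the DriveIndia release.

```python
def yolo_to_pixel_box(line: str, img_w: int, img_h: int):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    # Remaining fields are normalized to [0, 1]: x_center, y_center, width, height.
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


# Hypothetical label line for a 1920x1080 image (not from the dataset).
box = yolo_to_pixel_box("3 0.5 0.5 0.25 0.5", img_w=1920, img_h=1080)
print(box)
```

Storing boxes in normalized coordinates keeps the labels valid if the images are resized, which is one reason the format is common for detection datasets.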

345

09 Nov 2018

computer-science computer-vision-security computer-vision-and-pattern-recognition

Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks

CISCO Systems IIIT Hyderabad IIT Hyderabad

Grad-CAM++, an enhanced visual explanation method for deep CNNs, refines gradient weighting to produce more complete object localization and better handling of multiple instances compared to Grad-CAM. This approach achieves a 35.5% increase in mAP for knowledge distillation on PASCAL VOC 2007 and a nearly 10% lower average confidence drop on ImageNet compared to Grad-CAM.

140

09 Oct 2024

agent-based-systems computer-science artificial-intelligence

Web Retrieval Agents for Evidence-Based Misinformation Detection

Université de Montréal

UC Berkeley McGill IIT Hyderabad Dartmouth

This paper develops an agent-based automated fact-checking approach for detecting misinformation. We demonstrate that combining a powerful LLM agent, which does not have access to the internet for searches, with an online web search agent yields better results than when each tool is used independently. Our approach is robust across multiple models, outperforming alternatives and increasing the macro F1 of misinformation detection by as much as 20 percent compared to LLMs without search. We also conduct extensive analyses on the sources our system leverages and their biases, decisions in the construction of the system like the search tool and the knowledge base, the type of evidence needed and its impact on the results, and other parts of the overall process. By combining strong performance with in-depth understanding, we hope to provide building blocks for future search-enabled misinformation mitigation systems.
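
The macro F1 metric cited above averages per-class F1 scores with equal weight, so a minority class counts as much as a majority one. A minimal sketch follows; the label names and the toy predictions are assumptions for illustration and are not results from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical labels.
y_true = ["true", "true", "false", "false"]
y_pred = ["true", "false", "false", "false"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```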

361

08 Apr 2025

causal-inference computer-science artificial-intelligence

Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference

Microsoft IIT Hyderabad

Aniket Vashishtha

Researchers from UIUC, MIT, CISPA, IIT-Hyderabad, and Microsoft Research propose using causal order as a more stable output for imperfect experts in causal inference and introduce a "triplet method" for eliciting this order. This approach significantly reduces cycle formation and improves the accuracy of LLM-generated causal knowledge, enhancing downstream causal discovery and effect estimation tasks, even with smaller language models.

5

113

07 Aug 2025

computer-science distributed-parallel-and-cluster-computing

Simulating LLM training workloads for heterogeneous compute and network infrastructure

IIIT-Delhi IIT Hyderabad Marvell Technology, Inc.

The growing demand for large-scale GPU clusters in distributed model training presents a significant barrier to innovation, particularly in model optimization, performance tuning, and system-level enhancements. To address this challenge, LLM training simulators are employed to estimate training time and guide design decisions. However, the state-of-the-art LLM training simulators assume homogeneous compute and network infrastructure. In practice, device heterogeneity is inevitable due to resource sharing in cloud environments, frequent shifts in device generations, and inherent intra-chip interconnect heterogeneity. To address the gap between state-of-the-art and practical requirements, we propose the design of a heterogeneity-aware distributed LLM simulator capable of predicting training time while enabling abstractions to specify custom configurations for device groups and device-to-parallelism mapping. We present the design requirements and challenges in building a heterogeneity-aware distributed ML training simulator, and design components such as non-uniform workload partitioning. Our initial simulation results demonstrate the impact of heterogeneity on the model computation and communication time.

107

09 Jan 2025

attention-mechanisms computer-science artificial-intelligence

Analyzing Memorization in Large Language Models through the Lens of Model Attribution

University of Virginia IIT Hyderabad

Adobe

Large Language Models (LLMs) are prevalent in modern applications but often memorize training data, leading to privacy breaches and copyright issues. Existing research has mainly focused on post hoc analyses, such as extracting memorized content or developing memorization metrics, without exploring the underlying architectural factors that contribute to memorization. In this work, we investigate memorization from an architectural lens by analyzing how attention modules at different layers affect the model's memorization and generalization performance. Using attribution techniques, we systematically intervene in the LLM architecture by bypassing attention modules at specific blocks while keeping other components like layer normalization and MLP transformations intact. We provide theorems analyzing our intervention mechanism from a mathematical view, bounding the difference in layer outputs with and without our attributions. Our theoretical and empirical analyses reveal that attention modules in deeper transformer blocks are primarily responsible for memorization, whereas earlier blocks are crucial for the model's generalization and reasoning capabilities. We validate our findings through comprehensive experiments on different LLM families (Pythia and GPTNeo) and five benchmark datasets. Our insights offer a practical approach to mitigate memorization in LLMs while preserving their performance, contributing to safer and more ethical deployment in real-world applications.

15,165

12 Apr 2025

computer-science distributed-parallel-and-cluster-computing

BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU

Microsoft IIT Hyderabad LIP LabEx MILYON

The BANG method enables billion-scale Approximate Nearest Neighbour Search (ANNS) with high recall and throughput using a single GPU. It demonstrates 50x to 400x higher throughput on 1-billion point datasets compared to competing methods while maintaining high accuracy.

18

25 Sep 2025

cosmology-and-nongalactic-astrophysics physics

Dark Energy Survey Year 6 Results: improved mitigation of spatially varying observational systematics with masking

California Institute of Technology

University of Waterloo

Northeastern University

University of Chicago

UC Berkeley

University College London

University of Michigan

Argonne National Laboratory

University of Pennsylvania Universidad Autónoma de Madrid

University of Wisconsin-Madison

Lawrence Berkeley National Laboratory

Duke University Fermi National Accelerator Laboratory

Princeton University University of Queensland University of Portsmouth National Center for Supercomputing Applications Universität Hamburg University of Zürich IIT Hyderabad Universidad de La Laguna Ludwig-Maximilians-Universität Kavli Institute for Cosmological Physics The Barcelona Institute of Science and Technology Institut de Física d’Altes Energies (IFAE) Institut d’Estudis Espacials de Catalunya (IEEC) Laboratoire de Physique des 2 Infinis Irène Joliot-Curie Instituto de Astrofisica de Canarias Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT) William Jewell College Instituto de Física Teórica UAM/CSIC Laboratório Interinstitucional de e-Astronomia - LIneA CNRS – Université Paris Saclay Institute of Space Sciences (ICE–CSIC) Université Grenoble Alpes Ruhr-University-Bochum INAF - Osservatorio Astronomico di Trieste

As photometric surveys reach unprecedented statistical precision, systematic uncertainties increasingly dominate large-scale structure probes relying on galaxy number density. Defining the final survey footprint is critical, as it excludes regions affected by artefacts or suboptimal observing conditions. For galaxy clustering, spatially varying observational systematics, such as seeing, are a leading source of bias. Template maps of contaminants are used to derive spatially dependent corrections, but extreme values may fall outside the applicability range of mitigation methods, compromising correction reliability. The complexity and accuracy of systematics modelling depend on footprint conservativeness, with aggressive masking enabling simpler, robust mitigation. We present a unified approach to define the DES Year 6 joint footprint, integrating observational systematics templates and artefact indicators that degrade mitigation performance. This removes extreme values from an initial seed footprint, leading to the final joint footprint. By evaluating the DES Year 6 lens sample MagLim++ on this footprint, we enhance the Iterative Systematics Decontamination (ISD) method, detecting non-linear systematic contamination and improving correction accuracy. While the mask's impact on clustering is less significant than systematics decontamination, it remains non-negligible, comparable to statistical uncertainties in certain $w(\theta)$ scales and redshift bins. Supporting coherent analyses of galaxy clustering and cosmic shear, the final footprint spans 4031.04 deg², setting the basis for DES Year 6 1x2pt, 2x2pt, and 3x2pt analyses. This work highlights how targeted masking strategies optimise the balance between statistical power and systematic control in Stage-III and -IV surveys.

27

08 Oct 2024

computer-science artificial-intelligence computation-and-language

Automatic Summarization of Long Documents

University of Colorado Colorado Springs IIT Hyderabad

A vast amount of textual data is added to the internet daily, making utilization and interpretation of such data difficult and cumbersome. As a result, automatic text summarization is crucial for extracting relevant information, saving precious reading time. Although many transformer-based models excel in summarization, they are constrained by their input size, preventing them from processing texts longer than their context size. This study introduces three novel algorithms that allow any LLM to efficiently overcome its input size limitation, effectively utilizing its full potential without any architectural modifications. We test our algorithms on texts with more than 70,000 words, and our experiments show a significant increase in BERTScore with competitive ROUGE scores.

10

25

09 Jan 2018

cosmology-and-nongalactic-astrophysics astrophysics-of-galaxies physics

Stellar Streams Discovered in the Dark Energy Survey

University of Cincinnati

University of Illinois at Urbana-Champaign

University of Cambridge SLAC National Accelerator Laboratory

University of Chicago

University College London

Stanford University

University of Michigan

Texas A&M University

University of Pennsylvania

Brookhaven National Laboratory University of Surrey Fermi National Accelerator Laboratory University of Portsmouth University of Sussex National Center for Supercomputing Applications IIT Hyderabad Sorbonne Universités Rhodes University Universidad Autonoma de Madrid Institut d'Astrophysique de Paris Institut de Física d’Altes Energies (IFAE) Institut d’Estudis Espacials de Catalunya (IEEC) Observatório Nacional Ludwig Maximilians University Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT) Cerro Tololo Inter-American Observatory Australian Astronomical Observatory Laboratório Interinstitucional de e-Astronomia - LIneA Exelon Corporation LSST Institute of Space Sciences (ICE–CSIC) Max Planck-Institute for Extraterrestrial Physics

Risa Wechsler

We perform a search for stellar streams around the Milky Way using the first three years of multi-band optical imaging data from the Dark Energy Survey (DES). We use DES data covering ~5000 sq. deg. to a depth of g > 23.5 with a relative photometric calibration uncertainty of < 1%. This data set yields unprecedented sensitivity to the stellar density field in the southern celestial hemisphere, enabling the detection of faint stellar streams to a heliocentric distance of ~50 kpc. We search for stellar streams using a matched-filter in color-magnitude space derived from a synthetic isochrone of an old, metal-poor stellar population. Our detection technique recovers four previously known thin stellar streams: Phoenix, ATLAS, Tucana III, and a possible extension of Molonglo. In addition, we report the discovery of eleven new stellar streams. In general, the new streams detected by DES are fainter, more distant, and lower surface brightness than streams detected by similar techniques in previous photometric surveys. As a by-product of our stellar stream search, we find evidence for extra-tidal stellar structure associated with four globular clusters: NGC 288, NGC 1261, NGC 1851, and NGC 1904. The ever-growing sample of stellar streams will provide insight into the formation of the Galactic stellar halo, the Milky Way gravitational potential, as well as the large- and small-scale distribution of dark matter around the Milky Way.

165

24 Oct 2025

causal-inference computer-science artificial-intelligence

Teaching Transformers Causal Reasoning through Axiomatic Training

IIT Hyderabad

For text-based AI systems to interact in the real world, causal reasoning is an essential skill. Since active interventions are costly, we study to what extent a system can learn causal reasoning from symbolic demonstrations of causal axioms. Specifically, we present an axiomatic training method where the system learns from multiple demonstrations of a causal axiom (or rule), rather than incorporating the axiom as an inductive bias or inferring it from data values. A key question is whether the system would learn to generalize from the axiom demonstrations to more complex scenarios. Our results, based on applying axiomatic training to learn the transitivity axiom and d-separation rule, indicate that such generalization is possible. To avoid data contamination issues, we start with a 67 million parameter transformer model and train it from scratch. On both tasks, we find that a model trained on linear causal chains (along with some noisy variations) can generalize well to complex graphs, including longer causal chains, causal chains with reversed order, and graphs with branching. To handle diverse text inputs, the same method is extended to finetune language models. Finetuning the Llama-3-8B-Instruct model on our axiomatic data leads to significant gains on causal benchmarks such as Corr2Cause and CLEAR, in some cases providing state-of-the-art performance surpassing GPT-4.

6

40

23 Oct 2021

computer-science continual-learning computer-vision-and-pattern-recognition

Multi-Domain Incremental Learning for Semantic Segmentation

IIIT Hyderabad IIT Hyderabad IIT Delhi

Recent efforts in multi-domain learning for semantic segmentation attempt to learn multiple geographical datasets in a universal, joint model. A simple fine-tuning experiment performed sequentially on three popular road scene segmentation datasets demonstrates that existing segmentation frameworks fail at incrementally learning on a series of visually disparate geographical domains. When learning a new domain, the model catastrophically forgets previously learned knowledge. In this work, we pose the problem of multi-domain incremental learning for semantic segmentation. Given a model trained on a particular geographical domain, the goal is to (i) incrementally learn a new geographical domain, (ii) while retaining performance on the old domain, (iii) given that the previous domain's dataset is not accessible. We propose a dynamic architecture that assigns universally shared, domain-invariant parameters to capture homogeneous semantic features present in all domains, while dedicated domain-specific parameters learn the statistics of each domain. Our novel optimization strategy helps achieve a good balance between retention of old knowledge (stability) and acquiring new knowledge (plasticity). We demonstrate the effectiveness of our proposed solution on domain incremental settings pertaining to real-world driving scenes from roads of Germany (Cityscapes), the United States (BDD100k), and India (IDD).

57

27

08 Aug 2021

computer-science computer-vision-and-pattern-recognition

Monte Carlo DropBlock for Modelling Uncertainty in Object Detection

NVIDIA IIT Hyderabad

With the advancements made in deep learning, computer vision problems like object detection and segmentation have seen a great improvement in performance. However, in many real-world applications such as autonomous driving vehicles, the risk associated with incorrect predictions of objects is very high. Standard deep learning models for object detection such as YOLO models are often overconfident in their predictions and do not take into account the uncertainty in predictions on out-of-distribution data. In this work, we propose an efficient and effective approach to model uncertainty in object detection and segmentation tasks using Monte-Carlo DropBlock (MC-DropBlock) based inference. The proposed approach applies drop-block during training time and test time on the convolutional layer of the deep learning models such as YOLO. We show that this leads to a Bayesian convolutional neural network capable of capturing the epistemic uncertainty in the model. Additionally, we capture the aleatoric uncertainty using a Gaussian likelihood. We demonstrate the effectiveness of the proposed approach on modeling uncertainty in object detection and segmentation tasks using out-of-distribution experiments. Experimental results show that MC-DropBlock improves the generalization, calibration, and uncertainty modeling capabilities of YOLO models in object detection and segmentation.

29

14 Dec 2021

high-energy-physics-phenomenology high-energy-physics-theory physics

The Infrared Structure of Perturbative Gauge Theories

INFN

CERN IIT Hyderabad Chaitanya Bharathi Institute of Technology Karlsruher Institut für Technologie (KIT) Università di Torino

Infrared divergences in the perturbative expansion of gauge theory amplitudes and cross sections have been a focus of theoretical investigations for almost a century. New insights still continue to emerge, as higher perturbative orders are explored, and high-precision phenomenological applications demand an ever more refined understanding. This review aims to provide a pedagogical overview of the subject. We briefly cover some of the early historical results, we provide some simple examples of low-order applications in the context of perturbative QCD, and discuss the necessary tools to extend these results to all perturbative orders. Finally, we describe recent developments concerning the calculation of soft anomalous dimensions in multi-particle scattering amplitudes at high orders, and we provide a brief introduction to the very active field of infrared subtraction for the calculation of differential distributions at colliders.

25

07 Nov 2025

computer-science computation-and-language embedding-methods

MorphTok: Morphologically Grounded Tokenization for Indian Languages

IIT Hyderabad IIT Mandi IIT, Bombay

Tokenization is a crucial step in NLP, especially with the rise of large language models (LLMs), impacting downstream performance, computational cost, and efficiency. Existing LLMs rely on the classical Byte-pair Encoding (BPE) algorithm for subword tokenization that greedily merges frequent character bigrams, often leading to segmentation that does not align with linguistically meaningful units. To address this, we propose morphology-aware segmentation as a pre-tokenization step before applying BPE. To facilitate morphology-aware segmentation, we create a novel dataset for Hindi and Marathi, incorporating sandhi splitting to enhance the subword tokenization. Experiments on downstream tasks show that morphologically grounded tokenization improves machine translation and language modeling performance. Additionally, to handle the dependent vowels common in syllable-based writing systems used by Indic languages, we propose Constrained BPE (CBPE), an extension to the standard BPE algorithm incorporating script-specific constraints. In particular, CBPE handles dependent vowels to form a cohesive unit with other characters instead of occurring as a single unit. Our results show that CBPE achieves a 1.68% reduction in fertility scores while maintaining comparable or improved downstream performance in machine translation and language modeling, offering a computationally efficient alternative to standard BPE. Moreover, to evaluate segmentation across different tokenization algorithms, we introduce a new human evaluation metric, EvalTok, enabling more human-grounded assessment.
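
The fertility score referred to above is commonly defined as the average number of subword tokens produced per word, so lower values mean more compact segmentation. The sketch below assumes that definition; the fixed-width toy tokenizer and the sample words are hypothetical stand-ins and are unrelated to BPE, CBPE, or the MorphTok data.

```python
def fertility(words, tokenize):
    """Average number of tokens per word under the given tokenizer."""
    total_tokens = sum(len(tokenize(word)) for word in words)
    return total_tokens / len(words)


def toy_tokenize(word, width=3):
    # Hypothetical tokenizer: cut the word into fixed-width character chunks.
    return [word[i:i + width] for i in range(0, len(word), width)]


words = ["token", "ization", "is"]
score = fertility(words, toy_tokenize)
print(score)
```

Comparing this number across two tokenizers on the same word list is one simple way to express a relative reduction such as the one reported in the abstract.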

48

09 Jul 2024

computer-science cryptography-and-security machine-learning

VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity

IIT Hyderabad

S. VenkataKeerthy

Binary similarity involves determining whether two binary programs exhibit similar functionality, often originating from the same source code. In this work, we propose VexIR2Vec, an approach for binary similarity using VEX-IR, an architecture-neutral Intermediate Representation (IR). We extract the embeddings from sequences of basic blocks, termed peepholes, derived by random walks on the control-flow graph. The peepholes are normalized using transformations inspired by compiler optimizations. The VEX-IR Normalization Engine mitigates, with these transformations, the architectural and compiler-induced variations in binaries while exposing semantic similarities. We then learn the vocabulary of representations at the entity level of the IR using the knowledge graph embedding techniques in an unsupervised manner. This vocabulary is used to derive function embeddings for similarity assessment using VexNet, a feed-forward Siamese network designed to position similar functions closely and separate dissimilar ones in an n-dimensional space. This approach is amenable for both diffing and searching tasks, ensuring robustness against Out-Of-Vocabulary (OOV) issues. We evaluate VexIR2Vec on a dataset comprising 2.7M functions and 15.5K binaries from 7 projects compiled across 12 compilers targeting x86 and ARM architectures. In diffing experiments, VexIR2Vec outperforms the nearest baselines by 40%, 18%, 21%, and 60% in cross-optimization, cross-compilation, cross-architecture, and obfuscation settings, respectively. In the searching experiment, VexIR2Vec achieves a mean average precision of 0.76, outperforming the nearest baseline by 46%. Our framework is highly scalable and is built as a lightweight, multi-threaded, parallel library using only open-source tools. VexIR2Vec is 3.1-3.5× faster than the closest baselines and orders-of-magnitude faster than other tools.

8

08 Oct 2025

cosmology-and-nongalactic-astrophysics astrophysics-of-galaxies physics

Robust Measurement of Stellar Streams Around the Milky Way: Correcting Spatially Variable Observational Selection Effects in Optical Imaging Surveys

University of Washington

CNRS

California Institute of Technology

University of Illinois at Urbana-Champaign

Harvard University

University of Chicago

University College London

University of Michigan

Texas A&M University

University of Wisconsin-Madison Fermi National Accelerator Laboratory Macquarie University University of Queensland University of Portsmouth

The Ohio State University

University of Groningen

Dartmouth College National Center for Supercomputing Applications Universität Hamburg Ludwig Maximilian University of Munich University of Zürich IIT Hyderabad Instituto de Fisica Teorica UAM/CSIC Universidad de La Laguna Kavli Institute for Cosmological Physics Kapteyn Astronomical Institute The Barcelona Institute of Science and Technology Universidad Autonoma de Madrid Lowell Observatory Institut de Física d’Altes Energies (IFAE) Institució Catalana de Recerca i Estudis Avançats Instituto de Astrofisica de Canarias Santa Cruz Institute for Particle Physics Australian Astronomical Optics NSF’s National Optical-Infrared Astronomy Research Laboratory Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT) Hamburger Sternwarte Cerro Tololo Inter-American Observatory Laboratório Interinstitucional de e-Astronomia - LIneA George P. and Cynthia Woods Mitchell Institute for Fundamental Physics and Astronomy LPSC-IN2P3 Institute of Cosmology and Gravitation Center for Cosmology and Astro-Particle Physics Center for Astrophysical Surveys Université Grenoble Alpes Center for Astrophysics Harvard & Smithsonian

Observations of density variations in stellar streams are a promising probe of low-mass dark matter substructure in the Milky Way. However, survey systematics such as variations in seeing and sky brightness can also induce artificial fluctuations in the observed densities of known stellar streams. These variations arise because survey conditions affect both object detection and star-galaxy misclassification rates. To mitigate these effects, we use Balrog synthetic source injections in the Dark Energy Survey (DES) Y3 data to calculate detection rate variations and classification rates as functions of survey properties. We show that these rates are nearly separable with respect to survey properties and can be estimated with sufficient statistics from the synthetic catalogs. Applying these corrections reduces the standard deviation of relative detection rates across the DES footprint by a factor of five, and our corrections significantly change the inferred linear density of the Phoenix stream when including faint objects. Additionally, for artificial streams with DES-like survey properties, we are able to recover density power spectra with reduced bias. We also find that uncorrected power-spectrum results for LSST-like data can be around five times more biased, highlighting the need for such corrections in future ground-based surveys.

84

07 Jan 2025

adversarial-robustness computer-science computer-vision-security

Wavelet-Driven Generalizable Framework for Deepfake Face Forgery Detection

IIIT Hyderabad IIT Hyderabad

The evolution of digital image manipulation, particularly with the advancement of deep generative models, significantly challenges existing deepfake detection methods, especially when the origin of the deepfake is obscure. To tackle the increasing complexity of these forgeries, we propose Wavelet-CLIP, a deepfake detection framework that integrates wavelet transforms with features derived from the ViT-L/14 architecture, pre-trained in the CLIP fashion. Wavelet-CLIP utilizes Wavelet Transforms to deeply analyze both spatial and frequency features from images, thus enhancing the model's capability to detect sophisticated deepfakes. To verify the effectiveness of our approach, we conducted extensive evaluations against existing state-of-the-art methods for cross-dataset generalization and detection of unseen images generated by standard diffusion models. Our method showcases outstanding performance, achieving an average AUC of 0.749 for cross-data generalization and 0.893 for robustness against unseen deepfakes, outperforming all compared methods. The code is available in the repo at this https URL.

10

8

27 Jan 2024

computer-science machine-learning embedding-methods

MMD-Regularized Unbalanced Optimal Transport

IIT Hyderabad

We study the unbalanced optimal transport (UOT) problem, where the marginal constraints are enforced using Maximum Mean Discrepancy (MMD) regularization. Our work is motivated by the observation that the literature on UOT is focused on regularization based on $\phi$-divergence (e.g., KL divergence). Despite the popularity of MMD, its role as a regularizer in the context of UOT seems less understood. We begin by deriving a specific dual of MMD-regularized UOT (MMD-UOT), which helps us prove several useful properties. One interesting outcome of this duality result is that MMD-UOT induces novel metrics, which not only lift the ground metric like the Wasserstein but are also sample-wise efficient to estimate like the MMD. Further, for real-world applications involving non-discrete measures, we present an estimator for the transport plan that is supported only on the given ($m$) samples. Under certain conditions, we prove that the estimation error with this finitely-supported transport plan is also $\mathcal{O}(1/\sqrt{m})$. As far as we know, such error bounds that are free from the curse of dimensionality are not known for $\phi$-divergence regularized UOT. Finally, we discuss how the proposed estimator can be computed efficiently using accelerated gradient descent. Our experiments show that MMD-UOT consistently outperforms popular baselines, including KL-regularized UOT and MMD, in diverse machine learning applications.
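
As background for the MMD term used as a regularizer above, here is a minimal sketch of the standard biased sample estimate of squared MMD between two sets of one-dimensional samples. The Gaussian kernel, the bandwidth of 1.0, and the sample values are assumptions chosen for illustration; this is only the discrepancy itself, not the paper's MMD-UOT estimator or its solver.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Nearby samples give a small value; well-separated samples give a large one.
near = mmd_squared([0.0, 0.1, 0.2], [0.05, 0.15, 0.25])
far = mmd_squared([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])
print(near < far)
```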
