alphaXiv

History

Papers Benchmarks

IIT Kanpur

497

05 Feb 2025

computer-science artificial-intelligence computation-and-language

Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation

IIT Kanpur

Adobe IIT, Bombay

Cache-Craft introduces a chunk-caching system to improve RAG inference efficiency by optimizing the prefill phase, the primary bottleneck in RAG workloads. Developed by researchers at Adobe Research, IIT Bombay, and IIT Kanpur, the system maintains 90-95% output quality while reducing Time-to-First-Token latency by up to 2-3x and increasing throughput by up to 1.6x.

190

19 Feb 2025

computer-science computation-and-language

Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs

IIT Kanpur

Adobe IIT, Bombay

Adobe Research and IIT collaborators developed ContextualLens, a training-free method utilizing contextual token embeddings from intermediate VLM layers to enhance hallucination detection and provide precise visual grounding for complex visual-language phenomena. This approach improves performance in tasks like identifying attributes, spatial relations, and OCR, advancing from zero-shot object segmentation to grounded visual question answering.

04 Dec 2024

computer-science machine-learning deep-reinforcement-learning

Towards Size-Independent Generalization Bounds for Deep Operator Nets

University of Florida The University of Manchester IIT Kanpur

Anirbit

Pulkit Gopalani

In recent times machine learning methods have made significant advances in becoming a useful tool for analyzing physical systems. A particularly active area in this theme has been "physics-informed machine learning" which focuses on using neural nets for numerically solving differential equations. In this work, we aim to advance the theory of measuring out-of-sample error while training DeepONets - which is among the most versatile ways to solve P.D.E systems in one-shot. Firstly, for a class of DeepONets, we prove a bound on their Rademacher complexity which does not explicitly scale with the width of the nets involved. Secondly, we use this to show how the Huber loss can be chosen so that for these DeepONet classes generalization error bounds can be obtained that have no explicit dependence on the size of the nets. The effective capacity measure for DeepONets that we thus derive is also shown to correlate with the behavior of generalization error in experiments.

23 Oct 2025

computer-science machine-learning fine-tuning

Why DPO is a Misspecified Estimator and How to Fix It

IISc, Bangalore IIT Kanpur HP AI Research

Researchers at the Indian Institute of Science, IIT Kanpur, and HP AI Research diagnosed DPO as a misspecified estimator for parametric language models and introduced AuxDPO, a new direct preference optimization algorithm. This method mitigates DPO's inherent limitations by preventing preference reversals and reward degradation, demonstrating consistent performance improvements across various LLM alignment tasks and datasets.

268

11 Dec 2024

computer-science artificial-intelligence computation-and-language

NyayaAnumana & INLegalLlama: The Largest Indian Legal Judgment Prediction Dataset and Specialized Language Model for Enhanced Decision Analysis

IIT Kanpur IISER Kolkata Symbiosis Law School Pune

Researchers from IIT Kanpur, IISER Kolkata, and Symbiosis Law School developed NyayaAnumana, the largest Indian legal dataset comprising over 800,000 labeled cases, and INLegalLlama, a specialized large language model. This model achieves approximately 90% accuracy in binary legal judgment prediction and provides comprehensible explanations for its decisions, enhancing transparency and trust in AI-assisted legal processes.

116

11 Jun 2025

computer-science artificial-intelligence computation-and-language

Using Shapley interactions to understand how models use structure

Harvard University

Carnegie Mellon University IIT Kanpur SimPPL

Language is an intricately structured system, and a key goal of NLP interpretability is to provide methodological insights for understanding how language models represent this structure internally. In this paper, we use Shapley Taylor interaction indices (STII) in order to examine how language and speech models internally relate and structure their inputs. Pairwise Shapley interactions measure how much two inputs work together to influence model outputs beyond if we linearly added their independent influences, providing a view into how models encode structural interactions between inputs. We relate the interaction patterns in models to three underlying linguistic structures: syntactic structure, non-compositional semantics, and phonetic coarticulation. We find that autoregressive text models encode interactions that correlate with the syntactic proximity of inputs, and that both autoregressive and masked models encode nonlinear interactions in idiomatic phrases with non-compositional semantics. Our speech results show that inputs are more entangled for pairs where a neighboring consonant is likely to influence a vowel or approximant, showing that models encode the phonetic interaction needed for extracting discrete phonemic representations.

10 Mar 2025

active-learning adversarial-robustness computer-science

ADROIT: A Self-Supervised Framework for Learning Robust Representations for Active Learning

Amazon IIT Kanpur

Active learning aims to select optimal samples for labeling, minimizing annotation costs. This paper introduces a unified representation learning framework tailored for active learning with task awareness. It integrates diverse sources, comprising reconstruction, adversarial, self-supervised, knowledge-distillation, and classification losses into a unified VAE-based ADROIT approach. The proposed approach comprises three key components - a unified representation generator (VAE), a state discriminator, and a (proxy) task-learner or classifier. ADROIT learns a latent code using both labeled and unlabeled data, incorporating task-awareness by leveraging labeled data with the proxy classifier. Unlike previous approaches, the proxy classifier additionally employs a self-supervised loss on unlabeled data and utilizes knowledge distillation to align with the target task-learner. The state discriminator distinguishes between labeled and unlabeled data, facilitating the selection of informative unlabeled samples. The dynamic interaction between VAE and the state discriminator creates a competitive environment, with the VAE attempting to deceive the discriminator, while the state discriminator learns to differentiate between labeled and unlabeled inputs. Extensive evaluations on diverse datasets and ablation analysis affirm the effectiveness of the proposed model.

07 Sep 2022

computer-science computer-vision-and-pattern-recognition domain-adaptation

SITA: Single Image Test-time Adaptation

Google Research IISc, Bangalore IIT Kanpur

In Test-time Adaptation (TTA), given a source model, the goal is to adapt it to make better predictions for test instances from a different distribution than the source. Crucially, TTA assumes no access to the source data or even any additional labeled/unlabeled samples from the target distribution to finetune the source model. In this work, we consider TTA in a more pragmatic setting which we refer to as SITA (Single Image Test-time Adaptation). Here, when making a prediction, the model has access only to the given single test instance, rather than a batch of instances, as typically been considered in the literature. This is motivated by the realistic scenarios where inference is needed on-demand instead of delaying for an incoming batch or the inference is happening on an edge device (like mobile phone) where there is no scope for batching. The entire adaptation process in SITA should be extremely fast as it happens at inference time. To address this, we propose a novel approach AugBN that requires only a single forward pass. It can be used on any off-the-shelf trained model to test single instances for both classification and segmentation tasks. AugBN estimates normalization statistics of the unseen test distribution from the given test image using only one forward pass with label-preserving transformations. Since AugBN does not involve any back-propagation, it is significantly faster compared to recent test time adaptation methods. We further extend AugBN to make the algorithm hyperparameter-free. Rigorous experimentation show that our simple algorithm is able to achieve significant performance gains for a variety of datasets, tasks, and network architectures.

14 Nov 2023

computer-science machine-learning programming-languages

Finding Inductive Loop Invariants using Large Language Models

Microsoft IIT Kanpur Cornell

Researchers from Microsoft Research and collaborators introduce "Loopy," a toolchain that leverages large language models (LLMs) to generate inductive loop invariants for C programs. Integrating LLM-generated candidates with symbolic verification (Houdini) and an LLM-based repair mechanism, Loopy successfully verified 398 out of 469 benchmarks and solved 31 unique instances where a state-of-the-art symbolic verifier failed.

13 Nov 2025

bayesian-optimization computer-science data-structures-and-algorithms

Distribution Learning Meets Graph Structure Sampling

CNRS

National University of Singapore IIT Kanpur University of Nebraska-Lincoln

Arnab Bhattacharyya

This work establishes a novel link between the problem of PAC-learning high-dimensional graphical models and the task of (efficient) counting and sampling of graph structures, using an online learning framework. We observe that if we apply the exponentially weighted average (EWA) or randomized weighted majority (RWM) forecasters on a sequence of samples from a distribution P using the log loss function, the average regret incurred by the forecaster's predictions can be used to bound the expected KL divergence between P and the predictions. Known regret bounds for EWA and RWM then yield new sample complexity bounds for learning Bayes nets. Moreover, these algorithms can be made computationally efficient for several interesting classes of Bayes nets. Specifically, we give a new sample-optimal and polynomial time learning algorithm with respect to trees of unknown structure and the first polynomial sample and time algorithm for learning with respect to Bayes nets over a given chordal skeleton.

28 Aug 2025

agents computer-science conversational-ai

CAPE: Context-Aware Personality Evaluation Framework for Large Language Models

Kyoto University IIT Kanpur

Psychometric tests, traditionally used to assess humans, are now being applied to Large Language Models (LLMs) to evaluate their behavioral traits. However, existing studies follow a context-free approach, answering each question in isolation to avoid contextual influence. We term this the Disney World test, an artificial setting that ignores real-world applications, where conversational history shapes responses. To bridge this gap, we propose the first Context-Aware Personality Evaluation (CAPE) framework for LLMs, incorporating prior conversational interactions. To thoroughly analyze the influence of context, we introduce novel metrics to quantify the consistency of LLM responses, a fundamental trait in human behavior. Our exhaustive experiments on 7 LLMs reveal that conversational history enhances response consistency via in-context learning but also induces personality shifts, with GPT-3.5-Turbo and GPT-4-Turbo exhibiting extreme deviations. While GPT models are robust to question ordering, Gemini-1.5-Flash and Llama-8B display significant sensitivity. Moreover, GPT models response stem from their intrinsic personality traits as well as prior interactions, whereas Gemini-1.5-Flash and Llama--8B heavily depend on prior interactions. Finally, applying our framework to Role Playing Agents (RPAs) shows context-dependent personality shifts improve response consistency and better align with human judgments. Our code and datasets are publicly available at: this https URL

17 Oct 2025

adversarial-attacks adversarial-robustness computer-science

Constrained Adversarial Perturbation

MBZUAI IIT Kanpur IIIT Hyderabad IIT Delhi

Deep neural networks have achieved remarkable success in a wide range of classification tasks. However, they remain highly susceptible to adversarial examples - inputs that are subtly perturbed to induce misclassification while appearing unchanged to humans. Among various attack strategies, Universal Adversarial Perturbations (UAPs) have emerged as a powerful tool for both stress testing model robustness and facilitating scalable adversarial training. Despite their effectiveness, most existing UAP methods neglect domain specific constraints that govern feature relationships. Violating such constraints, such as debt to income ratios in credit scoring or packet flow invariants in network communication, can render adversarial examples implausible or easily detectable, thereby limiting their real world applicability. In this work, we advance universal adversarial attacks to constrained feature spaces by formulating an augmented Lagrangian based min max optimization problem that enforces multiple, potentially complex constraints of varying importance. We propose Constrained Adversarial Perturbation (CAP), an efficient algorithm that solves this problem using a gradient based alternating optimization strategy. We evaluate CAP across diverse domains including finance, IT networks, and cyber physical systems, and demonstrate that it achieves higher attack success rates while significantly reducing runtime compared to existing baselines. Our approach also generalizes seamlessly to individual adversarial perturbations, where we observe similar strong performance gains. Finally, we introduce a principled procedure for learning feature constraints directly from data, enabling broad applicability across domains with structured input spaces.

07 Apr 2025

computer-science artificial-intelligence computation-and-language

TathyaNyaya and FactLegalLlama: Advancing Factual Judgment Prediction and Explanation in the Indian Legal Context

IIT Kanpur IISER Kolkata Symbiosis Law School Pune

In the landscape of Fact-based Judgment Prediction and Explanation (FJPE), reliance on factual data is essential for developing robust and realistic AI-driven decision-making tools. This paper introduces TathyaNyaya, the largest annotated dataset for FJPE tailored to the Indian legal context, encompassing judgments from the Supreme Court of India and various High Courts. Derived from the Hindi terms "Tathya" (fact) and "Nyaya" (justice), the TathyaNyaya dataset is uniquely designed to focus on factual statements rather than complete legal texts, reflecting real-world judicial processes where factual data drives outcomes. Complementing this dataset, we present FactLegalLlama, an instruction-tuned variant of the LLaMa-3-8B Large Language Model (LLM), optimized for generating high-quality explanations in FJPE tasks. Finetuned on the factual data in TathyaNyaya, FactLegalLlama integrates predictive accuracy with coherent, contextually relevant explanations, addressing the critical need for transparency and interpretability in AI-assisted legal systems. Our methodology combines transformers for binary judgment prediction with FactLegalLlama for explanation generation, creating a robust framework for advancing FJPE in the Indian legal domain. TathyaNyaya not only surpasses existing datasets in scale and diversity but also establishes a benchmark for building explainable AI systems in legal analysis. The findings underscore the importance of factual precision and domain-specific tuning in enhancing predictive performance and interpretability, positioning TathyaNyaya and FactLegalLlama as foundational resources for AI-assisted legal decision-making.

19 Jun 2025

computer-science artificial-intelligence computation-and-language

Relic: Enhancing Reward Model Generalization for Low-Resource Indic Languages with Few-Shot Examples

University of Maryland, College Park IIT Patna IIT Kanpur

Adobe IIT, Bombay

RELIC enhances reward model generalization for low-resource Indic languages through a novel in-context learning framework. It trains a retriever using a pairwise ranking loss to select discriminative examples from auxiliary high-resource language data, yielding significant accuracy improvements over baseline methods without requiring large-scale low-resource preference data.

04 Apr 2025

ai-for-health computer-science artificial-intelligence

Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej

IIT Kanpur IISER Kolkata SRM Institute of Science and Technology Symbiosis Law School Pune

Automating legal document drafting can significantly enhance efficiency, reduce manual effort, and streamline legal workflows. While prior research has explored tasks such as judgment prediction and case summarization, the structured generation of private legal documents in the Indian legal domain remains largely unaddressed. To bridge this gap, we introduce VidhikDastaavej, a novel, anonymized dataset of private legal documents, and develop NyayaShilp, a fine-tuned legal document generation model specifically adapted to Indian legal texts. We propose a Model-Agnostic Wrapper (MAW), a two-step framework that first generates structured section titles and then iteratively produces content while leveraging retrieval-based mechanisms to ensure coherence and factual accuracy. We benchmark multiple open-source LLMs, including instruction-tuned and domain-adapted versions, alongside proprietary models for comparison. Our findings indicate that while direct fine-tuning on small datasets does not always yield improvements, our structured wrapper significantly enhances coherence, factual adherence, and overall document quality while mitigating hallucinations. To ensure real-world applicability, we developed a Human-in-the-Loop (HITL) Document Generation System, an interactive user interface that enables users to specify document types, refine section details, and generate structured legal drafts. This tool allows legal professionals and researchers to generate, validate, and refine AI-generated legal documents efficiently. Extensive evaluations, including expert assessments, confirm that our framework achieves high reliability in structured legal drafting. This research establishes a scalable and adaptable foundation for AI-assisted legal drafting in India, offering an effective approach to structured legal document generation.

17 Nov 2025

computer-science conversational-ai artificial-intelligence

MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications

IIT Kanpur INPT IIT Bhilai

Large Language Models (LLMs) have emerged as powerful tools for automating complex reasoning and decision-making tasks. In telecommunications, they hold the potential to transform network optimization, automate troubleshooting, enhance customer support, and ensure regulatory compliance. However, their deployment in telecom is hindered by domain-specific challenges that demand specialized adaptation. To overcome these challenges and to accelerate the adaptation of LLMs for telecom, we propose MM-Telco, a comprehensive suite of multimodal benchmarks and models tailored for the telecom domain. The benchmark introduces various tasks (both text based and image based) that address various practical real-life use cases such as network operations, network management, improving documentation quality, and retrieval of relevant text and images. Further, we perform baseline experiments with various LLMs and VLMs. The models fine-tuned on our dataset exhibit a significant boost in performance. Our experiments also help analyze the weak areas in the working of current state-of-art multimodal LLMs, thus guiding towards further development and research.

07 Jul 2024

computer-science computer-vision-security artificial-intelligence

iSign: A Benchmark for Indian Sign Language Processing

IIT Kanpur Max Planck Institute for Psycholinguistics ISLRTC Microsoft IDC India

Indian Sign Language has limited resources for developing machine learning and data-driven approaches for automated language processing. Though text/audio-based language processing techniques have shown colossal research interest and tremendous improvements in the last few years, Sign Languages still need to catch up due to the need for more resources. To bridge this gap, in this work, we propose iSign: a benchmark for Indian Sign Language (ISL) Processing. We make three primary contributions to this work. First, we release one of the largest ISL-English datasets with more than 118K video-sentence/phrase pairs. To the best of our knowledge, it is the largest sign language dataset available for ISL. Second, we propose multiple NLP-specific tasks (including SignVideo2Text, SignPose2Text, Text2Pose, Word Prediction, and Sign Semantics) and benchmark them with the baseline models for easier access to the research community. Third, we provide detailed insights into the proposed benchmarks with a few linguistic insights into the workings of ISL. We streamline the evaluation of Sign Language processing, addressing the gaps in the NLP research community for Sign Languages. We release the dataset, tasks, and models via the following website: https://exploration-lab.github.io/iSign/

11 Feb 2017

computer-science artificial-intelligence sound

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

University of Montreal IIT Kanpur SSNCE

In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time. We show that our model, which profits from combining memory-less modules, namely autoregressive multilayer perceptrons, and stateful recurrent neural networks in a hierarchical structure is able to capture underlying sources of variations in the temporal sequences over very long time spans, on three datasets of different nature. Human evaluation on the generated samples indicate that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.

539

09 Feb 2025

computer-science artificial-intelligence computation-and-language

LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification

IIT Kanpur IISER Kolkata Symbiosis Law School Pune

In this paper, we address the task of semantic segmentation of legal documents through rhetorical role classification, with a focus on Indian legal judgments. We introduce LegalSeg, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles. To benchmark performance, we evaluate multiple state-of-the-art models, including Hierarchical BiLSTM-CRF, TransformerOverInLegalBERT (ToInLegalBERT), Graph Neural Networks (GNNs), and Role-Aware Transformers, alongside an exploratory RhetoricLLaMA, an instruction-tuned large language model. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features. Additionally, we conducted experiments using surrounding context and predicted or actual labels of neighboring sentences to assess their impact on classification accuracy. Despite these advancements, challenges persist in distinguishing between closely related roles and addressing class imbalance. Our work underscores the potential of advanced techniques for improving legal document understanding and sets a strong foundation for future research in legal NLP.

27 Aug 2025

computer-science computation-and-language machine-learning

LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning

University of Bath IIT Kanpur

Large language models (LLMs) have shown remarkable abilities in logical reasoning, in-context learning, and code generation. However, translating natural language instructions into effective robotic control policies remains a significant challenge, especially for tasks requiring long-horizon planning and operating under sparse reward conditions. Hierarchical Reinforcement Learning (HRL) provides a natural framework to address this challenge in robotics; however, it typically suffers from non-stationarity caused by the changing behavior of the lower-level policy during training, destabilizing higher-level policy learning. We introduce LGR2, a novel HRL framework that leverages LLMs to generate language-guided reward functions for the higher-level policy. By decoupling high-level reward generation from low-level policy changes, LGR2 fundamentally mitigates the non-stationarity problem in off-policy HRL, enabling stable and efficient learning. To further enhance sample efficiency in sparse environments, we integrate goal-conditioned hindsight experience relabeling. Extensive experiments across simulated and real-world robotic navigation and manipulation tasks demonstrate LGR2 outperforms both hierarchical and non-hierarchical baselines, achieving over 55% success rates on challenging tasks and robust transfer to real robots, without additional fine-tuning.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation

Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs

Towards Size-Independent Generalization Bounds for Deep Operator Nets

Why DPO is a Misspecified Estimator and How to Fix It

NyayaAnumana & INLegalLlama: The Largest Indian Legal Judgment Prediction Dataset and Specialized Language Model for Enhanced Decision Analysis

Using Shapley interactions to understand how models use structure

ADROIT: A Self-Supervised Framework for Learning Robust Representations for Active Learning

SITA: Single Image Test-time Adaptation

Finding Inductive Loop Invariants using Large Language Models

Distribution Learning Meets Graph Structure Sampling

CAPE: Context-Aware Personality Evaluation Framework for Large Language Models

Constrained Adversarial Perturbation

TathyaNyaya and FactLegalLlama: Advancing Factual Judgment Prediction and Explanation in the Indian Legal Context

Relic: Enhancing Reward Model Generalization for Low-Resource Indic Languages with Few-Shot Examples

Structured Legal Document Generation in India: A Model-Agnostic Wrapper Approach with VidhikDastaavej

MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications

iSign: A Benchmark for Indian Sign Language Processing

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification

LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning

Events

AI for Law

Personalize Your Feed