LIPN (Sorbonne Paris Nord)
Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders

ClassifSAE, a novel supervised Sparse Autoencoder, extracts influential and interpretable concepts from Large Language Models fine-tuned for text classification. The method consistently outperforms existing baselines on causality and interpretability metrics: ablating the extracted concepts reduces accuracy by up to 30.90% for Pythia-1B on AG News, and it achieves higher `ConceptSim` scores, ranging from 0.1309 to 0.1377.
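The sketch below illustrates the general shape of a supervised sparse autoencoder and of concept ablation, assuming a PyTorch implementation. The module layout, loss weights, and the exact form of the classification supervision are illustrative assumptions, not the paper's specification.

```python
# Minimal sketch of a supervised sparse autoencoder in the spirit of
# ClassifSAE. Names (hidden_dim, n_concepts, the classifier head, the loss
# coefficients) are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisedSAE(nn.Module):
    def __init__(self, hidden_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.encoder = nn.Linear(hidden_dim, n_concepts)    # activations -> concept code
        self.decoder = nn.Linear(n_concepts, hidden_dim)    # concept code -> reconstruction
        self.classifier = nn.Linear(n_concepts, n_classes)  # supervision head

    def forward(self, h: torch.Tensor):
        z = F.relu(self.encoder(h))   # non-negative, sparse concept activations
        h_hat = self.decoder(z)       # reconstruct the LLM's hidden activation
        logits = self.classifier(z)   # predict the task label from concepts alone
        return z, h_hat, logits

def sae_loss(h, h_hat, z, logits, labels, l1_coef=1e-3, cls_coef=1.0):
    recon = F.mse_loss(h_hat, h)           # stay faithful to the original activation
    sparsity = z.abs().mean()              # L1 penalty -> few active concepts per input
    cls = F.cross_entropy(logits, labels)  # ties concepts to the classification task
    return recon + l1_coef * sparsity + cls_coef * cls

@torch.no_grad()
def ablate_concept(sae: SupervisedSAE, h: torch.Tensor, concept_idx: int):
    # Concept ablation: zero one latent and re-decode. Feeding the edited
    # activation back through the classifier measures that concept's causal
    # influence, e.g. as the accuracy drop reported above.
    z = F.relu(sae.encoder(h))
    z[:, concept_idx] = 0.0
    return sae.decoder(z)
```

The supervision term is what distinguishes this setup from a standard sparse autoencoder: the concept code must simultaneously reconstruct the activation and predict the label, which pushes the latents toward task-relevant, ablatable directions.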

Predicting memorization within Large Language Models fine-tuned for classification
Large Language Models have received significant attention due to their ability to solve a wide range of complex tasks. However, these models memorize a significant proportion of their training data, posing a serious privacy threat if that data is disclosed at inference time. To mitigate this unintended memorization, it is crucial to understand which elements are memorized and why. This area of research is largely unexplored, with most existing works providing a posteriori explanations. To address this gap, we propose a new approach to detect memorized samples a priori in LLMs fine-tuned for classification tasks. The method is effective from the early stages of training and readily adaptable to other classification settings, such as training vision models from scratch. It is supported by new theoretical results and requires a low computational budget. We achieve strong empirical results, paving the way for the systematic identification and protection of vulnerable samples before they are memorized.
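The abstract does not detail the detector itself. As a generic illustration of what an a priori, training-time signal can look like, the sketch below tracks per-example loss across early training steps, a common proxy in the memorization literature; it is explicitly not the paper's method, and all names (`LossTracker`, the threshold) are hypothetical.

```python
# Hypothetical illustration: per-example loss trajectories during early
# fine-tuning as an a priori memorization signal. This is a generic proxy
# from the memorization literature, NOT the detector proposed in the paper.
import torch
import torch.nn.functional as F

def per_example_losses(model, inputs, labels):
    """Unreduced cross-entropy, so each sample keeps its own loss value."""
    logits = model(inputs)
    return F.cross_entropy(logits, labels, reduction="none")

class LossTracker:
    """Accumulates each sample's loss trajectory across training steps."""
    def __init__(self):
        self.history = {}  # sample_id -> list of per-step losses

    def update(self, sample_ids, losses: torch.Tensor):
        for sid, loss in zip(sample_ids, losses.tolist()):
            self.history.setdefault(sid, []).append(loss)

    def flag_candidates(self, threshold: float = 0.05):
        # Samples whose loss collapses to near zero very early in training
        # are typical candidates for memorization rather than generalization.
        return [sid for sid, traj in self.history.items()
                if len(traj) >= 2 and traj[-1] < threshold]
```

Because the tracker only needs the unreduced losses already computed during training, a signal of this kind adds negligible overhead, consistent with the low computational budget the abstract emphasizes.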