alphaXiv

Lab1141

24 Jun 2025

computer-science computation-and-language explainable-ai

Measuring and Guiding Monosemanticity

German Research Center for Artificial Intelligence (DFKI), TU Darmstadt, Hessian.AI, Aleph Alpha Research, CERTAIN, Lab1141

Researchers from TU Darmstadt and Aleph Alpha developed the Feature Monosemanticity Score (FMS) to quantify concept disentanglement in large language models. They also introduced Guided Sparse Autoencoders (G-SAE), which nearly doubled FMS compared to vanilla sparse autoencoders (SAEs), enabling more precise concept detection and control in LLMs.

11 Jul 2025

computer-science artificial-intelligence machine-learning

Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings

TU Darmstadt, TU Berlin, Zuse Institute Berlin, Hessian Center for AI (hessian.AI), Lab1141

While Prover-Verifier Games (PVGs) offer a promising path toward verifiability in nonlinear classification models, they have not yet been applied to complex inputs such as high-dimensional images. Conversely, Concept Bottleneck Models (CBMs) effectively translate such data into interpretable concepts but are limited by their reliance on low-capacity linear predictors. In this work, we introduce the Neural Concept Verifier (NCV), a unified framework combining PVGs with concept encodings for interpretable, nonlinear classification in high-dimensional settings. NCV achieves this by utilizing recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs. A prover then selects a subset of these encodings, which a verifier, implemented as a nonlinear predictor, uses exclusively for decision-making. Our evaluations show that NCV outperforms CBM and pixel-based PVG classifier baselines on high-dimensional, logically complex datasets and also helps mitigate shortcut behavior. Overall, we demonstrate NCV as a promising step toward performative, verifiable AI.
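The prover/verifier split described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `prover_select` and `verifier_predict`, the top-k selection rule, and the random weights are hypothetical and are not taken from the paper, which trains both components. The sketch only shows the structural idea that the verifier's nonlinear prediction depends solely on the concept entries the prover keeps.

```python
import random

def prover_select(scores, k):
    """Hypothetical prover: keep the k concepts with the highest scores.

    Returns a 0/1 mask over the concept encoding. A real prover would be
    learned; top-k over fixed scores is an illustrative stand-in.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    keep = set(top)
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]

def verifier_predict(concepts, mask, w1, b1, w2, b2):
    """Hypothetical verifier: a one-hidden-layer ReLU network (nonlinear)
    that only sees the concept entries selected by the mask."""
    x = [c * m for c, m in zip(concepts, mask)]
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    return logits.index(max(logits))

# Toy setup with untrained random weights (illustration only).
rng = random.Random(0)
n_concepts, n_hidden, n_classes = 4, 3, 2
w1 = [[rng.uniform(-1, 1) for _ in range(n_concepts)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_classes)]
b2 = [0.0] * n_classes

concepts = [0.2, 0.9, 0.1, 0.7]   # stand-in concept encoding of one input
scores = [0.1, 0.8, 0.3, 0.9]     # stand-in prover scores per concept
mask = prover_select(scores, k=2)
label = verifier_predict(concepts, mask, w1, b1, w2, b2)
```

Because unselected entries are zeroed before the verifier runs, changing them cannot change the predicted label, which is the property that makes the selected subset the sole basis of the decision in this sketch.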

02 Jun 2025

computer-science artificial-intelligence machine-learning

Fodor and Pylyshyn's Legacy -- Still No Human-like Systematic Compositionality in Neural Networks

DFKI, Hessian.AI, Technical University Darmstadt, Lab1141

Strong meta-learning capabilities for systematic compositionality are emerging as an important skill for navigating the complex and changing tasks of today's world. However, in presenting models for robust adaptation to novel environments, it is important to refrain from making unsupported claims about the performance of meta-learning systems that ultimately do not stand up to scrutiny. While Fodor and Pylyshyn famously posited that neural networks inherently lack this capacity as they are unable to model compositional representations or structure-sensitive operations, and thus are not a viable model of the human mind, Lake and Baroni recently presented meta-learning as a pathway to compositionality. In this position paper, we critically revisit this claim and highlight limitations in the proposed meta-learning framework for compositionality. Our analysis shows that modern neural meta-learning systems can only perform such tasks, if at all, under a very narrow and restricted definition of a meta-learning setup. We therefore claim that 'Fodor and Pylyshyn's legacy' persists, and to date, there is no human-like systematic compositionality learned in neural networks.
