alphaXiv

History

Papers Benchmarks

Max Planck ETH Center for Learning Systems

116

06 Jun 2024

computer-science cryptography-and-security machine-learning

DPZero: Private Fine-Tuning of Language Models without Backpropagation

ETH Zurich

University of Washington Amazon Max Planck ETH Center for Learning Systems

The widespread practice of fine-tuning large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy. First, as the size of LLMs continues to grow, the memory demands of gradient-based training methods via backpropagation become prohibitively high. Second, given the tendency of LLMs to memorize training data, it is important to protect potentially sensitive information in the fine-tuning data from being regurgitated. Zeroth-order methods, which rely solely on forward passes, substantially reduce memory consumption during training. However, directly combining them with standard differentially private gradient descent suffers more as model size grows. To bridge this gap, we introduce DPZero, a novel private zeroth-order algorithm with nearly dimension-independent rates. The memory efficiency of DPZero is demonstrated in privately fine-tuning RoBERTa and OPT on several downstream tasks. Our code is available at https://github.com/Liang137/DPZero.

229

02 Jun 2021

computer-science computer-vision-security computer-vision-and-pattern-recognition

Learning an Animatable Detailed 3D Face Model from In-The-Wild Images

Max Planck Institute for Intelligent Systems Max Planck ETH Center for Learning Systems

While current monocular 3D face reconstruction methods can recover fine geometric details, they suffer several limitations. Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression. Other methods are trained on high-quality face scans and do not generalize well to in-the-wild images. We present the first approach that regresses 3D face shape and animatable details that are specific to an individual but change with expression. Our model, DECA (Detailed Expression Capture and Animation), is trained to robustly produce a UV displacement map from a low-dimensional latent representation that consists of person-specific detail parameters and generic expression parameters, while a regressor is trained to predict detail, shape, albedo, expression, pose and illumination parameters from a single image. To enable this, we introduce a novel detail-consistency loss that disentangles person-specific details from expression-dependent wrinkles. This disentanglement allows us to synthesize realistic person-specific wrinkles by controlling expression parameters while keeping person-specific details unchanged. DECA is learned from in-the-wild images with no paired 3D supervision and achieves state-of-the-art shape reconstruction accuracy on two benchmarks. Qualitative results on in-the-wild data demonstrate DECA's robustness and its ability to disentangle identity- and expression-dependent details enabling animation of reconstructed faces. The model and code are publicly available at this https URL.

240

07 Oct 2021

computer-science computer-vision-and-pattern-recognition

ATISS: Autoregressive Transformers for Indoor Scene Synthesis

University of Toronto

NVIDIA Vector Institute University of Tübingen Max Planck Institute for Intelligent Systems, Tübingen Max Planck ETH Center for Learning Systems

Andreas Geiger

ATISS (Autoregressive Transformers for Indoor Scene Synthesis) generates realistic 3D indoor scenes by formulating the task as an unordered set prediction problem, leveraging transformer architectures for permutation equivariance. The method produces more realistic and diverse scenes than prior approaches and enables interactive applications such as scene completion and object suggestion.

288

10 Oct 2025

agents computer-science machine-learning

ROC-n-reroll: How verifier imperfection affects test-time scaling

Max Planck Institute for Intelligent Systems

ETH Zürich Max Planck ETH Center for Learning Systems

Test-time scaling aims to improve language model performance by leveraging additional compute during inference. Many works have empirically studied techniques such as Best-of-N (BoN) and Rejection Sampling (RS) that make use of a verifier to enable test-time scaling. However, to date there is little theoretical understanding of how verifier imperfection affects performance -- a gap we address in this work. Specifically, we prove that the instance-level accuracy of these methods is precisely characterized by the geometry of the verifier's ROC curve. Our theory has two important takeaways, confirmed by experiments with Qwen and LLama models on GSM8K and MATH500. First, RS outperforms BoN for fixed compute, while both methods converge to the same accuracy in the infinite-compute limit. Second, it is generally impossible to predict the high-compute performance of either method based on observations in the low-compute regime.

08 Oct 2020

causal-inference computer-science artificial-intelligence

Algorithmic Recourse: from Counterfactual Explanations to Interventions

Max Planck Institute for Intelligent Systems Max Planck ETH Center for Learning Systems

Researchers from MPI-IS, ETH Zürich, and Saarland University reformulate algorithmic recourse by grounding it in causal interventions rather than mere counterfactual explanations. Their "Recourse through Minimal Interventions" (MINT) framework, which utilizes Structural Causal Models, provides demonstrably more efficient and actionable recommendations, reducing intervention costs by up to 65% compared to prior CFE-based methods.

23 Oct 2024

causal-inference computer-science machine-learning

Do causal predictors generalize better to new domains?

Max Planck Institute for Intelligent Systems Tübingen AI Center Max Planck ETH Center for Learning Systems

We study how well machine learning models trained on causal features generalize across domains. We consider 16 prediction tasks on tabular datasets covering applications in health, employment, education, social benefits, and politics. Each dataset comes with multiple domains, allowing us to test how well a model trained in one domain performs in another. For each prediction task, we select features that have a causal influence on the target of prediction. Our goal is to test the hypothesis that models trained on causal features generalize better across domains. Without exception, we find that predictors using all available features, regardless of causality, have better in-domain and out-of-domain accuracy than predictors using causal features. Moreover, even the absolute drop in accuracy from one domain to the other is no better for causal predictors than for models that use all features. In addition, we show that recent causal machine learning methods for domain generalization do not perform better in our evaluation than standard predictors trained on the set of causal features. Likewise, causal discovery algorithms either fail to run or select causal variables that perform no better than our selection. Extensive robustness checks confirm that our findings are stable under variable misclassification.

02 Jul 2024

computer-science machine-learning deep-reinforcement-learning

Homomorphism Autoencoder -- Learning Group Structured Representations from Observed Transitions

Max Planck Institute for Intelligent Systems

ETH Zürich Max Planck ETH Center for Learning Systems

How can agents learn internal models that veridically represent interactions with the real world is a largely open question. As machine learning is moving towards representations containing not just observational but also interventional knowledge, we study this problem using tools from representation learning and group theory. We propose methods enabling an agent acting upon the world to learn internal representations of sensory information that are consistent with actions that modify it. We use an autoencoder equipped with a group representation acting on its latent space, trained using an equivariance-derived loss in order to enforce a suitable homomorphism property on the group representation. In contrast to existing work, our approach does not require prior knowledge of the group and does not restrict the set of actions the agent can perform. We motivate our method theoretically, and show empirically that it can learn a group representation of the actions, thereby capturing the structure of the set of transformations applied to the environment. We further show that this allows agents to predict the effect of sequences of future actions with improved accuracy.

23 Oct 2020

causal-inference computer-science artificial-intelligence

Algorithmic recourse under imperfect causal knowledge: a probabilistic approach

Max Planck Institute for Intelligent Systems

University of Cambridge Saarland University Max Planck ETH Center for Learning Systems

Recent work has discussed the limitations of counterfactual explanations to recommend actions for algorithmic recourse, and argued for the need of taking causal relationships between features into consideration. Unfortunately, in practice, the true underlying structural causal model is generally unknown. In this work, we first show that it is impossible to guarantee recourse without access to the true structural equations. To address this limitation, we propose two probabilistic approaches to select optimal actions that achieve recourse with high probability given limited causal knowledge (e.g., only the causal graph). The first captures uncertainty over structural equations under additive Gaussian noise, and uses Bayesian model averaging to estimate the counterfactual distribution. The second removes any assumptions on the structural equations by instead computing the average effect of recourse actions on individuals similar to the person who seeks recourse, leading to a novel subpopulation-based interventional notion of recourse. We then derive a gradient-based procedure for selecting optimal recourse actions, and empirically show that the proposed approaches lead to more reliable recommendations under imperfect causal knowledge than non-probabilistic baselines.

04 Nov 2022

computer-science computer-vision-security computer-vision-and-pattern-recognition

I M Avatar: Implicit Morphable Head Avatars from Videos

Max Planck Institute for Intelligent Systems

ETH Zürich urich Max Planck ETH Center for Learning Systems

Traditional 3D morphable face models (3DMMs) provide fine-grained control over expression but cannot easily capture geometric and appearance details. Neural volumetric representations approach photorealism but are hard to animate and do not generalize well to unseen expressions. To tackle this problem, we propose IMavatar (Implicit Morphable avatar), a novel method for learning implicit head avatars from monocular videos. Inspired by the fine-grained control mechanisms afforded by conventional 3DMMs, we represent the expression- and pose- related deformations via learned blendshapes and skinning fields. These attributes are pose-independent and can be used to morph the canonical geometry and texture fields given novel expression and pose parameters. We employ ray marching and iterative root-finding to locate the canonical surface intersection for each pixel. A key contribution is our novel analytical gradient formulation that enables end-to-end training of IMavatars from videos. We show quantitatively and qualitatively that our method improves geometry and covers a more complete expression space compared to state-of-the-art methods.

642

121

22 Jan 2024

computer-science robotics

Getting the Ball Rolling: Learning a Dexterous Policy for a Biomimetic Tendon-Driven Hand with Rolling Contact Joints

ETH Zurich Max Planck ETH Center for Learning Systems

Biomimetic, dexterous robotic hands have the potential to replicate much of the tasks that a human can do, and to achieve status as a general manipulation platform. Recent advances in reinforcement learning (RL) frameworks have achieved remarkable performance in quadrupedal locomotion and dexterous manipulation tasks. Combined with GPU-based highly parallelized simulations capable of simulating thousands of robots in parallel, RL-based controllers have become more scalable and approachable. However, in order to bring RL-trained policies to the real world, we require training frameworks that output policies that can work with physical actuators and sensors as well as a hardware platform that can be manufactured with accessible materials yet is robust enough to run interactive policies. This work introduces the biomimetic tendon-driven Faive Hand and its system architecture, which uses tendon-driven rolling contact joints to achieve a 3D printable, robust high-DoF hand design. We model each element of the hand and integrate it into a GPU simulation environment to train a policy with RL, and achieve zero-shot transfer of a dexterous in-hand sphere rotation skill to the physical robot hand.

22 Apr 2019

computer-science computer-vision-security computer-vision-and-pattern-recognition

Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids

Microsoft University of T ubingen MPI for Intelligent Systems Max Planck ETH Center for Learning Systems ":

Andreas Geiger

This paper presents a learning-based method for 3D shape parsing that utilizes superquadric primitives, demonstrating enhanced expressiveness compared to cuboids. It introduces an analytical solution for the Chamfer distance, enabling more efficient and stable training for unsupervised part decomposition.

28 Mar 2024

distributed-learning electrical-engineering mathematics

Decentralized Feedback Optimization via Sensitivity Decoupling: Stability and Sub-optimality

ETH Zürich urich Max Planck ETH Center for Learning Systems ":

Online feedback optimization is a controller design paradigm for optimizing the steady-state behavior of a dynamical system. It employs an optimization algorithm as a dynamic feedback controller and utilizes real-time measurements to bypass knowing exact plant dynamics and disturbances. Different from existing centralized settings, we present a fully decentralized feedback optimization controller for networked systems to lift the communication burden and improve scalability. We approximate the overall input-output sensitivity matrix through its diagonal elements, which capture local model information. For the closed-loop behavior, we characterize the stability and bound the sub-optimality due to decentralization. We prove that the proposed decentralized controller yields solutions that correspond to the Nash equilibria of a non-cooperative game.

09 Jan 2024

computer-science machine-learning mathematics

Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization

ETH Zurich

Google Research

Yale University Max Planck ETH Center for Learning Systems

Algorithmic reproducibility measures the deviation in outputs of machine learning algorithms upon minor changes in the training process. Previous work suggests that first-order methods would need to trade-off convergence rate (gradient complexity) for better reproducibility. In this work, we challenge this perception and demonstrate that both optimal reproducibility and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems under various error-prone oracle settings. Particularly, given the inexact initialization oracle, our regularization-based algorithms achieve the best of both worlds - optimal reproducibility and near-optimal gradient complexity - for minimization and minimax optimization. With the inexact gradient oracle, the near-optimal guarantees also hold for minimax optimization. Additionally, with the stochastic gradient oracle, we show that stochastic gradient descent ascent is optimal in terms of both reproducibility and gradient complexity. We believe our results contribute to an enhanced understanding of the reproducibility-convergence trade-off in the context of convex optimization.

19 Oct 2022

computer-science cryptography-and-security machine-learning

Bring Your Own Algorithm for Optimal Differentially Private Stochastic Minimax Optimization

University of Washington

University of Illinois at Urbana-Champaign

ETH Zürich Amazon Max Planck ETH Center for Learning Systems

We study differentially private (DP) algorithms for smooth stochastic minimax optimization, with stochastic minimization as a byproduct. The holy grail of these settings is to guarantee the optimal trade-off between the privacy and the excess population loss, using an algorithm with a linear time-complexity in the number of training samples. We provide a general framework for solving differentially private stochastic minimax optimization (DP-SMO) problems, which enables the practitioners to bring their own base optimization algorithm and use it as a black-box to obtain the near-optimal privacy-loss trade-off. Our framework is inspired from the recently proposed Phased-ERM method [22] for nonsmooth differentially private stochastic convex optimization (DP-SCO), which exploits the stability of the empirical risk minimization (ERM) for the privacy guarantee. The flexibility of our approach enables us to sidestep the requirement that the base algorithm needs to have bounded sensitivity, and allows the use of sophisticated variance-reduced accelerated methods to achieve near-linear time-complexity. To the best of our knowledge, these are the first near-linear time algorithms with near-optimal guarantees on the population duality gap for smooth DP-SMO, when the objective is (strongly-)convex--(strongly-)concave. Additionally, based on our flexible framework, we enrich the family of near-linear time algorithms for smooth DP-SCO with the near-optimal privacy-loss trade-off.

23 Nov 2021

bayesian-deep-learning bayesian-optimization computer-science

Uncertainty estimation under model misspecification in neural network regression

ETH Zürich University of Zürich Max Planck ETH Center for Learning Systems

Although neural networks are powerful function approximators, the underlying modelling assumptions ultimately define the likelihood and thus the hypothesis class they are parameterizing. In classification, these assumptions are minimal as the commonly employed softmax is capable of representing any categorical distribution. In regression, however, restrictive assumptions on the type of continuous distribution to be realized are typically placed, like the dominant choice of training via mean-squared error and its underlying Gaussianity assumption. Recently, modelling advances allow to be agnostic to the type of continuous distribution to be modelled, granting regression the flexibility of classification models. While past studies stress the benefit of such flexible regression models in terms of performance, here we study the effect of the model choice on uncertainty estimation. We highlight that under model misspecification, aleatoric uncertainty is not properly captured, and that a Bayesian treatment of a misspecified model leads to unreliable epistemic uncertainty estimates. Overall, our study provides an overview on how modelling choices in regression may influence uncertainty estimation and thus any downstream decision making process.

16 Nov 2019

computer-science computer-vision-and-pattern-recognition domain-adaptation

PersEmoN: A Deep Network for Joint Analysis of Apparent Personality, Emotion and Their Relationship

ETH Zurich

University of Illinois at Urbana-Champaign

National University of Singapore National University of Singapore (NUS)Agency for Science Technology and Research (A*STAR)Institute for Infocomm Research Advanced Digital Sciences Center Max Planck ETH Center for Learning Systems Advanced Digital Sciences Center (ADSC)Advanced Digital Sciences Center (ADSC), University of Illinois at Urbana-Champaign

Apparent personality and emotion analysis are both central to affective computing. Existing works solve them individually. In this paper we investigate if such high-level affect traits and their relationship can be jointly learned from face images in the wild. To this end, we introduce PersEmoN, an end-to-end trainable and deep Siamese-like network. It consists of two convolutional network branches, one for emotion and the other for apparent personality. Both networks share their bottom feature extraction module and are optimized within a multi-task learning framework. Emotion and personality networks are dedicated to their own annotated dataset. Furthermore, an adversarial-like loss function is employed to promote representation coherence among heterogeneous dataset sources. Based on this, we also explore the emotion-to-apparent-personality relationship. Extensive experiments demonstrate the effectiveness of PersEmoN.

13 Oct 2022

bayesian-deep-learning computer-science machine-learning

Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations

ETH Zurich

Imperial College London Max Planck ETH Center for Learning Systems

Data augmentation is commonly applied to improve performance of deep learning by enforcing the knowledge that certain transformations on the input preserve the output. Currently, the data augmentation parameters are chosen by human effort and costly cross-validation, which makes it cumbersome to apply to new datasets. We develop a convenient gradient-based method for selecting the data augmentation without validation data during training of a deep neural network. Our approach relies on phrasing data augmentation as an invariance in the prior distribution on the functions of a neural network, which allows us to learn it using Bayesian model selection. This has been shown to work in Gaussian processes, but not yet for deep neural networks. We propose a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective, which can be optimised without human supervision or validation data. We show that our method can successfully recover invariances present in the data, and that this improves generalisation and data efficiency on image datasets.

30 Mar 2023

computer-science computer-vision-and-pattern-recognition image-generation

DINER: Depth-aware Image-based NEural Radiance fields

Max Planck Institute for Intelligent Systems

ETH Zürich Max Planck ETH Center for Learning Systems

We present Depth-aware Image-based NEural Radiance fields (DINER). Given a sparse set of RGB input views, we predict depth and feature maps to guide the reconstruction of a volumetric scene representation that allows us to render 3D objects under novel views. Specifically, we propose novel techniques to incorporate depth information into feature fusion and efficient scene sampling. In comparison to the previous state of the art, DINER achieves higher synthesis quality and can process input views with greater disparity. This allows us to capture scenes more completely without changing capturing hardware requirements and ultimately enables larger viewpoint changes during novel view synthesis. We evaluate our method by synthesizing novel views, both for human heads and for general objects, and observe significantly improved qualitative results and increased perceptual metrics compared to the previous state of the art. The code is publicly available for research purposes.

14 Sep 2023

computer-science artificial-intelligence machine-learning

Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning

Google DeepMind Max Planck ETH Center for Learning Systems

We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to improve the agent's ability to generalize to out-of-distribution goals. To achieve this, we propose to learn a dynamics model and check if it is equivariant with respect to a fixed type of transformation, namely translations in the state space. We then use an entropy regularizer to increase the equivariant set and augment the dataset with the resulting transformed samples. Finally, we learn a new policy offline based on the augmented dataset, with an off-the-shelf offline RL algorithm. Our experimental results demonstrate that our approach can greatly improve the test performance of the policy on the considered environments.

06 Sep 2019

adversarial-attacks high-energy-astrophysical-phenomena instrumentation-and-methods-for-astrophysics

Convolutional neural networks: a magic bullet for gravitational-wave detection?

Max Planck Institute for Intelligent Systems

University of Cambridge University of Portsmouth Max Planck ETH Center for Learning Systems Max Planck Institute for Gravitational Physics

In the last few years, machine learning techniques, in particular convolutional neural networks, have been investigated as a method to replace or complement traditional matched filtering techniques that are used to detect the gravitational-wave signature of merging black holes. However, to date, these methods have not yet been successfully applied to the analysis of long stretches of data recorded by the Advanced LIGO and Virgo gravitational-wave observatories. In this work, we critically examine the use of convolutional neural networks as a tool to search for merging black holes. We identify the strengths and limitations of this approach, highlight some common pitfalls in translating between machine learning and gravitational-wave astronomy, and discuss the interdisciplinary challenges. In particular, we explain in detail why convolutional neural networks alone cannot be used to claim a statistically significant gravitational-wave detection. However, we demonstrate how they can still be used to rapidly flag the times of potential signals in the data for a more detailed follow-up. Our convolutional neural network architecture as well as the proposed performance metrics are better suited for this task than a standard binary classifications scheme. A detailed evaluation of our approach on Advanced LIGO data demonstrates the potential of such systems as trigger generators. Finally, we sound a note of caution by constructing adversarial examples, which showcase interesting "failure modes" of our model, where inputs with no visible resemblance to real gravitational-wave signals are identified as such by the network with high confidence.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

DPZero: Private Fine-Tuning of Language Models without Backpropagation

Learning an Animatable Detailed 3D Face Model from In-The-Wild Images

ATISS: Autoregressive Transformers for Indoor Scene Synthesis

ROC-n-reroll: How verifier imperfection affects test-time scaling

Algorithmic Recourse: from Counterfactual Explanations to Interventions

Do causal predictors generalize better to new domains?

Homomorphism Autoencoder -- Learning Group Structured Representations from Observed Transitions

Algorithmic recourse under imperfect causal knowledge: a probabilistic approach

I M Avatar: Implicit Morphable Head Avatars from Videos

Getting the Ball Rolling: Learning a Dexterous Policy for a Biomimetic Tendon-Driven Hand with Rolling Contact Joints

Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids

Decentralized Feedback Optimization via Sensitivity Decoupling: Stability and Sub-optimality

Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization

Bring Your Own Algorithm for Optimal Differentially Private Stochastic Minimax Optimization

Uncertainty estimation under model misspecification in neural network regression

PersEmoN: A Deep Network for Joint Analysis of Apparent Personality, Emotion and Their Relationship

Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations

DINER: Depth-aware Image-based NEural Radiance fields

Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning

Convolutional neural networks: a magic bullet for gravitational-wave detection?

Events

AI for Law

Personalize Your Feed