alphaXiv

Basic Research of Artificial Intelligence Laboratory

22 Oct 2025

computer-science machine-learning fine-tuning

Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods

Mohamed bin Zayed University of Artificial Intelligence Innopolis University Moscow Center for Advanced Studies Basic Research of Artificial Intelligence Laboratory

Optimization lies at the core of modern deep learning, yet existing methods often face a fundamental trade-off between adapting to problem geometry and leveraging curvature utilization. Steepest descent algorithms adapt to different geometries through norm choices but remain strictly first-order, whereas quasi-Newton and adaptive optimizers incorporate curvature information but are restricted to Frobenius geometry, limiting their applicability across diverse architectures. In this work, we propose a unified framework generalizing steepest descent, quasi-Newton methods, and adaptive methods through the novel notion of preconditioned matrix norms. This abstraction reveals that widely used optimizers such as SGD and Adam, as well as more advanced approaches like Muon and KL-Shampoo, and recent hybrids including SOAP and SPlus, all emerge as special cases of the same principle. Within this framework, we provide the first systematic treatment of affine and scale invariance in the matrix-parameterized setting, establishing necessary and sufficient conditions under generalized norms. Building on this foundation, we introduce two new methods,

\texttt{MuAdam}

and

\texttt{MuAdam-SANIA}

, which combine the spectral geometry of Muon with Adam-style preconditioning. Our experiments demonstrate that these optimizers are competitive with, and in some cases outperform, existing state-of-the-art methods. Our code is available at this https URL

09 Dec 2025

mathematics optimization-and-control

Adaptive Regularized Newton Method with Inexact Hessian

Mohamed bin Zayed University of Artificial Intelligence Innopolis University Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)Basic Research of Artificial Intelligence Laboratory Basic Research of Artificial Intelligence Laboratory (BRAIn Lab)

Newton's method is the most widespread high-order method, demanding the gradient and the Hessian of the objective function. However, one of the main disadvantages of Newtons method is its lack of global convergence and high iteration cost. Both these drawbacks are critical for modern optimization motivated primarily by current applications in machine learning. In this paper, we introduce a novel algorithm to deal with these disadvantages. Our method can be implemented with various Hessian approximations, including methods that use only the first-order information. Thus, computational costs might be drastically reduced. Also, it can be adjusted to problems' geometries via the usage of different Bregman divergences. The proposed method converges for nonconvex and convex problems globally and it has the same rates as other well-known methods that lack mentioned properties. We present experiments validating our method performs according to the theoretical bounds and shows competitive performance among other Newton-based methods.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods

Adaptive Regularized Newton Method with Inexact Hessian

Events

AI for Law

Personalize Your Feed