This survey paper elucidates how diverse machine learning tasks, including generative modeling and network optimization, can be framed as the evolution of probability distributions over time. It provides a unified mathematical framework by connecting optimal transport and diffusion processes, clarifying their applications and distinct properties within advanced machine learning paradigms.
Worst-case generation plays a critical role in evaluating robustness and stress-testing systems under distribution shifts, in applications ranging from machine learning models to power grids and medical prediction systems. We develop a generative modeling framework for worst-case generation under a pre-specified risk, based on min-max optimization over continuous probability distributions in the Wasserstein space. Unlike traditional discrete distributionally robust optimization (DRO) approaches, which often suffer from scalability issues, limited generalization, and costly worst-case inference, our framework exploits the Brenier theorem to characterize the least favorable (worst-case) distribution as the pushforward of a transport map from a continuous reference measure, enabling a continuous and expressive notion of risk-induced generation beyond classical discrete DRO formulations. Based on the min-max formulation, we propose a Gradient Descent Ascent (GDA)-type scheme that updates the decision model and the transport map in a single loop, and we establish global convergence guarantees under mild regularity assumptions, even without convexity-concavity. We further parameterize the transport map with a neural network that is trained simultaneously with the GDA iterations by matching the transported training samples, yielding a simulation-free approach. The efficiency of the proposed method as a risk-induced worst-case generator is validated by numerical experiments on synthetic and image data.
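A minimal sketch of the single-loop GDA update described above, assuming a classification risk and a quadratic transport-cost penalty with weight `lam` as a stand-in for the Wasserstein constraint; all names (`model`, `T`, the optimizers) are hypothetical, and the paper's exact objective is not reproduced here:

```python
# Single-loop GDA sketch: one ascent step on the transport map T (to increase
# risk net of transport cost), then one descent step on the decision model.
# Hypothetical objective: min_model max_T  E[loss(model(T(x)), y)] - lam*E||T(x)-x||^2.
import torch

def gda_step(model, T, x, y, opt_model, opt_T, lam=1.0):
    # Ascent on the transport map.
    x_adv = T(x)                                   # pushforward of the reference batch
    risk = torch.nn.functional.cross_entropy(model(x_adv), y)
    cost = ((x_adv - x) ** 2).flatten(1).sum(dim=1).mean()
    opt_T.zero_grad()
    (-(risk - lam * cost)).backward()              # negate: optimizers minimize
    opt_T.step()

    # Descent on the decision model, at the freshly updated map.
    x_adv = T(x).detach()                          # block gradients into T
    opt_model.zero_grad()
    torch.nn.functional.cross_entropy(model(x_adv), y).backward()
    opt_model.step()
```

Each call mirrors the single-loop structure: the map and the model are updated once each, with no inner maximization loop.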
This paper introduces Provable Diffusion Posterior Sampling (PDPS), a method for Bayesian inverse problems that integrates pre-trained diffusion models as data-driven priors. The approach offers the first non-asymptotic error bounds for diffusion-based posterior score estimation and demonstrates superior performance with reliable uncertainty quantification across various imaging tasks.
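As context, a standard identity behind diffusion-based posterior sampling (not specific to PDPS) splits the posterior score into the prior score, supplied by the pre-trained diffusion model, and a likelihood term that must be approximated:

$$\nabla_{x_t} \log p_t(x_t \mid y) = \nabla_{x_t} \log p_t(x_t) + \nabla_{x_t} \log p_t(y \mid x_t).$$

The non-asymptotic bounds announced above control the error incurred in estimating this posterior score.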
We propose a new notion of the formal tangent space to the Wasserstein space P(X)\mathcal{P}(X) at a given measure. Modulo an integrability condition, we say that this tangent space is made of functions over XX which are valued in the probability measures over the tangent bundle to XX. This generalization of previous concepts of tangent spaces allows us to define appropriate notions of parallel transport, C1,α\mathcal{C}^{1,\alpha} regularity over P(X)\mathcal{P}(X) and translation of a curve over P(X)\mathcal{P}(X).
We study the following distribution clustering problem: given a hidden partition of $k$ distributions into two groups, such that the distributions within each group are identical and the two distributions associated with the two clusters are $\varepsilon$-far in total variation, the goal is to recover the partition. We establish upper and lower bounds on the sample complexity for two fundamental cases: (1) when one cluster's distribution is known, and (2) when both are unknown. Our upper and lower bounds characterize the sample complexity's dependence on the domain size $n$, the number of distributions $k$, the size $r$ of one of the clusters, and the distance $\varepsilon$. In particular, we achieve tightness with respect to $(n, k, r, \varepsilon)$ (up to an $O(\log k)$ factor) in all regimes.
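As a toy illustration of the recovery task (not the paper's estimator), one can threshold pairwise empirical total-variation distances between the $k$ empirical distributions; the sample-complexity bounds above quantify how many samples per distribution make this separation reliable:

```python
# Toy clustering of k empirical distributions over [n] by pairwise empirical
# TV distance. counts[i] is a length-n histogram of samples from distribution i;
# eps is the promised TV separation between the two clusters.
import numpy as np

def cluster_by_tv(counts, eps):
    freqs = counts / counts.sum(axis=1, keepdims=True)
    # Empirical TV distance between every pair of distributions.
    tv = 0.5 * np.abs(freqs[:, None, :] - freqs[None, :, :]).sum(axis=2)
    labels = -np.ones(len(freqs), dtype=int)
    next_label = 0
    for i in range(len(freqs)):
        if labels[i] == -1:
            # Group every still-unlabeled distribution within eps/2 of i.
            mask = (tv[i] < eps / 2) & (labels == -1)
            labels[mask] = next_label
            next_label += 1
    return labels
```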
Since the 1990s, considerable empirical work has been carried out to train statistical models, such as neural networks (NNs), as learned heuristics for combinatorial optimization (CO) problems. When successful, such an approach eliminates the need for experts to design heuristics per problem type. Due to their structure, many hard CO problems are amenable to treatment through reinforcement learning (RL). Indeed, we find a wealth of literature training NNs using value-based, policy gradient, or actor-critic approaches, with promising results, both in terms of empirical optimality gaps and inference runtimes. Nevertheless, there has been a paucity of theoretical work undergirding the use of RL for CO problems. To this end, we introduce a unified framework to model CO problems through Markov decision processes (MDPs) and solve them using RL techniques. We provide easy-to-test assumptions under which CO problems can be formulated as equivalent undiscounted MDPs that provide optimal solutions to the original CO problems. Moreover, we establish conditions under which value-based RL techniques converge to approximate solutions of the CO problem with a guarantee on the associated optimality gap. Our convergence analysis provides: (1) a sufficient rate of increase in batch size and projected gradient descent steps at each RL iteration; (2) the resulting optimality gap in terms of problem parameters and targeted RL accuracy; and (3) the importance of a choice of state-space embedding. Together, our analysis illuminates the success (and limitations) of the celebrated deep Q-learning algorithm in this problem context.
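As a toy illustration of the MDP reformulation (a 0/1 knapsack; the state encoding, horizon, and constants here are illustrative, not the paper's embedding), undiscounted tabular Q-learning recovers the optimal knapsack value as the optimal return:

```python
# Tabular Q-learning on a toy 0/1 knapsack cast as an undiscounted MDP.
# State: (next item index, remaining capacity). Actions: 0 = skip, 1 = take.
import numpy as np

values, weights, capacity = [6, 10, 12], [1, 2, 3], 5
n = len(values)
rng = np.random.default_rng(0)
Q = np.zeros((n + 1, capacity + 1, 2))

for episode in range(5000):
    item, cap = 0, capacity
    while item < n:
        # Epsilon-greedy action selection.
        a = rng.integers(2) if rng.random() < 0.1 else Q[item, cap].argmax()
        if a == 1 and weights[item] <= cap:
            r, nxt = values[item], (item + 1, cap - weights[item])
        else:
            r, nxt = 0, (item + 1, cap)   # skip, or infeasible take
        # Undiscounted Q-learning update (gamma = 1, finite horizon).
        target = r + (Q[nxt].max() if nxt[0] < n else 0.0)
        Q[item, cap, a] += 0.1 * (target - Q[item, cap, a])
        item, cap = nxt

print(Q[0, capacity].max())  # approaches the optimal value 22 (take items 2 and 3)
```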
In this paper, we examine the convergence landscape of multi-agent learning under uncertainty. Specifically, we analyze two stochastic models of regularized learning in continuous games -- one in continuous and one in discrete time -- with the aim of characterizing the long-run behavior of the induced sequence of play. In stark contrast to deterministic, full-information models of learning (or models with a vanishing learning rate), we show that the resulting dynamics do not converge in general. In lieu of this, we ask instead which actions are played more often in the long run, and by how much. We show that, in strongly monotone games, the dynamics of regularized learning may wander away from equilibrium infinitely often, but they always return to its vicinity in finite time (which we estimate), and their long-run distribution is sharply concentrated around a neighborhood thereof. We quantify the degree of this concentration, and we show that these favorable properties may all break down if the underlying game is not strongly monotone -- underscoring in this way the limits of regularized learning in the presence of persistent randomness and uncertainty.
We revisit the signal denoising problem through the lens of optimal transport: the goal is to recover an unknown scalar signal distribution $X \sim P$ from noisy observations $Y = X + \sigma Z$, with $Z$ standard Gaussian independent of $X$ and $\sigma > 0$ a known noise level. Let $Q$ denote the distribution of $Y$. We introduce a hierarchy of denoisers $T_0, T_1, \ldots, T_\infty : \mathbb{R} \to \mathbb{R}$ that are agnostic to the signal distribution $P$, depending only on higher-order score functions of $Q$. Each denoiser $T_K$ is progressively refined using the $(2K-1)$-th order score function of $Q$ at noise resolution $\sigma^{2K}$, achieving better denoising quality as measured by the Wasserstein metric $W(T_K \sharp Q, P)$. The limiting denoiser $T_\infty$ identifies the optimal transport map, with $T_\infty \sharp Q = P$. We provide a complete characterization of the combinatorial structure underlying this hierarchy through Bell polynomial recursions, revealing how higher-order score functions encode the optimal transport map for signal denoising. We study two estimation strategies, with convergence rates, for higher-order scores from i.i.d. samples drawn from $Q$: (i) plug-in estimation via Gaussian kernel smoothing, and (ii) direct estimation via higher-order score matching. This hierarchy of agnostic denoisers opens new perspectives in signal denoising and empirical Bayes.
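The first level of such a hierarchy is the classical Tweedie/empirical-Bayes denoiser $T_1(y) = y + \sigma^2 (\log q)'(y)$, which depends on $Q$ only through its first-order score. A minimal sketch of the plug-in strategy (i), with an illustrative Gaussian KDE bandwidth:

```python
# Plug-in first-order denoiser: T_1(y) = y + sigma^2 * d/dy log q(y),
# with the density q of Q estimated from i.i.d. samples by Gaussian KDE.
import numpy as np

def tweedie_denoise(y, samples, sigma, bandwidth=0.3):
    diff = y[:, None] - samples[None, :]
    K = np.exp(-0.5 * (diff / bandwidth) ** 2)       # kernel evaluations
    q = K.mean(axis=1)                               # (unnormalized) KDE of q
    dq = (-diff / bandwidth**2 * K).mean(axis=1)     # its derivative
    return y + sigma**2 * dq / q                     # Tweedie correction

rng = np.random.default_rng(0)
x = rng.choice([-2.0, 2.0], size=2000)        # unknown two-point signal P
sigma = 0.7
y = x + sigma * rng.standard_normal(2000)     # observations drawn from Q
x_hat = tweedie_denoise(y, y, sigma)          # shrinks y toward {-2, 2}
```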
We describe the loop corrections to supercharges in supersymmetric quantum field theories using the holomorphic twist formalism. We begin by reviewing the relation between supercharge corrections and the "twice-generalized" Konishi anomaly, which corrects the semi-chiral ring. In the holomorphic twist, these corrections appear as BRST anomalies and are computed using the higher operations of an underlying LL_\infty conformal algebra. We then apply this formalism to obtain the complete one-loop corrections to the supercharge of four-dimensional Lagrangian supersymmetric gauge theories, including N=4\mathcal{N}=4 SYM, where it admits a remarkably compact expression in terms of superfields.
For a class of $\mathbb{R}^d$-actions and $\mathbb{Z}^d$-actions on the $n$-dimensional torus $\mathbb{T}^n$, we characterize their unique ergodicity and establish a theorem of Weyl type. This result allows us to establish an isomorphism between the Banach algebra of quasi-periodic functions with spectrum in a given $\mathbb{Z}$-module and the Banach algebra of periodic functions on a torus. This, in turn, allows us to give a very simple proof of Hausdorff-Young inequalities for Besicovitch almost periodic functions. The regularity of the parent function of a quasi-periodic function is also studied.
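For orientation, the classical Weyl theorem for the linear flow on $\mathbb{T}^n$ with rationally independent frequency vector $\omega$ (the prototype of the statements proved here) asserts unique ergodicity in the form

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T f(x + t\omega)\, dt = \int_{\mathbb{T}^n} f \, d\mu \qquad \text{for all } f \in C(\mathbb{T}^n),\ x \in \mathbb{T}^n,$$

where $\mu$ is the Haar measure; the theorem of Weyl type above extends this to the $\mathbb{R}^d$- and $\mathbb{Z}^d$-actions under consideration.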
This is a contribution to the special issue of Surveys in Differential Geometry celebrating the 75th birthday of Shing-Tung Yau. The bulk of the paper is devoted to a survey of some new geometric inequalities and estimates for the Monge-Ampere equation, obtained by the authors in the last few years in joint work with F. Tong, J. Song, and J. Sturm. These all depend in an essential way on Yau's solution of the Calabi conjecture, which is itself nearing its own 50th birthday. The opportunity is also taken to survey briefly many current directions in complex geometry, which he more recently pioneered.
In this paper, we propose objective-evaluation-free (OEF) variants of the proximal Newton method for nonconvex composite optimization problems and the regularized Newton method for unconstrained optimization problems, respectively, using inexact evaluations of gradients and Hessians. Theoretical analysis demonstrates that the global/local convergence rates of the proposed algorithms are consistent with those achieved when both objective function and derivatives are evaluated exactly. Additionally, we present an OEF regularized Newton and negative curvature algorithm that uses inexact derivatives to find approximate second-order stationary points for unconstrained optimization problems. The worst-case iteration/(sample) operation complexity of the proposed algorithm matches the optimal results reported in the literature.
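A minimal sketch of one such scheme (gradient-norm regularization is a standard choice in this literature; constants and the stopping rule are illustrative), showing that the loop touches only gradient and Hessian oracles and never the objective value:

```python
# Objective-evaluation-free regularized Newton iteration:
#   p_k = -(H_k + lam_k I)^{-1} g_k,   lam_k ~ ||g_k||^{1/2},
# using only (possibly inexact) gradient and Hessian oracles; no f-evaluations
# or line searches on objective values appear anywhere in the loop.
import numpy as np

def oef_regularized_newton(grad, hess, x0, tol=1e-8, max_iter=100):
    x = x0.copy()
    for _ in range(max_iter):
        g, H = grad(x), hess(x)             # possibly inexact oracles
        if np.linalg.norm(g) < tol:
            break
        lam = np.sqrt(np.linalg.norm(g))    # gradient-norm regularization
        x = x - np.linalg.solve(H + lam * np.eye(len(x)), g)
    return x
```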
As Large Language Models (LLMs) increasingly operate as autonomous decision-makers in interactive, multi-agent systems and human societies, understanding their strategic behaviour has profound implications for safety, coordination, and the design of AI-driven social and economic infrastructures. Assessing such behaviour requires methods that capture not only what LLMs output, but also the underlying intentions that guide their decisions. In this work, we extend the FAIRGAME framework to systematically evaluate LLM behaviour in repeated social dilemmas through two complementary advances: a payoff-scaled Prisoner's Dilemma isolating sensitivity to incentive magnitude, and an integrated multi-agent Public Goods Game with dynamic payoffs and multi-agent histories. These environments reveal consistent behavioural signatures across models and languages, including incentive-sensitive cooperation, cross-linguistic divergence, and end-game alignment toward defection. To interpret these patterns, we train traditional supervised classification models on canonical repeated-game strategies and apply them to FAIRGAME trajectories, showing that LLMs exhibit systematic, model- and language-dependent behavioural intentions, with linguistic framing at times exerting effects as strong as architectural differences. Together, these findings provide a unified methodological foundation for auditing LLMs as strategic agents and reveal systematic cooperation biases with direct implications for AI governance, collective decision-making, and the design of safe multi-agent systems.
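A minimal sketch of the classifier-based intention probe (the strategy set, features, and classifier here are illustrative, not FAIRGAME's exact pipeline): generate trajectories from canonical repeated-PD strategies, featurize the action histories, and fit a supervised model that can then label observed LLM trajectories.

```python
# Train a classifier on trajectories of canonical repeated-PD strategies
# (1 = cooperate, 0 = defect), then apply it to observed action histories.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def play(strategy, rounds=20):
    """Trajectory of one canonical strategy against a random opponent."""
    me, opp = [], rng.integers(2, size=rounds)
    for t in range(rounds):
        if strategy == "all_c":
            a = 1
        elif strategy == "all_d":
            a = 0
        elif strategy == "tft":                 # tit-for-tat
            a = 1 if t == 0 else opp[t - 1]
        else:                                   # grim trigger
            a = 1 if all(opp[:t]) else 0
        me.append(a)
    return np.array(me), opp

def featurize(me, opp):
    # Behavioural features: cooperation rate, retaliation rate, end-game drift.
    retaliate = np.mean([me[t] == opp[t - 1] for t in range(1, len(me))])
    return [me.mean(), retaliate, me[-5:].mean() - me[:5].mean()]

X, y = [], []
for s in ["all_c", "all_d", "tft", "grim"]:
    for _ in range(200):
        X.append(featurize(*play(s)))
        y.append(s)
clf = RandomForestClassifier(random_state=0).fit(X, y)
# clf.predict([featurize(llm_actions, opp_actions)]) would label an LLM trajectory.
```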
We give a bimeromorphic classification of compact Kähler manifolds of Kodaira codimension one that admit a holomorphic one-form without zeros.
The optimal transport (OT) map is a geometry-driven transformation between high-dimensional probability distributions which underpins a wide range of tasks in statistics, applied probability, and machine learning. However, existing statistical theory for OT map estimation is quite restricted, hinging on Brenier's theorem (quadratic cost, absolutely continuous source) to guarantee existence and uniqueness of a deterministic OT map, on which various additional regularity assumptions are imposed to obtain quantitative error bounds. In many real-world problems these conditions fail or cannot be certified, in which case optimal transportation is possible only via stochastic maps that can split mass. To broaden the scope of map estimation theory to such settings, this work introduces a novel metric for evaluating the transportation quality of stochastic maps. Under this metric, we develop computationally efficient map estimators with near-optimal finite-sample risk bounds, subject to easy-to-verify minimal assumptions. Our analysis further accommodates common forms of adversarial sample contamination, yielding estimators with robust estimation guarantees. Empirical experiments are provided which validate our theory and demonstrate the utility of the proposed framework in settings where existing theory fails. These contributions constitute the first general-purpose theory for map estimation, compatible with a wide spectrum of real-world applications where optimal transport may be intrinsically stochastic.
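As a small illustration of transport via mass-splitting maps (using the POT library; the paper's estimators and evaluation metric are not reproduced here), the discrete OT coupling between empirical samples already defines a stochastic map:

```python
# The empirical OT coupling pi defines a stochastic map: row i of pi,
# renormalized, is the conditional distribution over target points assigned
# to source x_i, and it may split mass when no deterministic Monge map exists.
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
xs = rng.standard_normal((200, 2))              # source samples
xt = rng.standard_normal((200, 2)) + [4.0, 0.0]  # shifted target samples
a = b = np.full(200, 1.0 / 200)                 # uniform empirical weights

M = ot.dist(xs, xt)                   # squared Euclidean cost matrix
pi = ot.emd(a, b, M)                  # optimal coupling (may split mass)
cond = pi / pi.sum(axis=1, keepdims=True)   # stochastic map: x_i -> row law
bary = cond @ xt                      # barycentric projection (deterministic proxy)
```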
For color image classification, input features are conventionally represented as vectors, matrices, or third-order tensors over the real field. Inspired by the success of quaternion data modeling of color images in image recovery and denoising tasks, we propose a novel method for color image classification, named the Low-rank Support Quaternion Matrix Machine (LSQMM), in which the RGB channels are treated as pure quaternions to effectively preserve the intrinsic coupling relationships among channels via the quaternion algebra. To promote the low-rank structures resulting from strongly correlated color channels, a quaternion nuclear norm regularization term, serving as a natural extension of the conventional matrix nuclear norm to the quaternion domain, is added to the hinge loss in our LSQMM model. An Alternating Direction Method of Multipliers (ADMM)-based iterative algorithm is designed to efficiently solve the proposed quaternion optimization model. Experimental results on multiple color image classification datasets demonstrate that our approach offers advantages in classification accuracy, robustness, and computational efficiency over several state-of-the-art methods based on support vector machines, support matrix machines, and support tensor machines.
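The workhorse inside such ADMM iterations is the proximal operator of the nuclear norm, i.e., singular value thresholding; a minimal real-matrix sketch follows (LSQMM applies the analogous shrinkage through the quaternion SVD, which is not reproduced here):

```python
# Singular value thresholding: the proximal operator of tau * ||.||_* that
# each ADMM iteration applies to its matrix-valued variable.
import numpy as np

def svt(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```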
In this paper, we examine the robustness of Nash equilibria in continuous games, under both strategic and dynamic uncertainty. Starting with the former, we introduce the notion of a robust equilibrium as one that remains invariant to small -- but otherwise arbitrary -- perturbations of the game's payoff structure, and we provide a crisp geometric characterization thereof. Subsequently, we turn to the question of dynamic robustness, and we examine which equilibria may arise as stable limit points of the dynamics of "follow the regularized leader" (FTRL) in the presence of randomness and uncertainty. Despite their very distinct origins, we establish a structural correspondence between these two notions of robustness: strategic robustness implies dynamic robustness, and, conversely, the requirement of strategic robustness cannot be relaxed if dynamic robustness is to be maintained. Finally, we examine the rate of convergence to robust equilibria as a function of the underlying regularizer, and we show that entropically regularized learning converges at a geometric rate in games with affinely constrained action spaces.
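For concreteness, entropic regularization on a simplex yields the exponential-weights instance of FTRL; a minimal sketch with a hypothetical (possibly noisy) payoff-gradient oracle:

```python
# Entropic FTRL (exponential weights) on a simplex-constrained action space:
# the player accumulates payoff gradients and plays the softmax of the score.
import numpy as np

def ftrl_entropic(payoff_grad, n_actions, steps=1000, eta=0.1):
    score = np.zeros(n_actions)
    for _ in range(steps):
        z = eta * score
        x = np.exp(z - z.max())        # numerically stabilized softmax
        x /= x.sum()
        score += payoff_grad(x)        # possibly noisy gradient feedback
    return x
```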
We introduce and study \emph{brachistochrone-ruled timelike surfaces} in Newtonian and relativistic spacetimes. Starting from the classical cycloidal brachistochrone in a constant gravitational field, we construct a Newtonian ``brachistochrone-ruled worldsheet'' whose rulings are time-minimizing trajectories between pairs of endpoints. We then generalize this construction to stationary Lorentzian spacetimes by exploiting the reduction of arrival-time functionals to Finsler- or Jacobi-type length functionals on a spatial manifold. In this framework, relativistic brachistochrones arise as geodesics of an associated Finsler structure, and brachistochrone-ruled timelike surfaces are timelike surfaces ruled by these time-minimizing worldlines. We work out explicit examples in Minkowski spacetime and in the Schwarzschild exterior: in the flat case, for a bounded-speed time functional, the brachistochrones are straight timelike lines and a simple family of brachistochrone-ruled surfaces turns out to be totally geodesic; in the Schwarzschild case, we show how coordinate-time minimization at fixed energy reduces to geodesics of a Jacobi metric on the spatial slice, and outline a numerical scheme for constructing brachistochrone-ruled timelike surfaces. Finally, we discuss basic geometric properties of such surfaces and identify natural Jacobi fields along the rulings.
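For reference, the classical brachistochrone in a constant gravitational field $g$, from which the Newtonian construction starts, is the cycloid

$$x(\theta) = R(\theta - \sin\theta), \qquad y(\theta) = R(1 - \cos\theta),$$

traversed from rest with speed $v = \sqrt{2gy}$; the rulings of the Newtonian worldsheet are time-minimizing arcs of this type.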
Learned image reconstruction has become a pillar in computational imaging and inverse problems. Among the most successful approaches are learned iterative networks, which are formulated by unrolling classical iterative optimisation algorithms for solving variational problems. While the underlying algorithm is usually formulated in the functional analytic setting, learned approaches are often viewed as purely discrete. In this chapter we present a unified operator view of learned iterative networks. Specifically, we formulate a learned reconstruction operator, defining how to compute, and separately the learning problem, which defines what to compute. In this setting we present common approaches and show that many of them are closely related at their core. We review linear as well as nonlinear inverse problems in this framework and conclude with a short numerical study.
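A minimal sketch of such a learned reconstruction operator (learned gradient descent for a linear forward operator; depth and architecture are illustrative):

```python
# Learned gradient descent: unroll K steps x <- x - step_k(x, A^T(Ax - y)),
# where each step_k is a small learned network. The reconstruction operator
# is the composition of the K unrolled iterations; the learning problem that
# fits its parameters is specified separately.
import torch

class UnrolledNet(torch.nn.Module):
    def __init__(self, K=10, channels=32):
        super().__init__()
        self.steps = torch.nn.ModuleList(
            torch.nn.Sequential(
                torch.nn.Conv2d(2, channels, 3, padding=1), torch.nn.ReLU(),
                torch.nn.Conv2d(channels, 1, 3, padding=1),
            ) for _ in range(K)
        )

    def forward(self, y, A, AT):
        x = AT(y)                           # simple back-projection initialization
        for step in self.steps:
            grad = AT(A(x) - y)             # data-fidelity gradient
            x = x - step(torch.cat([x, grad], dim=1))
        return x
```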
We exhibit transverse knot types in the standard contact $3$-sphere that cannot be realized as periodic Reeb orbits of a dynamically convex contact form. In particular, such transverse knot types do not arise as closed characteristics of strictly convex energy levels in a four-dimensional symplectic vector space.