ESSEC Business School
The core of generalization theory was developed for independent observations. Some PAC and PAC-Bayes bounds are available for data that exhibit a temporal dependence. However, there are constants in these bounds that depend on properties of the data-generating process: mixing coefficients, mixing time, spectral gap... Such constants are unknown in practice. In this paper, we prove a new PAC-Bayes bound for Markov chains. This bound depends on a quantity called the pseudo-spectral gap. The main novelty is that we can provide an empirical bound on the pseudo-spectral gap when the state space is finite. Thus, we obtain the first fully empirical PAC-Bayes bound for Markov chains. This extends beyond the finite case, although this requires additional assumptions. On simulated experiments, the empirical version of the bound is essentially as tight as the non-empirical one.
Importance sampling and independent Metropolis-Hastings (IMH) are among the fundamental building blocks of Monte Carlo methods. Both require a proposal distribution that globally approximates the target distribution. The Radon-Nikodym derivative of the target distribution relative to the proposal is called the weight function. Under the assumption that the weight is unbounded but has finite moments under the proposal distribution, we study the approximation error of importance sampling and of the particle independent Metropolis-Hastings algorithm (PIMH), which includes IMH as a special case. For the chains generated by such algorithms, we show that the common random numbers coupling is maximal. Using that coupling we derive bounds on the total variation distance of a PIMH chain to its target distribution. Our results allow a formal comparison of the finite-time biases of importance sampling and IMH, and we find the latter to be have a smaller bias. We further consider bias removal techniques using couplings, and provide conditions under which the resulting unbiased estimators have finite moments. These unbiased estimators provide an alternative to self-normalized importance sampling, implementable in the same settings. We compare their asymptotic efficiency as the number of particles goes to infinity, and consider their use in robust mean estimation techniques.
We consider the constrained graph alignment problem which has applications in biological network analysis. Given two input graphs G1=(V1,E1),G2=(V2,E2)G_1=(V_1,E_1), G_2=(V_2,E_2), a pair of vertex mappings induces an {\it edge conservation} if the vertex pairs are adjacent in their respective graphs. %In general terms The goal is to provide a one-to-one mapping between the vertices of the input graphs in order to maximize edge conservation. However the allowed mappings are restricted since each vertex from V1V_1 (resp. V2V_2) is allowed to be mapped to at most m1m_1 (resp. m2m_2) specified vertices in V2V_2 (resp. V1V_1). Most of results in this paper deal with the case m2=1m_2=1 which attracted most attention in the related literature. We formulate the problem as a maximum independent set problem in a related {\em conflict graph} and investigate structural properties of this graph in terms of forbidden subgraphs. We are interested, in particular, in excluding certain wheals, fans, cliques or claws (all terms are defined in the paper), which corresponds in excluding certain cycles, paths, cliques or independent sets in the neighborhood of each vertex. Then, we investigate algorithmic consequences of some of these properties, which illustrates the potential of this approach and raises new horizons for further works. In particular this approach allows us to reinterpret a known polynomial case in terms of conflict graph and to improve known approximation and fixed-parameter tractability results through efficiently solving the maximum independent set problem in conflict graphs. Some of our new approximation results involve approximation ratios that are function of the optimal value, in particular its square root; this kind of results cannot be achieved for maximum independent set in general graphs.
Statisticians often use Monte Carlo methods to approximate probability distributions, primarily with Markov chain Monte Carlo and importance sampling. Sequential Monte Carlo samplers are a class of algorithms that combine both techniques to approximate distributions of interest and their normalizing constants. These samplers originate from particle filtering for state space models and have become general and scalable sampling techniques. This article describes sequential Monte Carlo samplers and their possible implementations, arguing that they remain under-used in statistics, despite their ability to perform sequential inference and to leverage parallel processing resources among other potential benefits.
12 Mar 2025
This paper tackles the challenge of estimating a low-rank graphon from sampled network data, employing a singular value thresholding (SVT) estimator to create a piecewise-constant graphon based on the network's adjacency matrix. Under certain assumptions about the graphon's structural properties, we establish bounds on the operator norm distance between the true graphon and its estimator, as well as on the rank of the estimated graphon. In the second part of the paper, we apply our estimator to graphon games. We derive bounds on the suboptimality of interventions in the social welfare problem in graphon games when the intervention is based on the estimated graphon. These bounds are expressed in terms of the operator norm of the difference between the true and estimated graphons. We also emphasize the computational benefits of using the low-rank estimated graphon to solve these problems.
Researchers developed a family of algorithms leveraging denoising diffusion models and Schrödinger bridges to perform efficient Bayesian and general computational sampling. These methods introduce finite-time transport between distributions, offering an accelerated alternative to traditional sampling techniques for complex, high-dimensional probability distributions.
Canonical Polyadic (CP) tensor decomposition is a fundamental technique for analyzing high-dimensional tensor data. While the Alternating Least Squares (ALS) algorithm is widely used for computing CP decomposition due to its simplicity and empirical success, its theoretical foundation, particularly regarding statistical optimality and convergence behavior, remain underdeveloped, especially in noisy, non-orthogonal, and higher-rank settings. In this work, we revisit CP tensor decomposition from a statistical perspective and provide a comprehensive theoretical analysis of ALS under a signal-plus-noise model. We establish non-asymptotic, minimax-optimal error bounds for tensors of general order, dimensions, and rank, assuming suitable initialization. To enable such initialization, we propose Tucker-based Approximation with Simultaneous Diagonalization (TASD), a robust method that improves stability and accuracy in noisy regimes. Combined with ALS, TASD yields a statistically consistent estimator. We further analyze the convergence dynamics of ALS, identifying a two-phase pattern-initial quadratic convergence followed by linear refinement. We further show that in the rank-one setting, ALS with an appropriately chosen initialization attains optimal error within just one or two iterations.
This work investigates the offline formulation of the contextual bandit problem, where the goal is to leverage past interactions collected under a behavior policy to evaluate, select, and learn new, potentially better-performing, policies. Motivated by critical applications, we move beyond point estimators. Instead, we adopt the principle of pessimism where we construct upper bounds that assess a policy's worst-case performance, enabling us to confidently select and learn improved policies. Precisely, we introduce novel, fully empirical concentration bounds for a broad class of importance weighting risk estimators. These bounds are general enough to cover most existing estimators and pave the way for the development of new ones. In particular, our pursuit of the tightest bound within this class motivates a novel estimator (LS), that logarithmically smooths large importance weights. The bound for LS is provably tighter than its competitors, and naturally results in improved policy selection and learning strategies. Extensive policy evaluation, selection, and learning experiments highlight the versatility and favorable performance of LS.
6
In this paper, we study a sequential decision-making problem faced by e-commerce carriers related to when to send out a vehicle from the central depot to serve customer requests, and in which order to provide the service, under the assumption that the time at which parcels arrive at the depot is stochastic and dynamic. The objective is to maximize the expected number of parcels that can be delivered during service hours. We propose two reinforcement learning (RL) approaches for solving this problem. These approaches rely on a look-ahead strategy in which future release dates are sampled in a Monte-Carlo fashion and a batch approach is used to approximate future routes. Both RL approaches are based on value function approximation - one combines it with a consensus function (VFA-CF) and the other one with a two-stage stochastic integer linear programming model (VFA-2S). VFA-CF and VFA-2S do not need extensive training as they are based on very few hyper-parameters and make good use of integer linear programming (ILP) and branch-and-cut-based exact methods to improve the quality of decisions. We also establish sufficient conditions for partial characterization of optimal policy and integrate them into VFA-CF/VFA-2S. In an empirical study, we conduct a competitive analysis using upper bounds with perfect information. We also show that VFA-CF and VFA-2S greatly outperform alternative approaches that: 1) do not rely on future information, or 2) are based on point estimation of future information, or 3) employ heuristics rather than exact methods, or 4) use exact evaluations of future rewards.
This paper presents Bundle Network, a learning-based algorithm inspired by the Bundle Method for convex non-smooth minimization problems. Unlike classical approaches that rely on heuristic tuning of a regularization parameter, our method automatically learns to adjust it from data. Furthermore, we replace the iterative resolution of the optimization problem that provides the search direction-traditionally computed as a convex combination of gradients at visited points-with a recurrent neural model equipped with an attention mechanism. By leveraging the unrolled graph of computation, our Bundle Network can be trained end-to-end via automatic differentiation. Experiments on Lagrangian dual relaxations of the Multi-Commodity Network Design and Generalized Assignment problems demonstrate that our approach consistently outperforms traditional methods relying on grid search for parameter tuning, while generalizing effectively across datasets.
26 Dec 2024
Convex and penalized robust methods often suffer from bias induced by large outliers, limiting their effectiveness in adversarial or heavy-tailed settings. In this study, we propose a novel approach that eliminates this bias (when possible) by leveraging a non-convex MM-estimator based on the alpha divergence. We address the problem of estimating the parameters vector in high dimensional linear regression, even when a subset of the data has been deliberately corrupted by an adversary with full knowledge of the dataset and its underlying distribution. Our primary contribution is to demonstrate that the objective function, although non-convex, exhibits convexity within a carefully chosen basin of attraction, enabling robust and unbiased estimation. Additionally, we establish three key theoretical guarantees for the estimator: (a) a deviation bound that is minimax optimal up to a logarithmic factor, (b) an improved unbiased bound when the outliers are large and (c) asymptotic normality as the sample size increases. Finally, we validate the theoretical findings through empirical comparisons with state-of-the-art estimators on both synthetic and real-world datasets, highlighting the proposed method's superior robustness, efficiency, and ability to mitigate outlier-induced bias.
This study investigates the extent to which local public equity indices can statistically hedge real purchasing power loss during compounded structural macro-financial collapses in emerging markets. We employ a non-linear multiplicative real return calculations consistent with Fisher-parity logics for both domestic and foreign investors with a principled quantile regression, tail dependence copula analysis, and Shapley Additive Explanations (SHAP) to assess the explanatory power of macro variables. The analysis focuses on three recent and data-accessible exemplary collapse episodes: Turkey (2018), Nigeria (2020), and Pakistan (2021). Such cases, selected to align with post-2018 improvements in data standardization and crisis comparability, span varied monetary regimes and crisis triggers. Our tail-focused modeling reveals a systematic breakdown in public-equity-based purchasing power protection precisely during simultaneous macroeconomic and monetary dislocations when such protection is most needed. The findings call into question conventional inflation and devaluation hedge presumptions in equity pricing theory, emphasizing the limitations of equity-based protection and the need for context-sensitive strategies during compounded macro-financial distress.
Graph Convolutional Networks (GCNs) have become a pivotal method in machine learning for modeling functions over graphs. Despite their widespread success across various applications, their statistical properties (e.g., consistency, convergence rates) remain ill-characterized. To begin addressing this knowledge gap, we consider networks for which the graph structure implies that neighboring nodes exhibit similar signals and provide statistical theory for the impact of convolution operators. Focusing on estimators based solely on neighborhood aggregation, we examine how two common convolutions - the original GCN and GraphSAGE convolutions - affect the learning error as a function of the neighborhood topology and the number of convolutional layers. We explicitly characterize the bias-variance type trade-off incurred by GCNs as a function of the neighborhood size and identify specific graph topologies where convolution operators are less effective. Our theoretical findings are corroborated by synthetic experiments, and provide a start to a deeper quantitative understanding of convolutional effects in GCNs for offering rigorous guidelines for practitioners.
Machine learning models are widely recognized for their strong performance in forecasting. To keep that performance in streaming data settings, they have to be monitored and frequently re-trained. This can be done with machine learning operations (MLOps) techniques under supervision of an MLOps engineer. However, in digital platform settings where the number of data streams is typically large and unstable, standard monitoring becomes either suboptimal or too labor intensive for the MLOps engineer. As a consequence, companies often fall back on very simple worse performing ML models without monitoring. We solve this problem by adopting a design science approach and introducing a new monitoring framework, the Machine Learning Monitoring Agent (MLMA), that is designed to work at scale for any ML model with reasonable labor cost. A key feature of our framework concerns test-based automated re-training based on a data-adaptive reference loss batch. The MLOps engineer is kept in the loop via key metrics and also acts, pro-actively or retrospectively, to maintain performance of the ML model in the production stage. We conduct a large-scale test at a last-mile delivery platform to empirically validate our monitoring framework.
The convergence rate of a Markov chain to its stationary distribution is typically assessed using the concept of total variation mixing time. However, this worst-case measure often yields pessimistic estimates and is challenging to infer from observations. In this paper, we advocate for the use of the average-mixing time as a more optimistic and demonstrably easier-to-estimate alternative. We further illustrate its applicability across a range of settings, from two-point to countable spaces, and discuss some practical implications.
This study explores the potential of using public transportation systems for freight delivery, where we intend to utilize the spare capacities of public vehicles like buses, trams, metros, and trains, particularly during off-peak hours, to transport packages within the city instead of using dedicated delivery vehicles. The study contributes {to the growing} literature on innovative strategies for performing sustainable last mile deliveries. We study an operational level problem called the Three-Tier Delivery Problem on Public Transportation, where packages are first transported from the Consolidation and Distribution Center (CDC) to nearby public vehicle stations by delivery trucks. From there, public vehicles transport them into the city area. The last leg of the delivery is performed to deliver the packages to their respective customers using green vehicles or eco-friendly systems. We propose mixed-integer linear programming formulations to study the transport of packages from the CDC to the customers, use decomposition approaches to solve them, and provide numerical experiments to demonstrate the efficiency and effectiveness of the system. Our results show that this system has the potential to drastically reduce the length of trips performed by dedicated delivery vehicles, thereby reducing the negative social and environmental impacts of existing last mile delivery systems.
This document presents methods to remove the initialization or burn-in bias from Markov chain Monte Carlo (MCMC) estimates, with consequences on parallel computing, convergence diagnostics and performance assessment. The document is written as an introduction to these methods for MCMC users. Some theoretical results are mentioned, but the focus is on the methodology.
Aircraft conflict resolution is one of the major tasks of computer-aided air traffic management and represents a challenging optimization problem. Many models and methods have been proposed to assist trajectory regulation to avoid conflicts. However, the question of testing the different mathematical optimization approaches against each other is still open. Standard benchmarks include unrealistic scenarios in which all the flights move toward a common point or completely random generated instances. There is a lack of a common set of test instances that allows comparison of the available methods under a variety of heterogeneous and representative scenarios. We present a flight deconfliction benchmark generator that allows the user to choose between (i) different predefined scenario inspired by existing benchmarks in the literature; (ii) pseudo-random traffic meeting certain congestion measurements; (iii) and randomly generated traffic. The proposed setting can account for different levels of difficulty in the deconfliction of the aircraft and allows to explore and compare the real limitations of optimization approaches for aircraft conflict resolution.
Same-day deliveries (SDD) have become a new standard to satisfy the "instant gratification" of online customers. Despite the existing powerful technologies deployed in last-mile delivery, SDD services face new decision-making challenges related to the trade-off between delivery cost and time. In addition, new challenges related to environmental issues, customer satisfaction, or fairness arise. Researchers have explored various approaches to face these challenges in the context of SDD, where stochastic and dynamic data uncertainty plays a fundamental role. In this paper, we carefully review the emerging routing problems and solutions proposed in the existing literature for SDD services. We survey papers related to how to deal with dynamic arrival times of orders, how to allocate time slots to deliveries, how to select the right delivery options, how to design pickup and delivery routes, or how to partition the delivery areas and decide the composition of the fleet. We also formulate and compare models for representative problems elaborating on the pros and cons that might guide practitioners in choosing the most appropriate objectives and constraints. Finally, we sketch challenges and identify future research directions.
Platform businesses operate on a digital core and their decision making requires high-dimensional accurate forecast streams at different levels of cross-sectional (e.g., geographical regions) and temporal aggregation (e.g., minutes to days). It also necessitates coherent forecasts across all levels of the hierarchy to ensure aligned decision making across different planning units such as pricing, product, controlling and strategy. Given that platform data streams feature complex characteristics and interdependencies, we introduce a non-linear hierarchical forecast reconciliation method that produces cross-temporal reconciled forecasts in a direct and automated way through the use of popular machine learning methods. The method is sufficiently fast to allow forecast-based high-frequency decision making that platforms require. We empirically test our framework on unique, large-scale streaming datasets from a leading on-demand delivery platform in Europe and a bicycle sharing system in New York City.
There are no more papers matching your filters at the moment.