ENSAE Paris
Best Arm Identification (BAI) algorithms are deployed in data-sensitive applications, such as adaptive clinical trials or user studies. Driven by the privacy concerns of these applications, we study the problem of fixed-confidence BAI under global Differential Privacy (DP) for Bernoulli distributions. While numerous asymptotically optimal BAI algorithms exist in the non-private setting, a significant gap remains between the best lower and upper bounds in the global DP setting. This work reduces this gap to a small multiplicative constant, for any privacy budget $\epsilon$. First, we provide a tighter lower bound on the expected sample complexity of any $\delta$-correct and $\epsilon$-global DP strategy. Our lower bound replaces the Kullback-Leibler (KL) divergence in the transportation cost used by the non-private characteristic time with a new information-theoretic quantity that optimally trades off between the KL divergence and the Total Variation distance scaled by $\epsilon$. Second, we introduce a stopping rule based on these transportation costs and a private estimator of the means computed using an arm-dependent geometric batching. En route to proving the correctness of our stopping rule, we derive concentration results of independent interest for the Laplace distribution and for the sum of Bernoulli and Laplace distributions. Third, we propose a Top Two sampling rule based on these transportation costs. For any budget $\epsilon$, we show an asymptotic upper bound on its expected sample complexity that matches our lower bound up to a multiplicative constant smaller than $8$. Our algorithm outperforms existing $\delta$-correct and $\epsilon$-global DP BAI algorithms for different values of $\epsilon$.
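As a rough illustration of the batched private mean estimator described above (the epoch schedule, function name, and noise calibration below are illustrative assumptions, not the paper's exact construction), a per-arm mean can be released under $\epsilon$-global DP by recomputing it only at geometrically spaced sample counts and adding Laplace noise calibrated to the sensitivity of a Bernoulli mean:

```python
import numpy as np

def private_epoch_mean(rewards, epsilon, rng=None):
    """Illustrative sketch (not the paper's exact estimator): release an
    epsilon-DP mean for one arm using only the latest completed doubling
    epoch. Epoch k uses samples with indices in [2^(k-1), 2^k), so epochs
    are disjoint and parallel composition keeps the whole sequence of
    releases epsilon-DP. A mean of m rewards in [0, 1] has sensitivity
    1/m, hence Laplace noise of scale 1/(m * epsilon)."""
    assert len(rewards) >= 1, "need at least one sample"
    rng = np.random.default_rng() if rng is None else rng
    n = len(rewards)
    k = n.bit_length() - 1                # last completed power-of-two count
    start, end = (1 << k) >> 1, 1 << k    # epoch boundaries: (0,1), (1,2), (2,4), ...
    m = end - start
    mean = float(np.mean(rewards[start:end]))
    return mean + rng.laplace(scale=1.0 / (m * epsilon))
```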
As sequential learning algorithms are increasingly applied in real life, ensuring data privacy while maintaining their utility emerges as a timely question. In this context, regret minimisation in stochastic bandits under $\epsilon$-global Differential Privacy (DP) has been widely studied. Unlike bandits without DP, there is a significant gap between the best-known regret lower and upper bounds in this setting, though they "match" in order. Thus, we revisit the regret lower and upper bounds of $\epsilon$-global DP algorithms for Bernoulli bandits and improve both. First, we prove a tighter regret lower bound involving a novel information-theoretic quantity characterising the hardness of $\epsilon$-global DP in stochastic bandits. Our lower bound strictly improves on the existing ones across all $\epsilon$ values. Then, we choose two asymptotically optimal bandit algorithms, i.e., KL-UCB and IMED, and propose their DP versions, DP-KLUCB and DP-IMED, using a unified blueprint: (a) running in arm-dependent phases, and (b) adding Laplace noise to achieve privacy. For Bernoulli bandits, we analyse the regrets of these algorithms and show that they asymptotically match our lower bound up to a constant arbitrarily close to 1. This refutes the conjecture that forgetting past rewards is necessary to design optimal bandit algorithms under global DP. At the core of our algorithms lies a new concentration inequality for sums of Bernoulli variables under the Laplace mechanism, which is a new DP version of the Chernoff bound. This result is broadly useful, as the DP literature commonly treats the concentration of the Laplace noise and of the underlying random variables separately, whereas we couple them to yield a tighter bound.
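For concreteness, the standard Laplace-mechanism calibration underlying step (b) of the blueprint can be written as follows (this is the textbook guarantee, not a statement of the paper's specific analysis):

```latex
\[
  \tilde{S}_k \;=\; \sum_{i \,\in\, \text{phase } k} r_i \;+\; Z_k,
  \qquad Z_k \sim \mathrm{Lap}\!\left(\tfrac{1}{\epsilon}\right),
\]
% A sum of rewards r_i in {0,1} changes by at most 1 when one user's
% datum changes, so each release is epsilon-DP by the Laplace mechanism;
% since phases are disjoint, parallel composition keeps the whole
% reward sequence epsilon-global DP.
```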
We study reinforcement learning (RL) problems in which agents observe the reward or transition realizations at their current state before deciding which action to take. Such observations are available in many applications, including transactions, navigation, and more. When the environment is known, previous work shows that this lookahead information can drastically increase the collected reward. However, outside of specific applications, existing approaches for interacting with unknown environments are not well-adapted to these observations. In this work, we close this gap and design provably-efficient learning algorithms able to incorporate lookahead information. To achieve this, we perform planning using the empirical distribution of the reward and transition observations, in contrast to vanilla approaches that rely only on estimated expectations. We prove that our algorithms achieve tight regret versus a baseline that also has access to lookahead information, linearly increasing the amount of collected reward compared to agents that cannot handle it.
In reinforcement learning (RL), agents sequentially interact with changing environments while aiming to maximize the obtained rewards. Usually, rewards are observed only after acting, and so the goal is to maximize the expected cumulative reward. Yet, in many practical settings, reward information is observed in advance -- prices are observed before performing transactions; nearby traffic information is partially known; and goals are oftentimes given to agents prior to the interaction. In this work, we aim to quantitatively analyze the value of such future reward information through the lens of competitive analysis. In particular, we measure the ratio between the value of standard RL agents and that of agents with partial future-reward lookahead. We characterize the worst-case reward distribution and derive exact ratios for the worst-case reward expectations. Surprisingly, the resulting ratios relate to known quantities in offline RL and reward-free exploration. We further provide tight bounds for the ratio given the worst-case dynamics. Our results cover the full spectrum between observing the immediate rewards before acting and observing all the rewards before the interaction starts.
In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms. It consists of checking a set of sufficient conditions on the sampling probability of each arm and on the family of distributions to prove a logarithmic regret. As a direct application we revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS), under various models for the distributions, including single-parameter exponential families, Gaussian distributions, bounded distributions, or distributions satisfying some conditions on their moments. In particular, we prove that MED is asymptotically optimal for all these models, but also provide a simple regret analysis of some TS algorithms for which the optimality is already known. We then further illustrate the usefulness of our approach by analyzing a new Non-Parametric TS algorithm (h-NPTS), adapted to some families of unbounded reward distributions with a bounded h-moment. This model can, for instance, capture some non-parametric families of distributions whose variance is upper bounded by a known constant.
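As one concrete instance of the randomized algorithms covered by this methodology, here is a minimal Bernoulli Thompson Sampling loop (a textbook sketch for illustration; the paper's analysis covers far more general distribution families, and `pull` is an assumed environment callback):

```python
import numpy as np

def thompson_sampling_bernoulli(pull, n_arms, horizon, rng=None):
    """Minimal Bernoulli Thompson Sampling with Beta(1, 1) priors.
    `pull(a)` is assumed to return a reward in {0, 1} for arm `a`."""
    rng = np.random.default_rng() if rng is None else rng
    successes = np.ones(n_arms)   # Beta posterior parameters alpha
    failures = np.ones(n_arms)    # Beta posterior parameters beta
    for _ in range(horizon):
        theta = rng.beta(successes, failures)  # one posterior sample per arm
        arm = int(np.argmax(theta))            # play the most optimistic sample
        reward = pull(arm)
        successes[arm] += reward
        failures[arm] += 1 - reward
    return successes, failures
```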
We study matching markets with ties, where workers on one side of the market may have tied preferences over jobs, determined by their matching utilities. Unlike classical two-sided markets with strict preferences, no single stable matching exists that is utility-maximizing for all workers. To address this challenge, we introduce the \emph{Optimal Stable Share} (OSS)-ratio, which measures the ratio of a worker's maximum achievable utility in any stable matching to their utility in a given matching. We prove that distributions over only stable matchings can incur linear utility losses, i.e., an $\Omega(N)$ OSS-ratio, where $N$ is the number of workers. To overcome this, we design an algorithm that efficiently computes a distribution over (possibly non-stable) matchings, achieving an asymptotically tight $O(\log N)$ OSS-ratio. When exact utilities are unknown, our second algorithm guarantees workers a logarithmic approximation of their optimal utility under bounded instability. Finally, we extend our offline approximation results to a bandit learning setting where utilities are only observed for matched pairs. In this setting, we consider worker-optimal stable regret, design an adaptive algorithm that smoothly interpolates between markets with strict preferences and those with statistical ties, and establish a lower bound revealing the fundamental trade-off between strict and tied preference regimes.
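In symbols, the OSS-ratio described above reads as follows (notation ours, transcribed directly from the abstract's definition):

```latex
% OSS-ratio of a matching \mu for worker w, with S the set of stable
% matchings and u_w the matching utility of worker w:
\[
  \mathrm{OSS}_w(\mu) \;=\; \frac{\max_{\mu' \in S} u_w(\mu')}{u_w(\mu)},
\]
% so an O(log N) guarantee means every worker receives at least a
% 1/O(log N) fraction of their best achievable stable utility.
```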
Existing publications demonstrate that limit order book data is useful in predicting short-term volatility in stock markets. Since stocks are not independent, changes in one stock can also impact other related stocks. In this paper, we are interested in forecasting short-term realized volatility in a multivariate approach based on limit order book data and relational data. To achieve this goal, we introduce the Graph Transformer Network for Volatility Forecasting. The model combines limit order book features with an unlimited number of temporal and cross-sectional relations from different sources. Through experiments on about 500 stocks from the S&P 500 index, we find that our model outperforms other benchmarks.
Predicting stock prices from textual information is a challenging task due to the uncertainty of the market and the difficulty of understanding natural language from a machine's perspective. Previous research focuses mostly on sentiment extraction from single news items. However, stocks on the financial market can be highly correlated, and news about one stock can quickly impact the prices of other stocks. To take this effect into account, we propose a new stock movement prediction framework: the Multi-Graph Recurrent Network for Stock Forecasting (MGRN). This architecture combines the textual sentiment from financial news with multiple types of relational information extracted from other financial data. Through an accuracy test and a trading simulation on the stocks in the STOXX Europe 600 index, we demonstrate that our model outperforms other benchmarks.
We propose methods to estimate the individual $\beta$-mixing coefficients of a real-valued geometrically ergodic Markov process from a single sample-path $X_0, X_1, \dots, X_n$. Under standard smoothness conditions on the densities, namely, that the joint density of the pair $(X_0, X_m)$ for each $m$ lies in a Besov space $B^s_{1,\infty}(\mathbb{R}^2)$ for some known $s > 0$, we obtain a rate of convergence of order $\mathcal{O}(\log(n)\, n^{-[s]/(2[s]+2)})$ for the expected error of our estimator in this case. (We use $[s]$ to denote the integer part of the decomposition $s = [s] + \{s\}$ of $s \in (0,\infty)$ into an integer term and a strictly positive remainder term $\{s\} \in (0,1]$.) We complement this result with a high-probability bound on the estimation error, and further obtain analogues of these bounds in the case where the state-space is finite. Naturally no density assumptions are required in this setting; the expected error rate is shown to be of order $\mathcal{O}(\log(n)\, n^{-1/2})$.
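For reference, the quantity being estimated admits a standard total-variation characterisation (a classical fact, up to the usual normalisation conventions, which is what makes the joint-density assumption natural):

```latex
% For a stationary Markov process, the beta-mixing coefficient at lag m
% is the total variation distance between the joint law of (X_0, X_m)
% and the product of its marginals:
\[
  \beta(m) \;=\; \bigl\| \mathcal{L}(X_0, X_m)
      \;-\; \mathcal{L}(X_0) \otimes \mathcal{L}(X_m) \bigr\|_{\mathrm{TV}},
\]
% which, when densities exist, reduces to an L1 distance between the
% joint density and the product of the marginal densities.
```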
Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\widetilde{\mathcal{O}}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game, and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive FTRL, which needs $\widetilde{\mathcal{O}}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ realizations without this requirement by progressively adapting the regularization to the observations.
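For reference, both algorithms instantiate the generic FTRL update (textbook form; the contribution described above lies in the choice and adaptation of the regularizer $R$):

```latex
% Generic FTRL update over a decision set \Delta with learning rate \eta,
% estimated losses \hat{\ell}_s, and regularizer R:
\[
  x_{t+1} \;\in\; \operatorname*{arg\,min}_{x \in \Delta}
  \left\{ \eta \sum_{s=1}^{t} \langle \hat{\ell}_s, x \rangle + R(x) \right\}.
\]
```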
A finite-range interacting particle system on a transitive graph is considered. Assuming that the dynamics and the initial measure are invariant, the normalized empirical distribution process converges in distribution to a centered diffusion process. As an application, a central limit theorem for certain hitting times, interpreted as failure times of a coherent system in reliability, is derived.
Due mostly to its application to cognitive radio networks, multiplayer bandits have gained a lot of interest in the last decade. Considerable progress has been made on their theoretical aspects. However, the current algorithms are far from applicable in practice, and many obstacles remain between these theoretical results and a possible implementation of multiplayer bandit algorithms in real cognitive radio networks. This survey contextualizes and organizes the rich multiplayer bandits literature. In light of the existing works, some clear directions for future research appear. We believe that a further study of these different directions might lead to theoretical algorithms adapted to real-world situations.
Due to challenging applications such as collaborative filtering, the matrix completion problem has been widely studied in the past few years. Different approaches rely on different structural assumptions on the matrix at hand. Here, we focus on the completion of a (possibly) low-rank matrix with binary entries, the so-called 1-bit matrix completion problem. Our approach relies on tools from machine learning theory: empirical risk minimization and its convex relaxations. We propose an algorithm to compute a variational approximation of the pseudo-posterior. Thanks to the convex relaxation, the corresponding minimization problem is bi-convex, and thus the method behaves well in practice. We also study the performance of this variational approximation through PAC-Bayesian learning bounds. In contrast to previous works, which focused on upper bounds on the estimation error of $M$ under various matrix norms, we are able to derive from this analysis a PAC bound on the prediction error of our algorithm. We focus essentially on convex relaxation through the hinge loss, for which we present a complete analysis, a complete simulation study, and a test on the MovieLens data set. However, we also discuss a variational approximation to deal with the logistic loss.
We extend nonparametric regression smoothing splines to a context where there is endogeneity and instrumental variables are available. Unlike popular existing estimators, the resulting estimator is one-step and relies on a unique regularization parameter. We derive rates of convergence for the estimator and its first derivative, which are uniform over the support of the endogenous variable. We also address the issue of imposing monotonicity in estimation and extend the approach to a partly linear model. Simulations confirm the good performance of our estimator compared to two-step procedures. Our method yields economically sensible results when used to estimate Engel curves.
We consider the problem of regression learning for deterministic design and independent random errors. We start by proving a sharp PAC-Bayesian type bound for the exponentially weighted aggregate (EWA) under the expected squared empirical loss. For a broad class of noise distributions the presented bound is valid whenever the temperature parameter $\beta$ of the EWA is larger than or equal to $4\sigma^2$, where $\sigma^2$ is the noise variance. A remarkable feature of this result is that it is valid even for unbounded regression functions and the choice of the temperature parameter depends exclusively on the noise level. Next, we apply this general bound to the problem of aggregating the elements of a finite-dimensional linear space spanned by a dictionary of functions $\phi_1, \dots, \phi_M$. We allow $M$ to be much larger than the sample size $n$ but we assume that the true regression function can be well approximated by a sparse linear combination of functions $\phi_j$. Under this sparsity scenario, we propose an EWA with a heavy-tailed prior and we show that it satisfies a sparsity oracle inequality with leading constant one. Finally, we propose several Langevin Monte-Carlo algorithms to approximately compute such an EWA when the number $M$ of aggregated functions can be large. We discuss in some detail the convergence of these algorithms and present numerical experiments that confirm our theoretical findings.
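For reference, the exponentially weighted aggregate takes the standard Gibbs form (the usual definition, with $r_n$ the empirical risk and $\pi$ the prior; exact conventions as in the paper):

```latex
% Exponentially weighted aggregate with temperature \beta, prior \pi,
% and empirical risk r_n(f) = (1/n) \sum_i (Y_i - f(x_i))^2:
\[
  \hat{f}_n \;=\; \int f \,\hat{\pi}_n(df),
  \qquad
  \hat{\pi}_n(df) \;\propto\; \exp\!\bigl(-\,n\, r_n(f)/\beta\bigr)\, \pi(df),
\]
% so the condition \beta >= 4\sigma^2 says the pseudo-posterior must not
% be "colder" than four times the noise variance.
```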
Analysing statistical properties of neural networks is a central topic in statistics and machine learning. However, most results in the literature focus on the properties of the neural network minimizing the training error. The goal of this paper is to consider aggregated neural networks using a Gaussian prior. The departure point of our approach is an arbitrary aggregate satisfying the PAC-Bayesian inequality. The main contribution is a precise nonasymptotic assessment of the estimation error appearing in the PAC-Bayes bound. We also review available bounds on the error of approximating a function by a neural network. Combining bounds on estimation and approximation errors, we establish risk bounds that are sharp enough to lead to minimax rates of estimation over Sobolev smoothness classes.
Monte Carlo methods -- such as Markov chain Monte Carlo (MCMC) and piecewise deterministic Markov process (PDMP) samplers -- provide asymptotically exact estimators of expectations under a target distribution. There is growing interest in alternatives to this asymptotic regime, in particular in constructing estimators that are exact in the limit of an infinite number of computing processors, rather than in the limit of an infinite number of Markov iterations. Notably, Jacob et al. (2020) introduced coupled MCMC estimators to remove the non-asymptotic bias, resulting in MCMC estimators that can be embarrassingly parallelised. In this work, we extend the estimators of Jacob et al. (2020) to the continuous-time context and derive couplings for the bouncy, the boomerang and the coordinate samplers. Some preliminary empirical results are included that demonstrate the reasonable scaling of our method with the dimension of the target.
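For context, the discrete-time estimator being extended here is the following (in the notation of Jacob et al. (2020), reproduced from memory, so conventions may differ slightly):

```latex
% Coupled chains (X_t) and (Y_t) satisfying X_t = Y_{t-1} for all
% t >= tau, where tau is the meeting time. For any fixed k,
\[
  H_k(X, Y) \;=\; h(X_k) \;+\; \sum_{t=k+1}^{\tau - 1}
      \bigl( h(X_t) - h(Y_{t-1}) \bigr)
\]
% is an unbiased estimator of E_pi[h]: the telescoping correction
% removes the bias of the finite-time marginal of X_k, and it can be
% averaged over independent replicas run in parallel.
```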
The inherent ill-posed nature of image reconstruction problems, due to limitations in the physical acquisition process, is typically addressed by introducing a regularisation term that incorporates prior knowledge about the underlying image. The iterative framework of Plug-and-Play methods, specifically designed for tackling such inverse problems, achieves state-of-the-art performance by replacing the regularisation with a generic denoiser, which may be parametrised by a neural network architecture. However, these deep learning approaches suffer from a critical limitation: the absence of a control parameter to modulate the regularisation strength, which complicates the design of a convergent regularisation. To address this issue, this work introduces a novel scaling method that explicitly integrates and adjusts the strength of regularisation. The scaling parameter enhances interpretability by reflecting the quality of the denoiser's learning process, and also systematically improves its optimisation. Furthermore, the proposed approach ensures that the resulting family of regularisations is provably stable and convergent.
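To make the role of a regularisation-strength parameter concrete, here is a generic Plug-and-Play proximal-gradient sketch in which a scalar `alpha` relaxes the denoiser (a common device for illustration only; the scaling method introduced above differs in its construction and convergence guarantees):

```python
import numpy as np

def pnp_pgd(x0, grad_f, denoise, step, alpha, n_iter=100):
    """Generic Plug-and-Play proximal gradient descent sketch.
    grad_f : gradient of the data-fidelity term f
    denoise: a generic (e.g. learned) denoiser standing in for a prox
    alpha  : regularisation strength in [0, 1]; alpha = 0 disables the
             denoiser and alpha = 1 applies it fully (an illustrative
             knob, not the paper's exact scaling construction)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        z = x - step * grad_f(x)                    # gradient step on the data term
        x = (1.0 - alpha) * z + alpha * denoise(z)  # relaxed denoising step
    return x
```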
We develop a continuous-time stochastic model for optimal cybersecurity investment under the threat of cyberattacks. The arrival of attacks is modeled using a Hawkes process, capturing the empirically relevant feature of clustering in cyberattacks. Extending the Gordon-Loeb model, each attack may result in a breach, with breach probability depending on the system's vulnerability. We aim at determining the optimal cybersecurity investment to reduce vulnerability. The problem is cast as a two-dimensional Markovian stochastic optimal control problem and solved using dynamic programming methods. Numerical results illustrate how accounting for attack clustering leads to more responsive and effective investment policies, offering significant improvements over static and Poisson-based benchmark strategies. Our findings underscore the value of incorporating realistic threat dynamics into cybersecurity risk management.
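For concreteness, a standard specification of the attack-arrival intensity is the exponential-kernel Hawkes process (a common parametrisation; the paper's exact kernel and parameters may differ):

```latex
% Conditional intensity of attack arrivals at time t, given past attack
% times t_i < t: baseline rate mu, jump size alpha, decay rate beta.
\[
  \lambda(t) \;=\; \mu \;+\; \sum_{t_i < t} \alpha\, e^{-\beta (t - t_i)},
\]
% so each attack transiently raises the arrival rate, producing the
% clustering that the optimal investment policy reacts to.
```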