This paper presents a comprehensive overview and taxonomy of time-series anomaly detection methods from the past decade.
In this work, we investigate high-dimensional kernel ridge regression (KRR) on i.i.d. Gaussian data with anisotropic power-law covariance. This setting differs fundamentally from the classical source & capacity conditions for KRR, where power-law assumptions are typically imposed on the kernel eigen-spectrum itself. Our contributions are twofold. First, we derive an explicit characterization of the kernel spectrum for polynomial inner-product kernels, giving a precise description of how the kernel eigen-spectrum inherits the data decay. Second, we provide an asymptotic analysis of the excess risk in the high-dimensional regime for a particular kernel with this spectral behavior, showing that the sample complexity is governed by the effective dimension of the data rather than the ambient dimension. These results establish a fundamental advantage of learning with power-law anisotropic data over isotropic data. To our knowledge, this is the first rigorous treatment of non-linear KRR under power-law data.
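To make the setting concrete, here is a minimal Python sketch (not the paper's analysis): Gaussian data whose covariance eigenvalues decay as a power law, a polynomial inner-product kernel, and ridge regression in the dual. The decay exponent, kernel degree, target, and ridge level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test = 200, 1000, 500
alpha = 1.5                                   # assumed power-law decay of covariance eigenvalues
lam = np.arange(1, d + 1) ** (-alpha)

def sample(m):
    # anisotropic Gaussian data: x ~ N(0, diag(lam))
    return rng.standard_normal((m, d)) * np.sqrt(lam)

X, X_test = sample(n), sample(n_test)
w_star = rng.standard_normal(d) / np.sqrt(d)  # illustrative linear target
y, y_test = X @ w_star, X_test @ w_star

def poly_kernel(A, B, degree=3):
    # polynomial inner-product kernel k(x, x') = (1 + <x, x'>/d)^degree
    return (1.0 + A @ B.T / d) ** degree

ridge = 1e-3
K = poly_kernel(X, X)
coef = np.linalg.solve(K + ridge * np.eye(n), y)   # KRR dual coefficients
pred = poly_kernel(X_test, X) @ coef
print("test excess risk:", np.mean((pred - y_test) ** 2))
```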
We consider a mean-field control problem in which admissible controls are required to be adapted to the common noise filtration. The main objective is to show how the mean-field control problem can be approximated by time-consistent centralized finite-population problems in which the central planner has full information on all agents' states and gives an identical signal to all agents. We also aim at establishing the optimal convergence rate. In a first general path-dependent setting, we only prove convergence, using weak convergence techniques for probability measures on the canonical space. Next, when only the drift coefficient is controlled, we obtain a backward SDE characterization of the value process, based on which a convergence rate is established in terms of the Wasserstein distance between the original measure and the empirical measure induced by the particles; this requires Lipschitz continuity conditions in the Wasserstein sense, and the resulting convergence rate is optimal. In a Markovian setting and under convexity conditions on the running reward function, we then prove uniqueness of the optimal control and provide regularity results on the value function, from which we deduce the optimal weak convergence rate in terms of the number of particles. Finally, we apply these results to the study of a classical optimal control problem with partial observation, leading to an original approximation method by particle systems.
A central question in evolutionary biology is how to quantitatively understand the dynamics of genetically diverse populations. Modeling the genotype distribution is challenging, as it ultimately requires tracking all correlations (or cumulants) among alleles at different loci. The quasi-linkage equilibrium (QLE) approximation simplifies this by assuming that correlations between alleles at different loci are weak -- i.e., low linkage disequilibrium -- allowing their dynamics to be modeled perturbatively. However, QLE breaks down under strong selection, significant epistatic interactions, or weak recombination. We extend the multilocus QLE framework to allow cumulants up to order $K$ to evolve dynamically, while higher-order cumulants ($>K$) are assumed to equilibrate rapidly. This extended QLE (exQLE) framework yields a general equation of motion for cumulants up to order $K$, which parallels the standard QLE dynamics (recovered when $K=1$). In this formulation, cumulant dynamics are driven by the gradient of average fitness, mediated by a geometrically interpretable matrix that stems from competition among genotypes. Our analysis shows that the exQLE with $K=2$ accurately captures cumulant dynamics even when the fitness function includes higher-order (e.g., third- or fourth-order) epistatic interactions, capabilities that standard QLE lacks. We also applied the exQLE framework to infer fitness parameters from temporal sequence data. Overall, exQLE provides a systematic and interpretable approximation scheme, leveraging analytical cumulant dynamics and reducing complexity by progressively truncating higher-order cumulants.
Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. Instead of merely optimizing for predictive accuracy, DFL trains models to directly minimize the loss associated with downstream decisions. However, existing studies focus solely on scenarios where a fixed batch of data is available and the objective function does not change over time. We instead investigate DFL in dynamic environments where the objective function and data distribution evolve over time. This setting is challenging for online learning because the objective function has zero or undefined gradients -- which prevents the use of standard first-order optimization methods -- and is generally non-convex. To address these difficulties, we (i) regularize the objective to make it differentiable and (ii) use perturbation techniques along with a near-optimal oracle to overcome non-convexity. Combining these techniques yields two original online algorithms tailored for DFL, for which we establish static and dynamic regret bounds, respectively. These are the first provable guarantees for the online decision-focused problem. Finally, we showcase the effectiveness of our algorithms on a knapsack experiment, where they outperform two standard benchmarks.
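The sketch below illustrates only the generic perturbation (smoothing) idea on a toy piecewise-constant decision loss, not the paper's algorithms or oracle; the loss, smoothing scale, and drifting costs are hypothetical choices, and `theta` here directly parameterizes predicted costs rather than a predictive model.

```python
import numpy as np

rng = np.random.default_rng(1)

def decision_loss(theta, c_true):
    # toy piecewise-constant loss: choose the item predicted cheapest, pay its true cost
    return c_true[np.argmin(theta)]

def smoothed_grad(theta, c_true, sigma=0.1, n_samples=32):
    # zeroth-order (perturbation-based) gradient estimate of the sigma-smoothed loss
    u = rng.standard_normal((n_samples, theta.size))
    vals = np.array([decision_loss(theta + sigma * ui, c_true) for ui in u])
    return (vals[:, None] * u).mean(axis=0) / sigma

theta = np.zeros(5)
lr = 0.05
for t in range(200):                              # online rounds with drifting true costs
    c_true = np.array([1, 2, 3, 4, 5]) + 0.5 * np.sin(0.05 * t + np.arange(5))
    theta -= lr * smoothed_grad(theta, c_true)
    # regret bookkeeping would compare decision_loss(theta, c_true) to a comparator sequence
```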
Researchers from Inria, ENS, CNRS, and PSL introduce WARI and SMS, two new evaluation measures for time series segmentation, alongside a formal typology of segmentation errors. These measures enhance the interpretability of segmentation quality by accounting for temporal error positions and specific error types, providing diagnostic insights into algorithm performance.
The paper rigorously characterizes the high-dimensional SGD dynamics in simplified one-layer attention networks on sequential data using Sequence Single-Index (SSI) models, deriving the population loss as a function of semantic and positional alignment. This analysis identifies a "Sequence Information Exponent" (SIE) that dictates sample complexity and quantifies how attention mechanisms and positional encodings can accelerate learning.
A prevalent practice in recommender systems consists in averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation than the theoretical setting would suggest, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.
Deep generative models have made tremendous progress in modeling complex data, often exhibiting generation quality that surpasses a typical human's ability to discern the authenticity of samples. Undeniably, a key driver of this success is the massive amount of web-scale data consumed by these models. Due to these models' striking performance and ease of availability, the web will inevitably be increasingly populated with synthetic content. Such a fact directly implies that future iterations of generative models will be trained on both clean and artificially generated data from past models. In this paper, we develop a framework to rigorously study the impact of training generative models on mixed datasets -- from classical training on real data to self-consuming generative models trained on purely synthetic data. We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough and the proportion of clean training data (w.r.t. synthetic data) is large enough. We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models on CIFAR10 and FFHQ.
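A toy version of the self-consuming loop can make the setup concrete. Here the "generative model" is simply a fitted Gaussian rather than a normalizing flow or diffusion model, and the clean-data proportion `p_clean` is an arbitrary illustrative value; with `p_clean = 0` the fitted spread is known to collapse over generations, while a large enough clean proportion keeps it stable, mirroring the stability condition described above.

```python
import numpy as np

rng = np.random.default_rng(2)
clean = rng.normal(loc=0.0, scale=1.0, size=5000)   # stand-in for real data

def fit(data):
    return data.mean(), data.std()                  # toy "generative model": fit a Gaussian

def sample(model, m):
    mu, sigma = model
    return rng.normal(mu, sigma, size=m)

model = fit(clean)
p_clean = 0.7                                       # assumed proportion of clean data per generation
for generation in range(10):
    n_clean = int(p_clean * len(clean))
    mixed = np.concatenate([clean[:n_clean], sample(model, len(clean) - n_clean)])
    model = fit(mixed)                              # retrain on the clean/synthetic mixture
    print(generation, "mean=%.3f std=%.3f" % model)
```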
Making inferences with a deep neural network on a batch of states is much faster with a GPU than making inferences on one state after another. We build on this property to propose Monte Carlo Tree Search algorithms using batched inferences. Instead of using either a search tree or a transposition table, we propose to use both in the same algorithm. The transposition table contains the results of the inferences while the search tree contains the statistics of Monte Carlo Tree Search. We also propose to analyze multiple heuristics that improve the search: the $\mu$ FPU, the Virtual Mean, the Last Iteration and the Second Move heuristics. They are evaluated for the game of Go using a MobileNet neural network.
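A rough sketch of the combined data structures, assuming hypothetical `descend`, `network`, and `decode_state` helpers: inference results live in a transposition table keyed by state, visit statistics stay in the tree nodes, and all missing inferences are sent to the GPU in one batch. The heuristics named above ($\mu$ FPU, Virtual Mean, Last Iteration, Second Move) are not reproduced here.

```python
import numpy as np

class Node:
    def __init__(self, state_key):
        self.state_key = state_key          # key into the transposition table
        self.children = {}                  # move -> Node
        self.visits = {}                    # move -> visit count (MCTS statistics)
        self.values = {}                    # move -> summed value (MCTS statistics)

transposition_table = {}                    # state_key -> (policy, value) from the network

def collect_leaves(root, batch_size, descend):
    """Descend the tree batch_size times, gathering states whose inference is missing."""
    pending = []
    for _ in range(batch_size):
        leaf_key = descend(root)            # tree policy (e.g. PUCT) down to a leaf state
        if leaf_key not in transposition_table:
            pending.append(leaf_key)
    return pending

def expand_batch(pending, network, decode_state):
    """One batched GPU call for all pending states; results are cached for reuse."""
    if not pending:
        return
    states = np.stack([decode_state(k) for k in pending])
    policies, values = network(states)      # single batched inference
    for key, p, v in zip(pending, policies, values):
        transposition_table[key] = (p, v)
```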
The syntactic structures of sentences can be readily read out from the activations of large language models (LLMs). However, the ``structural probes'' that have been developed to reveal this phenomenon are typically evaluated on an indiscriminate set of sentences. Consequently, it remains unclear whether structural and/or statistical factors systematically affect these syntactic representations. To address this issue, we conduct an in-depth analysis of structural probes on three controlled benchmarks. Our results are three-fold. First, structural probes are biased by a superficial property: the closer two words are in a sentence, the more likely structural probes are to consider them syntactically linked. Second, structural probes are challenged by linguistic properties: they poorly represent deep syntactic structures, and suffer interference from interacting nouns or ungrammatical verb forms. Third, structural probes do not appear to be affected by the predictability of individual words. Overall, this work sheds light on the current challenges faced by structural probes and provides a benchmark of controlled stimuli to better evaluate their performance.
Dissipative cat-qubits are a promising architecture for quantum processors due to their built-in quantum error correction. By leveraging two-photon stabilization, they achieve an exponentially suppressed bit-flip error rate as the distance in phase-space between their basis states increases, incurring only a linear increase in phase-flip rate. This property substantially reduces the number of qubits required for fault-tolerant quantum computation. Here, we implement a squeezing deformation of the cat qubit basis states, further extending the bit-flip time while minimally affecting the phase-flip rate. We demonstrate a steep reduction in the bit-flip error rate with increasing mean photon number, characterized by a scaling exponent $\gamma=4.3$, rising by a factor of 74 per added photon. Specifically, we measure bit-flip times of 22 seconds for a phase-flip time of 1.3 $\mu$s in a squeezed cat qubit with an average photon number $\bar{n}=4.1$, a 160-fold improvement in bit-flip time compared to a standard cat. Moreover, we demonstrate a two-fold reduction in $Z$-gate infidelity, with an estimated phase-flip probability of $\epsilon_X = 0.085$ and a bit-flip probability of $\epsilon_Z = 2.65 \cdot 10^{-9}$, which confirms the gate bias-preserving property. This simple yet effective technique enhances cat qubit performance without requiring design modification, moving multi-cat architectures closer to fault-tolerant quantum computation.
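Assuming the usual exponential suppression of the bit-flip rate with mean photon number, the two quoted figures are consistent with each other:

```latex
% Assumption: bit-flip rate suppressed exponentially in the mean photon number.
\Gamma_{\mathrm{BF}} \propto e^{-\gamma \bar{n}}
\quad\Longrightarrow\quad
\frac{T_{\mathrm{BF}}(\bar{n}+1)}{T_{\mathrm{BF}}(\bar{n})} = e^{\gamma} = e^{4.3} \approx 74 .
```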
Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample complexity and weight decay. This analysis uncovers crossovers between distinct scaling regimes and plateau behaviors, mirroring phenomena widely reported in the empirical neural scaling literature. Furthermore, we establish a precise link between these regimes and the spectral properties of the trained network weights, which we characterize in detail. As a consequence, we provide a theoretical validation of recent empirical observations connecting the emergence of power-law tails in the weight spectrum with network generalization performance, yielding an interpretation from first principles.
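As a reminder of the diagonal-network/LASSO connection the analysis builds on, here is a minimal sketch (assumed hyperparameters and data, not the paper's setup) of a diagonal network $w = u \odot v$ trained by gradient descent with weight decay on a sparse regression task; small initialization plus weight decay biases the learned weights toward sparse, LASSO-like solutions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 100, 200, 5
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:k] = 1.0                                    # sparse teacher vector
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Diagonal network: f(x) = <x, u * v>, trained with gradient descent + weight decay.
u = np.full(d, 0.01)                                # small initialization
v = np.full(d, 0.01)
lr, wd = 1e-3, 1e-3
for _ in range(20000):
    w = u * v
    grad_w = X.T @ (X @ w - y) / n                  # gradient of the squared loss w.r.t. w
    gu = grad_w * v + wd * u                        # chain rule through w = u * v, plus L2
    gv = grad_w * u + wd * v
    u -= lr * gu
    v -= lr * gv

w_hat = u * v
print("largest coordinates:", np.argsort(-np.abs(w_hat))[:k])   # should recover the support
```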
We measure galaxy sizes from 2 < z < 10 using COSMOS-Web, the largest-area JWST imaging survey to date, covering $\sim$0.54 deg$^2$. We analyze the rest-frame optical (~5000 Å) size evolution and its scaling relation with stellar mass ($R_e \propto M_*^\alpha$) for star-forming and quiescent galaxies. For star-forming galaxies, the slope $\alpha$ remains approximately 0.20 at $2 < z < 8$, showing no significant evolution over this redshift range. At higher redshifts, the slopes are $-0.13 \pm 0.15$ and $0.37 \pm 0.36$ for 8 < z < 9 and 9 < z < 10, respectively. At fixed galaxy mass, the size evolution for star-forming galaxies follows $R_e \propto (1+z)^{-\beta}$, with $\beta = 1.21 \pm 0.05$. For quiescent galaxies, the slope is steeper, $\alpha \sim 0.5$-$0.8$ at 2 < z < 5, and $\beta = 0.81 \pm 0.26$. We find that the size-mass relation is consistent between UV and optical at z < 8 for star-forming galaxies. However, we observe a decrease in the slope from UV to optical at z > 8, with a tentative negative slope in the optical at 8 < z < 9, suggesting a complex interplay between intrinsic galaxy properties and observational effects such as dust attenuation. We discuss the ratio between galaxies' half-light radius, $R_e$, and underlying halos' virial radius, $R_{vir}$, and find a median value of $R_e/R_{vir} = 2.7\%$. The star formation rate surface density evolves as $\log\Sigma_\text{SFR} = (0.20\pm0.08)\,z + (-0.65\pm0.51)$, and the $\Sigma_\text{SFR}$-$M_*$ relation remains flat at $2 < z < 3$. Overall, this study provides new insights into galaxy size and related properties in the rest-frame optical.
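For convenience, the two quoted fits for star-forming galaxies can be turned into small helper functions; the reference redshift, the omitted normalization of $R_e$, and the units of $\Sigma_\text{SFR}$ are left implicit here and follow the paper's conventions.

```python
import numpy as np

def size_ratio(z, z_ref=2.0, beta=1.21):
    """Relative size evolution R_e(z) / R_e(z_ref), using the quoted R_e ∝ (1+z)^(-beta)."""
    return ((1.0 + z) / (1.0 + z_ref)) ** (-beta)

def log_sigma_sfr(z, slope=0.20, intercept=-0.65):
    """Quoted fit: log Sigma_SFR = (0.20) z + (-0.65), units per the paper's convention."""
    return slope * z + intercept

for z in (2, 4, 6, 8, 10):
    print(z, round(size_ratio(z), 2), round(log_sigma_sfr(z), 2))
```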
Deep convolutional networks provide state of the art classifications and regressions results over many high-dimensional problems. We review their architecture, which scatters data with a cascade of linear filter weights and non-linearities. A mathematical framework is introduced to analyze their properties. Computations of invariants involve multiscale contractions, the linearization of hierarchical symmetries, and sparse separations. Applications are discussed.
We present K-band interferometric observations of the PDS 70 protoplanets along with their host star using VLTI/GRAVITY. We obtained K-band spectra and 100 $\mu$as precision astrometry of both PDS 70 b and c in two epochs, as well as spatially resolving the hot inner disk around the star. Rejecting unstable orbits, we found a nonzero eccentricity for PDS 70 b of $0.17 \pm 0.06$, a near-circular orbit for PDS 70 c, and an orbital configuration consistent with the planets migrating into a 2:1 mean motion resonance. Enforcing dynamical stability, we obtained a 95% upper limit on the mass of PDS 70 b of 10 $M_\textrm{Jup}$, while the mass of PDS 70 c was unconstrained. The GRAVITY K-band spectra rule out pure blackbody models for the photospheres of both planets. Instead, the models with the most support from the data are planetary atmospheres that are dusty, but the nature of the dust is unclear. Any circumplanetary dust around these planets is not well constrained by the planets' 1-5 $\mu$m spectral energy distributions (SEDs) and requires longer wavelength data to probe with SED analysis. However, with VLTI/GRAVITY, we made the first observations of a circumplanetary environment with sub-au spatial resolution, placing an upper limit of 0.3 au on the size of a bright disk around PDS 70 b.
Ambient field suppression is critical for accurate magnetic field measurements, and a requirement for certain low-field sensors to operate. The difference in magnitude between noise and signal (up to 109^9) makes the problem challenging, and solutions such as passive shielding, post-hoc processing, and most active shielding designs do not address it completely. Zero field active shielding (ZFS) achieves accurate field suppression with a feed-forward structure in which correction coils are fed by reference sensors via a matrix found using data-driven methods. Requirements are a sufficient number of correction coils and reference sensors to span the ambient field at the sensors, and to zero out the coil-to-reference sensor coupling. The solution assumes instantaneous propagation and mixing, but it can be extended to handle convolutional effects. Precise calculations based on sensor and coil geometries are not necessary, other than to improve efficiency and usability. The solution is simulated here but not implemented in hardware.
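A minimal numerical sketch of the data-driven feed-forward idea (not the authors' exact procedure): estimate the reference-to-sensor and coil-to-sensor couplings by least squares from calibration recordings, then derive the reference-to-current matrix. Shapes, noise levels, and the assumptions that the coils do not couple to the references and that the coil fields span the ambient field are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n_ref, n_sens, n_coil, T = 8, 16, 8, 10000

G_true = rng.standard_normal((n_coil, n_sens))     # coil -> sensor coupling (unknown to us)
# Assumption matching the stated requirement: the ambient field at the sensors
# lies in the span of the coil fields.
mix = rng.standard_normal((n_ref, n_coil))
A_true = mix @ G_true                              # reference -> sensor coupling (unknown)

# Calibration 1: ambient noise only, estimate A from reference/sensor recordings.
refs = rng.standard_normal((T, n_ref))
sensors = refs @ A_true + 0.01 * rng.standard_normal((T, n_sens))
A, *_ = np.linalg.lstsq(refs, sensors, rcond=None)

# Calibration 2: drive the coils with known currents, estimate G.
currents = rng.standard_normal((T, n_coil))
coil_fields = currents @ G_true + 0.01 * rng.standard_normal((T, n_sens))
G, *_ = np.linalg.lstsq(currents, coil_fields, rcond=None)

# Feed-forward matrix: currents = refs @ M so that the coil field cancels the ambient field.
M = -A @ np.linalg.pinv(G)
residual = refs @ A_true + (refs @ M) @ G_true
print("suppression factor:", np.std(refs @ A_true) / np.std(residual))
```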
We present here in full detail the evolution of the angular momentum deficit (AMD) during collisions as described in (Laskar, PRL, 2000). Since then, the AMD has been revealed to be a key parameter for the understanding of the outcome of planetary formation models. We define here the AMD-stability criterion that can be easily verified on a newly discovered planetary system. We show how AMD-stability can be used to establish a classification of the multiplanet systems in order to exhibit the planetary systems that are long-term stable because they are AMD-stable, and those that are AMD-unstable which then require some additional dynamical studies to reach a conclusion on their stability. The AMD-stability classification is applied to the 131 multiplanet systems from The Extrasolar Planet Encyclopaedia database (this http URL) for which the orbital elements are sufficiently well known.
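For reference, the AMD used in this classification is the standard quantity below (conventions for the mass factor $\Lambda_k$ vary slightly between papers):

```latex
% Angular momentum deficit for planets k = 1..N (masses m_k, semi-major axes a_k,
% eccentricities e_k, inclinations i_k to the invariant plane) around a star of mass M;
% conventions for the mass factor Lambda_k vary slightly between papers.
\mathrm{AMD} = \sum_{k=1}^{N} \Lambda_k \left( 1 - \sqrt{1 - e_k^2}\,\cos i_k \right),
\qquad \Lambda_k = m_k \sqrt{G M a_k}.
```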
Extremely weak long-range forces may lead to apparent violations of the Equivalence Principle. The final MICROSCOPE result, leading at 95% c.l. to $|\delta| < 4.5 \times 10^{-15}$ or $6.5 \times 10^{-15}$ for a positive or negative Eötvös parameter $\delta$, requires taking into account the spin of the mediator, and the sign of $\Delta (Q/A_r)_{\rm Ti-Pt}$ ($Q$ denoting the new charge involved). A coupling to $B-L$ or $B$ should verify $|g_{B-L}| < 1.1 \times 10^{-25}$ or $|g_{B}| < 8 \times 10^{-25}$, for a spin-1 mediator of mass $m < 10^{-14}$ eV$/c^2$, with slightly different limits of $1.3 \times 10^{-25}$ or $6.6 \times 10^{-25}$ in the spin-0 case. The limits increase with $m$, in a way which depends on the density distribution within the Earth. This involves a hyperbolic form factor, expressed through a bilateral Laplace transform as $\Phi(x=mR) = \langle \sinh mr/mr \rangle$, related by analytic continuation to the Earth form factor $\Phi(ix) = \langle \sin mr/mr \rangle$. It may be expressed as $\Phi(x) = \frac{3}{x^2}\,(\cosh x - \frac{\sinh x}{x}) \times \bar\rho(x)/\rho_0$, where $\bar\rho(x)$ is an effective density, decreasing from the average $\rho_0$ at $m=0$ down to the density at the periphery. We give general integral or multishell expressions of $\Phi(x)$, evaluating it, and $\bar\rho(x)$, in a simplified 5-shell model. $\Phi(x)$ may be expanded as $\sum \frac{x^{2n}}{(2n+1)!} \frac{\langle r^{2n}\rangle}{R^{2n}} \simeq 1 + .0827\, x^2 + .00271\, x^4 + 4.78 \times 10^{-5}\, x^6 + 5.26 \times 10^{-7}\, x^8 + \dots$, absolutely convergent for all $x$ and potentially useful up to $x \approx 5$. The coupling limits increase at large $x$ like $mR\, e^{mz/2}/\sqrt{1+mr}$ ($z = r - R$ being the satellite altitude), getting multiplied by $\simeq 1.9$, $34$, or $1.2 \times 10^9$, for $m = 10^{-13}$, $10^{-12}$, or $10^{-11}$ eV$/c^2$, respectively.
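A small numerical check of the quoted expansion, compared against the uniform-density closed form $\frac{3}{x^2}(\cosh x - \frac{\sinh x}{x})$ (i.e. $\bar\rho = \rho_0$), illustrates how the Earth's central condensation lowers $\Phi(x)$; only the terms quoted above are kept in the truncated series.

```python
import numpy as np

# Coefficients of the quoted expansion Phi(x) ≈ 1 + .0827 x^2 + .00271 x^4 + ...
earth_coeffs = [1.0, 0.0827, 0.00271, 4.78e-5, 5.26e-7]

def phi_earth(x):
    """Truncated series from the abstract (stated to be useful up to x ≈ 5)."""
    return sum(c * x ** (2 * n) for n, c in enumerate(earth_coeffs))

def phi_uniform(x):
    """Closed form 3/x^2 (cosh x - sinh x / x) for a uniform-density sphere,
    i.e. rho_bar = rho_0, for comparison with the centrally condensed Earth."""
    return 3.0 / x**2 * (np.cosh(x) - np.sinh(x) / x)

for x in (0.5, 1.0, 2.0, 5.0):
    print(x, round(phi_earth(x), 4), round(phi_uniform(x), 4))
```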
We study the Bayesian density estimation of data living in the offset of an unknown submanifold of the Euclidean space. In this setting, we introduce a new notion of anisotropic Hölder regularity for the underlying density and obtain posterior rates that are minimax optimal and adaptive to the regularity of the density, to the intrinsic dimension of the manifold, and to the size of the offset, provided that the latter is not too small -- while still allowed to go to zero. Our Bayesian procedure, based on location-scale mixtures of Gaussians, appears to be convenient to implement and yields good practical results, even for quite singular data.
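As a rough, off-the-shelf stand-in for the described procedure (not the paper's prior or posterior computation), a Dirichlet-process location-scale Gaussian mixture can be fit with variational inference; the noisy-circle data below are a hypothetical example of points in the offset of a one-dimensional submanifold of the plane.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(5)
n = 2000
theta = rng.uniform(0, 2 * np.pi, n)
offset = 0.05 * rng.standard_normal((n, 2))          # small offset around the manifold
X = np.c_[np.cos(theta), np.sin(theta)] + offset     # noisy circle in R^2

# Location-scale Gaussian mixture with a Dirichlet-process prior on the weights.
dpgmm = BayesianGaussianMixture(
    n_components=30,
    covariance_type="full",
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(X)

# Log-density estimate at an on-manifold point vs. an off-manifold point.
query = np.array([[1.0, 0.0], [0.0, 0.0]])
print(dpgmm.score_samples(query))
```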