University of Münster
We study the dynamics of an optoelectronic circuit composed of an excitable nanoscale resonant-tunneling diode (RTD) driving a nanolaser diode (LD) coupled via time-delayed feedback. Using a combination of numerical path-continuation methods and time simulations, we demonstrate that this RTD-LD system can serve as an artificial neuron, generating pulses in the form of temporal localized states (TLSs) that can be employed as memory for neuromorphic computing. In particular, our findings reveal that the prototypical delayed FitzHugh-Nagumo model previously employed to model the RTD-LD resembles our more realistic model only in the limit of a slow RTD. We show that the RTD time scale plays a critical role in memory capacity as it governs a shift in pulse interaction from repulsive to attractive, leading to a transition from stable to unstable multi-pulse TLSs. Our theoretical analysis uncovers features and challenges previously unknown for the RTD-LD system, including the multistability of TLSs and attractive interaction forces, stemming from the previously neglected intrinsic dynamics of the laser. These effects are crucial to consider since they define the memory properties of the RTD-LD.
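For orientation, a generic delayed FitzHugh-Nagumo system of the kind referenced above (a schematic textbook form with a delayed feedback term of strength $\eta$ and delay $\tau$; the abstract does not spell out the exact model used):

\[
  \varepsilon \dot{u}(t) = u - \frac{u^3}{3} - v + \eta\, u(t-\tau), \qquad
  \dot{v}(t) = u + a ,
\]

where $u$ is the fast (voltage-like) variable, $v$ the slow recovery variable, $\varepsilon \ll 1$ the time-scale separation, and $a$ sets the distance to the excitability threshold.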
The XLZD collaboration is developing a two-phase xenon time projection chamber with an active mass of 60 to 80 t, capable of probing the remaining WIMP-nucleon interaction parameter space down to the so-called neutrino fog. In this work we show that, based on the performance of currently operating detectors using the same technology and a realistic reduction of radioactivity in detector materials, such an experiment will also be able to competitively search for neutrinoless double beta decay in $^{136}$Xe using a natural-abundance xenon target. XLZD can reach a $3\sigma$ discovery potential half-life of $5.7\times10^{27}$ yr (and a 90% CL exclusion of $1.3\times10^{28}$ yr) with 10 years of data taking, corresponding to a Majorana mass range of 7.3-31.3 meV (4.8-20.5 meV). XLZD will thus exclude the inverted neutrino mass ordering parameter space and will start to probe the normal ordering region for most of the nuclear matrix elements commonly considered by the community.
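For reference, the quoted Majorana mass ranges follow from the standard relation between the $0\nu\beta\beta$ half-life and the effective Majorana mass (a textbook formula, not spelled out in the abstract; the spread in the nuclear matrix element $M^{0\nu}$ produces the quoted ranges):

\[
  \left[ T^{0\nu}_{1/2} \right]^{-1} = G^{0\nu}\, g_A^4\, \bigl| M^{0\nu} \bigr|^2\, \frac{\langle m_{\beta\beta} \rangle^2}{m_e^2},
\]

with $G^{0\nu}$ the phase-space factor, $g_A$ the axial-vector coupling, and $m_e$ the electron mass.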
Neutrinos are the most abundant fundamental matter particles in the Universe and play a crucial role in particle physics and cosmology. Neutrino oscillation, discovered about 25 years ago, reveals that the three known species mix with each other. Anomalous results from reactor and radioactive-source experiments suggest a possible fourth neutrino state, the sterile neutrino, which does not interact via the weak force. The KATRIN experiment, primarily designed to measure the neutrino mass via tritium $\beta$-decay, also searches for sterile neutrinos suggested by these anomalies. A sterile-neutrino signal would appear as a distortion of the $\beta$-decay energy spectrum, characterized by a discontinuity in curvature (kink) at an energy set by the sterile-neutrino mass. This signature, which depends only on the shape of the spectrum rather than its absolute normalization, offers a robust approach complementary to reactor experiments. KATRIN examined the energy spectrum of 36 million tritium $\beta$-decay electrons, recorded over 259 measurement days, within the last 40 electronvolts below the endpoint. The results exclude a substantial part of the parameter space suggested by the gallium anomaly and challenge the Neutrino-4 claim. Together with other neutrino-disappearance experiments, KATRIN probes sterile-to-active mass splittings from a fraction of an electronvolt squared to several hundred electronvolts squared, excluding light sterile neutrinos with mixing angles above a few percent.
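Schematically (the standard parametrization in the sterile-neutrino literature, not given explicitly in the abstract), the measured spectrum is a superposition of an active and a sterile branch,

\[
  \frac{d\Gamma}{dE} = \cos^2\theta\, \frac{d\Gamma}{dE}\bigl(m_\nu\bigr) + \sin^2\theta\, \frac{d\Gamma}{dE}\bigl(m_4\bigr),
\]

where $\theta$ is the active-sterile mixing angle and $m_4$ the sterile-neutrino mass; the sterile branch opens up at an energy $m_4$ below the endpoint, producing the curvature kink.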
Deep learning algorithms -- typically consisting of a class of deep artificial neural networks (ANNs) trained by a stochastic gradient descent (SGD) optimization method -- are nowadays an integral part of many areas of science, industry, and also our day-to-day life. Roughly speaking, in their most basic form, ANNs can be regarded as functions consisting of a series of compositions of affine-linear functions with multidimensional versions of so-called activation functions. One of the most popular such activation functions is the rectified linear unit (ReLU) function $\mathbb{R} \ni x \mapsto \max\{ x, 0 \} \in \mathbb{R}$. The ReLU function is, however, not differentiable and, typically, this lack of regularity transfers to the cost function of the supervised learning problem under consideration. Regardless of this lack of differentiability, deep learning practitioners apply SGD methods based on suitably generalized gradients in standard deep learning libraries like {\sc TensorFlow} or {\sc PyTorch}. In this work we reveal an accurate and concise mathematical description of such generalized gradients in the training of deep fully-connected feedforward ANNs and we also study the resulting generalized gradient function analytically. Specifically, we provide an appropriate approximation procedure that uniquely describes the generalized gradient function, we prove that the generalized gradients are limiting Fréchet subgradients of the cost functional, and we conclude that the generalized gradients must coincide with the standard gradient of the cost functional on every open set on which the cost functional is continuously differentiable.
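A minimal sketch (assuming PyTorch is available) of the behaviour described above: at the non-differentiable kink $x = 0$, the library's generalized gradient of the ReLU function returns the value 0.

```python
import torch

# ReLU is not differentiable at 0; PyTorch's autograd nevertheless
# returns a generalized gradient there, namely the value 0.
x = torch.tensor([-1.0, 0.0, 1.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 1.]) -- 0 is the chosen value at the kink
```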
Researchers from the University of Münster and CUHK-Shenzhen establish optimal convergence rates for the original Adam optimizer in stochastic optimization problems, demonstrating that Adam converges to the zeros of a newly identified "Adam vector field" rather than directly to the objective function's gradient zeros. They show this limit point approaches the true minimizer at a rate of M⁻¹ as mini-batch size M increases.
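A toy numerical sketch in the spirit of this result (not the paper's analysis): Adam with constant hyperparameters on the stochastic quadratic loss $(\theta - x)^2/2$ with skewed data settles near a limit point offset from the true minimizer, and the offset shrinks as the mini-batch size M grows. The data distribution, hyperparameters ($\beta_2 = 0.9$ chosen to make the offset visible), and step counts are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def adam_limit(M, steps=300_000, lr=1e-3, b1=0.9, b2=0.9, eps=1e-8):
    theta, m, v, tail = 0.0, 0.0, 0.0, []
    for t in range(1, steps + 1):
        x = rng.exponential(1.0, size=M)  # skewed data; true minimizer is E[X] = 1
        g = theta - x.mean()              # mini-batch gradient of (theta - x)^2 / 2
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        mhat, vhat = m / (1 - b1 ** t), v / (1 - b2 ** t)
        theta -= lr * mhat / (np.sqrt(vhat) + eps)
        if t > steps // 2:
            tail.append(theta)            # average the tail to suppress noise
    return np.mean(tail)

for M in (1, 4, 16, 64):
    print(M, adam_limit(M))  # the gap to the true minimizer 1.0 shrinks with M
```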
Researchers from the University of Münster, The Chinese University of Hong Kong, Shenzhen, and ETH Zurich present a mathematically rigorous overview of Denoising Diffusion Probabilistic Models (DDPMs), detailing their foundational stochastic processes and training objectives. The work formalizes concepts from basic DDPMs to advanced variants like Stable Diffusion, providing a comprehensive theoretical framework for generative artificial intelligence.
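A minimal sketch (assuming PyTorch; `eps_model` is a placeholder network, and the linear noise schedule is the common textbook choice, not necessarily the paper's) of the two basic DDPM ingredients such an overview formalizes: the closed-form forward noising $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_t}\, x_0, (1-\bar{\alpha}_t) I)$ and the noise-prediction training objective.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)     # common linear noise schedule
abar = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t = prod_s (1 - beta_s)

def ddpm_loss(eps_model, x0):
    """Simple DDPM objective: predict the noise injected into x0."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a = abar[t].view(-1, *([1] * (x0.dim() - 1)))
    xt = a.sqrt() * x0 + (1.0 - a).sqrt() * eps  # sample from q(x_t | x_0)
    return ((eps - eps_model(xt, t)) ** 2).mean()
```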
Researchers from the XLZD Collaboration developed a model-independent, likelihood-free search pipeline for new physics in the proposed DARWIN experiment, utilizing semi-supervised deep learning on high-dimensional detector data. The pipeline achieved a median sensitivity of approximately 3σ to reject the background-only hypothesis for a benchmark WIMP signal after 200 ton-years of exposure, substantially outperforming traditional likelihood-based methods that reached only 1σ.
Deep learning methods - consisting of a class of deep neural networks (DNNs) trained by a stochastic gradient descent (SGD) optimization method - are nowadays key tools for solving data-driven supervised learning problems. Despite the great success of SGD methods in the training of DNNs, it remains a fundamental open problem of research to explain the success and the limitations of such methods in rigorous theoretical terms. In particular, even in the standard setup of data-driven supervised learning problems, it remained an open research problem to prove (or disprove) that SGD methods converge with high probability to global minimizers in the optimization landscape when training DNNs with the popular rectified linear unit (ReLU) activation function. In this work we answer this question negatively. Specifically, we prove for a large class of SGD methods that the considered optimizer does, with high probability, not converge to global minimizers of the optimization problem. It turns out that the probability of not converging to a global minimizer converges at least exponentially quickly to one as the width of the first hidden layer and the depth of the DNN, respectively, increase. The general non-convergence results of this work apply not only to the plain vanilla standard SGD method but also to a large class of accelerated and adaptive SGD methods such as the momentum SGD, the Nesterov accelerated SGD, the Adagrad, the RMSProp, the Adam, the Adamax, the AMSGrad, and the Nadam optimizers.
Despite the omnipresent use of stochastic gradient descent (SGD) optimization methods in the training of deep neural networks (DNNs), it remains, in basically all practically relevant scenarios, a fundamental open problem to provide a rigorous theoretical explanation for the success (and the limitations) of SGD optimization methods in deep learning. In particular, it remains an open question to prove or disprove convergence of the true risk of SGD optimization methods to the optimal true risk value in the training of DNNs. In one of the main results of this work we reveal for a general class of activations, loss functions, random initializations, and SGD optimization methods (including, for example, standard SGD, momentum SGD, Nesterov accelerated SGD, Adagrad, RMSprop, Adadelta, Adam, Adamax, Nadam, Nadamax, and AMSGrad) that in the training of any arbitrary fully-connected feedforward DNN it does not hold that the true risk of the considered optimizer converges in probability to the optimal true risk value. Nonetheless, the true risk of the considered SGD optimization method may very well converge to a strictly suboptimal true risk value.
Radiogenic neutrons emitted by detector materials are one of the most challenging backgrounds for the direct search for dark matter in the form of weakly interacting massive particles (WIMPs). To mitigate this background, the XENONnT experiment is equipped with a novel gadolinium-doped water Cherenkov detector, which encloses the xenon dual-phase time projection chamber (TPC). The neutron veto (NV) tags neutrons via their capture on gadolinium or hydrogen, which releases $\gamma$-rays that are subsequently detected as Cherenkov light. In this work, we present the key features and the first results of the XENONnT NV when operated with demineralized water in the initial phase of the experiment. Its efficiency for detecting neutrons is $(82\pm 1)\,\%$, the highest neutron detection efficiency achieved in a water Cherenkov detector. This enables a high efficiency of $(53\pm 3)\,\%$ for the tagging of WIMP-like neutron signals, inside a tagging time window of $250\,\mathrm{\mu s}$ between TPC and NV, leading to a livetime loss of $1.6\,\%$ during the first science run of XENONnT.
We study a continuous-time system that solves optimization problems over the set of orthonormal matrices, which is also known as the Stiefel manifold. The resulting optimization flow follows a path that is not always on the manifold but asymptotically lands on the manifold. We introduce a generalized Stiefel manifold to which we extend the canonical metric of the Stiefel manifold. We show that the vector field of the proposed flow can be interpreted as the sum of a Riemannian gradient on a generalized Stiefel manifold and a normal vector. Moreover, we prove that the proposed flow globally converges to the set of critical points, and any local minimum and isolated critical point is asymptotically stable.
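For background (the classical Edelman-Arias-Smith formulas for the ordinary Stiefel manifold; the paper's generalized manifold and extended metric go beyond this), the feasible set and the Riemannian gradient of a smooth $f$ with Euclidean gradient $G = \nabla f(X)$ under the canonical metric are

\[
  \mathrm{St}(n,p) = \bigl\{ X \in \mathbb{R}^{n \times p} : X^\top X = I_p \bigr\},
  \qquad
  \operatorname{grad} f(X) = G - X G^\top X .
\]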
Averaging techniques such as Ruppert--Polyak averaging and exponential moving averaging (EMA) are powerful approaches to accelerate stochastic gradient descent (SGD) optimization methods such as the popular ADAM optimizer. However, depending on the specific optimization problem under consideration, the type and the parameters of the averaging need to be adjusted to achieve the smallest optimization error. In this work we propose an averaging approach, which we refer to as parallel averaged ADAM (PADAM), in which we compute several averaged variants of ADAM in parallel and, during the training process, dynamically select the variant with the smallest optimization error. A central feature of this approach is that it requires no more gradient evaluations than the usual ADAM optimizer, as each of the averaged trajectories relies on the same underlying ADAM trajectory and thus on the same underlying gradients. We test the proposed PADAM optimizer on 13 stochastic optimization and deep neural network (DNN) learning problems and compare its performance with known optimizers from the literature such as standard SGD, momentum SGD, ADAM with and without EMA, and ADAMW. In particular, we apply the compared optimizers to physics-informed neural network, deep Galerkin, deep backward stochastic differential equation, and deep Kolmogorov approximations for boundary value partial differential equation problems from scientific machine learning, as well as to DNN approximations for optimal control and optimal stopping problems. In nearly all of the considered examples PADAM achieves, sometimes jointly with other optimizers and sometimes exclusively, essentially the smallest optimization error. This work thus strongly suggests considering PADAM for scientific machine learning problems and also motivates further research on adaptive averaging procedures within the training of DNNs.
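A minimal sketch (assuming PyTorch; the decay values and the selection rule are illustrative choices, not the paper's exact PADAM specification) of the core idea: maintain several exponential moving averages of a single ADAM trajectory in parallel -- reusing the same gradients, so no extra gradient evaluations -- and return whichever averaged copy currently attains the smallest estimated loss.

```python
import copy
import itertools
import torch

def train_padam(model, loss_fn, data_loader, decays=(0.0, 0.9, 0.99, 0.999), steps=1000):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    averages = [copy.deepcopy(model) for _ in decays]  # decay 0.0 keeps the raw ADAM iterate
    data = itertools.cycle(data_loader)
    for _ in range(steps):
        x, y = next(data)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        for avg, d in zip(averages, decays):
            for p_avg, p in zip(avg.parameters(), model.parameters()):
                p_avg.data.mul_(d).add_(p.data, alpha=1.0 - d)  # EMA of the iterates
    x, y = next(data)  # monitoring batch for the dynamic selection
    with torch.no_grad():
        return min(averages, key=lambda m: loss_fn(m(x), y).item())
```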
In the recent sixth data release (DR6) of the Atacama Cosmology Telescope (ACT) collaboration, the value of $n_{\rm s}=0.9743 \pm 0.0034$ for the scalar spectral index is reported, which excludes the Starobinsky and Higgs inflationary models at the $2\sigma$ level. In this paper, we perform a Bayesian inference of the parameters of the Starobinsky or Higgs inflationary model with non-instantaneous reheating using the Markov chain Monte Carlo method. For the analysis, we use observational data on the cosmic microwave background collected by the Planck and ACT collaborations and on baryonic acoustic oscillations from the DESI collaboration. The reheating stage is modelled by a single parameter $R_{\rm reh}$. Using the modified Boltzmann code CLASS and the cobaya software with the GetDist package, we perform a direct inference of the model parameter space and obtain posterior distributions of the parameters. Using the Kullback--Leibler divergence, we estimate the information gain from the data, yielding $2.52$ bits for the reheating parameter. Inclusion of the ACT DR6 data provides $75\%$ more information about the reheating stage compared to the analysis without ACT data. We draw constraints on the reheating temperature and the average equation of state. While the former can vary within $10$ orders of magnitude, values in the $95\%$ credible interval indicate a sufficiently low reheating temperature; for the latter there is a clear preference for values greater than $0.5$, which means that the conventional equations of state for dust ($\omega=0$) and relativistic matter ($\omega=1/3$) are excluded at more than the $2\sigma$ level of significance. However, there is still a large part of parameter space where the Starobinsky and Higgs inflationary models exhibit a high degree of consistency with the latest observational data, particularly from ACT DR6. Therefore, it is premature to reject these models.
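For context, the information gain quoted above is the Kullback--Leibler divergence of the posterior $p(\theta \mid d)$ from the prior $p(\theta)$, with the base-2 logarithm giving bits:

\[
  D_{\mathrm{KL}} = \int p(\theta \mid d)\, \log_2 \frac{p(\theta \mid d)}{p(\theta)}\, d\theta .
\]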
We report on the search for X-ray radiation as predicted from dynamical quantum collapse with low-energy electronic recoil data in the energy range of 1-140 keV from the first science run of the XENONnT dark matter detector. Spontaneous radiation is an unavoidable effect of dynamical collapse models, which were introduced as a possible solution to the long-standing measurement problem in quantum mechanics. The analysis utilizes a model that for the first time accounts for cancellation effects in the emitted spectrum, which arise in the X-ray range due to the opposing electron-proton charges in xenon atoms. New world-leading limits on the free parameters of the Markovian continuous spontaneous localization and Diósi-Penrose models are set, improving previous best constraints by two orders of magnitude and a factor of five, respectively. The original values proposed for the strength and the correlation length of the continuous spontaneous localization model are excluded experimentally for the first time.
High-order methods have shown great potential to overcome performance issues of simulations of partial differential equations (PDEs) on modern hardware, yet many users still stick to low-order, matrix-based simulations, in particular in porous media applications. Heterogeneous coefficients and low regularity of the solution are reasons not to employ high-order discretizations. We present a new approach for the simulation of instationary PDEs that partially mitigates these performance problems. By reformulating the original problem we derive a parallel-in-time integrator that increases the arithmetic intensity and introduces additional structure into the problem, thereby helping to accelerate matrix-based simulations on modern hardware architectures. Based on a system for multiple time steps, we formulate a matrix equation that can be solved using vectorised solvers like block Krylov methods. The structure of this approach makes it applicable to a wide range of linear and nonlinear problems. In our numerical experiments we present first results for three different PDEs: a linear convection-diffusion equation, a nonlinear diffusion-reaction equation, and a realistic example based on the Richards equation.
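As an illustration of this kind of reformulation (a generic all-at-once implicit Euler system with mass matrix $M$ and stiffness matrix $K$, not necessarily the paper's exact construction): collecting $s$ time steps as columns of $U = [u^1, \dots, u^s]$ turns the sequential recursion $(M + \Delta t\, K)\, u^n = M u^{n-1} + \Delta t\, f^n$ into the matrix equation

\[
  (M + \Delta t\, K)\, U - M U C = F ,
\]

where $C$ is the $s \times s$ shift matrix with ones on the first superdiagonal and $F$ collects the right-hand sides (with $M u^0$ folded into its first column). The $s$ coupled columns are exactly the structure that vectorised block Krylov solvers can exploit.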
It is known that the standard stochastic gradient descent (SGD) optimization method, as well as accelerated and adaptive SGD optimization methods such as the Adam optimizer, fail to converge if the learning rates do not converge to zero (as, for example, in the situation of constant learning rates). Numerical simulations often use human-tuned deterministic learning rate schedules or small constant learning rates. The default learning rate schedules for SGD optimization methods in machine learning implementation frameworks such as TensorFlow and PyTorch are constant learning rates. In this work we propose and study a learning-rate-adaptive approach for SGD optimization methods in which the learning rate is adjusted based on empirical estimates of the values of the objective function of the considered optimization problem (the function that one intends to minimize). In particular, we propose a learning-rate-adaptive variant of the Adam optimizer and apply it to several neural network learning problems, particularly in the context of deep learning approximation methods for partial differential equations such as deep Kolmogorov methods, physics-informed neural networks, and deep Ritz methods. In each of the presented learning problems the proposed learning-rate-adaptive variant of the Adam optimizer reduces the value of the objective function faster than the Adam optimizer with the default learning rate. For a simple class of quadratic minimization problems we also rigorously prove that a learning-rate-adaptive variant of the SGD optimization method converges to the minimizer of the considered minimization problem. Our convergence proof is based on an analysis of the laws of invariant measures of the SGD method as well as on a more general convergence analysis for SGD with random but predictable learning rates, which we develop in this work.
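A minimal sketch of one plausible instantiation of such a rule (the paper's precise adaptation scheme may differ; the halving factor, check interval, and monitoring batch are illustrative assumptions): periodically estimate the objective on a batch and reduce the Adam learning rate whenever the estimate stops improving.

```python
import itertools
import torch

def train_lr_adaptive(model, loss_fn, data_loader, lr=1e-3, steps=10_000, check_every=500):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    data = itertools.cycle(data_loader)
    best = float("inf")
    for step in range(1, steps + 1):
        x, y = next(data)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        if step % check_every == 0:
            with torch.no_grad():  # empirical estimate of the objective value
                est = loss_fn(model(x), y).item()
            if est >= best:        # no improvement: shrink the learning rate
                for group in opt.param_groups:
                    group["lr"] *= 0.5
            best = min(best, est)
    return model
```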
The projected sensitivity of the effective electron neutrino-mass measurement with the KATRIN experiment is below 0.3 eV (90% CL) after five years of data acquisition. The sensitivity is affected by the increased rate of background electrons from KATRIN's main spectrometer. A special shifted-analysing-plane (SAP) configuration was developed to reduce this background by a factor of two. The complex layout of electromagnetic fields in the SAP configuration requires a robust method of estimating these fields. In this paper we present a dedicated calibration measurement of the fields using conversion electrons of gaseous $^{83\mathrm{m}}$Kr, which enables neutrino-mass measurements in the SAP configuration.
The goal of this whitepaper is to give a comprehensive overview of the rich field of forward physics. We discuss the occurrence of BFKL resummation effects in special final states, such as Mueller-Navelet jets, jet-gap-jet events, and heavy quarkonium production. The whitepaper further addresses TMD factorization at low x and the manifestation of a semi-hard saturation scale in (generalized) TMD PDFs. More theoretical aspects of low-x physics, probes of the quark-gluon plasma, the possibility to use photon-hadron collisions at the LHC to constrain hadronic structure at low x, and the resulting complementarity between the LHC and the EIC are also presented. We also briefly discuss diffraction at colliders as well as the possibility to further explore the electroweak theory in central exclusive events using the LHC as a photon-photon collider.
We investigate the nonlinear dynamics of vertically emitting Kerr microcavities under detuned optical injection, considering the impact of slow thermal effects. Our model integrates thermal detuning caused by refractive index shifts due to heating. Through numerical and analytical approaches, we uncover a rich spectrum of dynamical behaviors, including excitable thermo-optical pulses, mixed-mode oscillations, and chaotic spiking, governed by a higher-dimensional canard scenario. Introducing a long external feedback loop, with time delays comparable to the microcavity photon lifetime but shorter than thermal relaxation timescales, reveals how delay affects excitability and stabilizes temporal localized states. Our findings extend the understanding of excitable systems, demonstrating how thermal and feedback mechanisms interplay to shape nonlinear optical dynamics. Furthermore, our approach paves the way for the study of cavity stabilization and cavity cooling using an additional control beam.
Predictive business process monitoring (PBPM) is a class of techniques designed to predict behaviour, such as next activities, in running traces. PBPM techniques aim to improve process performance by providing predictions to process analysts, supporting them in their decision making. However, the limited predictive quality of PBPM techniques has been considered the essential obstacle to establishing such techniques in practice. With the use of deep neural networks (DNNs), the predictive quality of these techniques could be improved for tasks like next activity prediction. While DNNs achieve a promising predictive quality, they still lack comprehensibility due to their hierarchical approach to learning representations. Nevertheless, process analysts need to comprehend the cause of a prediction to identify intervention mechanisms that might affect the decision making to secure process performance. In this paper, we propose XNAP, the first explainable, DNN-based PBPM technique for next activity prediction. XNAP integrates a layer-wise relevance propagation method from the field of explainable artificial intelligence to make the predictions of a long short-term memory DNN explainable by providing relevance values for activities. We show the benefit of our approach through two real-life event logs.
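A minimal sketch (NumPy; an illustration of the epsilon-rule of layer-wise relevance propagation for a single dense layer, not XNAP's full LSTM treatment) of how output relevance is redistributed to a layer's inputs:

```python
import numpy as np

def lrp_epsilon(a, W, b, R_out, eps=1e-6):
    """Epsilon-rule LRP: redistribute relevance R_out of z = a @ W + b to the inputs a."""
    z = a @ W + b                                        # forward pre-activations, shape (k,)
    s = R_out / (z + eps * np.where(z >= 0, 1.0, -1.0))  # stabilized relevance per unit of z
    return a * (W @ s)                                   # input relevance, shape (d,)
```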