Shimane University
This research provides a rigorous theoretical investigation into the continuous discretization of neural operators, demonstrating a fundamental topological obstruction that prevents general diffeomorphisms in Hilbert spaces from always being continuously approximated by finite-dimensional ones. It constructively shows that strongly monotone or bilipschitz neural operators, however, do permit continuous discretization, offering conditions for their design.
Neural operators serve as universal approximators for general continuous operators. In this paper, we derive the approximation rate of solution operators for nonlinear parabolic partial differential equations (PDEs), contributing to the quantitative approximation theory for solution operators of nonlinear PDEs. Our results show that neural operators can efficiently approximate these solution operators without exponential growth in model complexity, thus strengthening the theoretical foundation of neural operators. A key insight in our proof is to transform the PDEs into the corresponding integral equations via Duhamel's principle, and to leverage the similarity between neural operators and Picard's iteration, a classical algorithm for solving PDEs. This approach is potentially generalizable beyond parabolic PDEs to a range of other equations, including the Navier-Stokes equation, nonlinear Schrödinger equations and nonlinear wave equations, which can be solved by Picard's iteration.
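The parallel drawn between neural operators and Picard's iteration can be made concrete with a small numerical sketch (ours, not the paper's): Picard iteration applied to the integral (Duhamel) form $u(t) = u_0 + \int_0^t f(u(s))\,ds$ of a simple ODE, with the hypothetical choice $f(u) = -u^2$.

```python
import numpy as np

# Picard iteration for the scalar integral equation
#   u(t) = u0 + \int_0^t f(u(s)) ds,  with f(u) = -u^2 (hypothetical choice),
# i.e. the Duhamel/integral form of u' = -u^2, u(0) = u0.
def picard(f, u0, T=1.0, n_grid=1001, n_iter=30):
    t = np.linspace(0.0, T, n_grid)
    u = np.full_like(t, u0)            # initial guess: constant function
    for _ in range(n_iter):
        integrand = f(u)
        # cumulative trapezoidal rule approximates \int_0^t f(u(s)) ds
        integral = np.concatenate(([0.0], np.cumsum(
            0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))))
        u = u0 + integral              # one Picard update
    return t, u

t, u = picard(lambda u: -u**2, u0=1.0)
# the exact solution of u' = -u^2, u(0) = 1 is 1/(1 + t)
err = np.max(np.abs(u - 1.0 / (1.0 + t)))
```

Each iteration of the loop plays the role of one "layer" in the analogy: it maps the current candidate function to an improved one through an integral operator.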
As a natural generalization of the Legendre symbol, the $q$-th power residue symbol $(a/p)_q$ is defined for primes $p$ and $q$ with $p \equiv 1 \bmod q$. In this paper, we generalize the second supplementary law by providing an explicit condition for $(q/p)_q = 1$ when $p$ has the special form $p = \sum_{i=0}^{q-1} m^i n^{q-1-i}$. This condition is expressed in terms of the polylogarithm $\mathrm{Li}_{1-q}(x)$ of negative index. Our proof relies on an argument similar to Lemmermeyer's proof of Euler's conjectures for cubic residues.
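For readers unfamiliar with negative-index polylogarithms, this illustrative snippet (names and parameters are our own, not the paper's) evaluates $\mathrm{Li}_{-n}(x)=\sum_{k\ge 1}k^n x^k$ by a truncated series and checks it against the classical closed form $\mathrm{Li}_{-1}(x)=x/(1-x)^2$.

```python
from fractions import Fraction

# Hypothetical helper: the negative-index polylogarithm
#   Li_{-n}(x) = sum_{k>=1} k^n x^k   (|x| < 1),
# evaluated as a truncated series in exact rational arithmetic.
def polylog_neg(n, x, terms=200):
    x = Fraction(x)
    return sum(Fraction(k) ** n * x ** k for k in range(1, terms + 1))

# Known closed form for index -1:  Li_{-1}(x) = x/(1-x)^2
x = Fraction(1, 3)
approx1 = polylog_neg(1, x)        # truncated series value
exact1 = x / (1 - x) ** 2          # closed form, equals 3/4 at x = 1/3
```

At $x=1/3$ the geometric decay makes the 200-term truncation error negligible, so the two values agree to far beyond double precision.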
We study the approximation-theoretic implications of mixture-of-experts architectures for operator learning, where the complexity of a single large neural operator is distributed across many small neural operators (NOs), and each input is routed to exactly one NO via a decision tree. We analyze how this tree-based routing and expert decomposition affect approximation power, sample complexity, and stability. Our main result is a distributed universal approximation theorem for mixtures of neural operators (MoNOs): any Lipschitz nonlinear operator between $L^2([0,1]^d)$ spaces can be uniformly approximated over the Sobolev unit ball to arbitrary accuracy $\varepsilon>0$ by an MoNO, where each expert NO has depth, width, and rank scaling as $\mathcal{O}(\varepsilon^{-1})$. Although the number of experts may grow with the accuracy, each NO remains small enough to fit within the active memory of standard hardware for reasonable accuracy levels. Our analysis also yields new quantitative approximation rates for classical NOs approximating uniformly continuous nonlinear operators uniformly on compact subsets of $L^2([0,1]^d)$.
Federated learning (FL) enables collaborative model training without sharing raw data, but individual model updates may still leak sensitive information. Secure aggregation (SecAgg) mitigates this risk by allowing the server to access only the sum of client updates, thereby concealing individual contributions. However, a significant vulnerability has recently attracted increasing attention: when model updates are sparse vectors, a non-zero value contributed by a single client at a given index can be directly revealed in the aggregate, enabling precise data reconstruction attacks. In this paper, we propose a novel enhancement to SecAgg that reveals aggregated values only at indices with at least $t$ non-zero contributions. Our mechanism introduces a per-element masking strategy to prevent the exposure of under-contributed elements, while maintaining modularity and compatibility with many existing SecAgg implementations by relying solely on cryptographic primitives already employed in a typical setup. We integrate this mechanism into Flamingo, a low-round SecAgg protocol, to provide a robust defense against such attacks. Our analysis and experimental results indicate that the additional computational and communication overhead introduced by our mechanism remains within an acceptable range, supporting the practicality of our approach.
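The reveal rule described above, releasing aggregate values only at indices with at least $t$ non-zero contributions, can be illustrated by a plaintext simulation. This sketch deliberately omits all cryptography (masking, key agreement, and the secure counting itself) and uses hypothetical names; it only shows the intended input/output behavior.

```python
# Plaintext simulation (NOT a cryptographic implementation) of the reveal
# rule: an aggregated value is released at index i only if at least t
# clients contributed a non-zero value there. Names are illustrative.
def threshold_aggregate(updates, t):
    dim = len(updates[0])
    agg, counts = [0] * dim, [0] * dim
    for u in updates:
        for i, v in enumerate(u):
            agg[i] += v
            counts[i] += v != 0
    # indices with fewer than t non-zero contributions stay hidden (None)
    return [agg[i] if counts[i] >= t else None for i in range(dim)]

updates = [[0, 2, 5, 0],
           [0, 3, 0, 0],
           [1, 4, 0, 0]]
revealed = threshold_aggregate(updates, t=2)
# index 1 has three contributors -> revealed as 9; the others stay hidden
```

In the actual protocol the per-index counts would never be visible to the server in the clear; the point of the paper's per-element masking is to enforce this rule using only standard SecAgg primitives.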
The success of transformers is often linked to their ability to perform in-context learning. Recent work shows that transformers are universal in-context, capable of approximating any real-valued continuous function of a context (a probability measure over $\mathcal{X}\subseteq \mathbb{R}^d$) and a query $x\in \mathcal{X}$. This raises the question: Does in-context universality explain their advantage over classical models? We answer this in the negative by proving that MLPs with trainable activation functions are also universal in-context. This suggests the transformer's success is likely due to other factors like inductive bias or training stability.
We study the vacuum stability and perturbativity conditions in the minimal type-II seesaw model. These conditions give characteristic constraints on the model parameters. The model contains an $SU(2)_L$ triplet scalar field, which could cause a large correction to the Higgs mass. From the naturalness point of view, the heavy Higgs masses should be lower than $350\,{\rm GeV}$, which can be tested against the LHC Run-II results. Due to effects of the triplet scalar field, the branching ratios of the Higgs decays ($h\to \gamma \gamma, Z\gamma$) deviate from the standard model predictions, and a large parameter region is excluded by the recent ATLAS and CMS combined analysis of $h\to \gamma \gamma$. Our result for the signal strength of $h\to \gamma \gamma$ is $R_{\gamma \gamma} \lesssim 1.1$, but this deviation is too small to observe at the LHC.
Although the context length limitation of large language models (LLMs) has been mitigated, it still hinders their application to software development tasks. This study proposes a method that incorporates execution traces into retrieval-augmented generation (RAG) for inquiries about source code. Small-scale experiments confirm a tendency for the method to improve the quality of LLM responses.
We compute the joint distribution of two consecutive eigenphase spacings and their ratio for Haar-distributed $\mathrm{U}(N)$ matrices (the circular unitary ensemble) using our framework for Jánossy densities in random matrix theory, formulated via the Tracy-Widom system of nonlinear PDEs. Our result shows that the leading finite-$N$ correction in the gap-ratio distribution relative to the universal sine-kernel limit is of $\mathcal{O}(N^{-4})$, reflecting a nontrivial cancellation of the $\mathcal{O}(N^{-2})$ part present in the joint distributions of consecutive spacings. This finding suggests the potential to extract subtle finite-size corrections from the energy spectra of quantum-chaotic systems and explains why the deviation of the gap-ratio distribution of the Riemann zeta zeros $\{1/2+i\gamma_n\}$, $\gamma_n\approx T\gg1$, from the sine-kernel prediction scales as $\left(\log(T/2\pi)\right)^{-3}$.
The Jánossy density for a determinantal point process is the probability density that an interval $I$ contains exactly $p$ points except for those at $k$ designated loci. The Jánossy density associated with an integrable kernel $\mathbf{K}\doteq (\varphi(x)\psi(y)-\psi(x)\varphi(y))/(x-y)$ is shown to be expressed as a Fredholm determinant $\mathrm{Det}(\mathbb{I}-\tilde{\mathbf{K}}|_I)$ of a transformed kernel $\tilde{\mathbf{K}}\doteq (\tilde{\varphi}(x)\tilde{\psi}(y)-\tilde{\psi}(x)\tilde{\varphi}(y))/(x-y)$. We observe that $\tilde{\mathbf{K}}$ satisfies Tracy and Widom's criteria if $\mathbf{K}$ does, because of the structure that the map $(\varphi, \psi)\mapsto (\tilde{\varphi}, \tilde{\psi})$ is a meromorphic $\mathrm{SL}(2,\mathbb{R})$ gauge transformation between covariantly constant sections. This observation enables application of the Tracy-Widom method to Jánossy densities, expressed in terms of a solution to a system of differential equations in the endpoints of the interval. Our approach does not explicitly refer to isomonodromic systems associated with Painlevé equations employed in the preceding works. As illustrative examples we compute Jánossy densities with $k=1$, $p=0$ for the Airy and Bessel kernels, related to the joint distributions of the two largest eigenvalues of random Hermitian matrices and of the two smallest singular values of random complex matrices.
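Fredholm determinants of the kind appearing here can be evaluated numerically by Nyström-type quadrature (Bornemann's method). The sketch below, with illustrative parameters of our own choosing, applies it to the simpler sine kernel rather than the Airy or Bessel kernels of the paper; the determinant on $[0,s]$ is then the bulk gap probability of containing no point.

```python
import numpy as np

# Nystrom approximation of a Fredholm determinant Det(I - K|_[a,b])
# (Bornemann's quadrature method), illustrated with the sine kernel
#   K(x, y) = sin(pi (x - y)) / (pi (x - y)).
def sine_kernel(x, y):
    d = np.pi * (x - y)
    # take the limit value 1 on the diagonal, avoiding 0/0
    return np.where(np.abs(d) < 1e-12, 1.0,
                    np.sin(d) / np.where(d == 0, 1.0, d))

def fredholm_det(kernel, a, b, n=40):
    # Gauss-Legendre nodes/weights mapped from [-1, 1] to [a, b]
    x, w = np.polynomial.legendre.leggauss(n)
    x = 0.5 * (b - a) * x + 0.5 * (b + a)
    w = 0.5 * (b - a) * w
    sw = np.sqrt(w)
    # symmetrized discretization: det(I - sqrt(w_i) K(x_i, x_j) sqrt(w_j))
    M = np.eye(n) - sw[:, None] * kernel(x[:, None], x[None, :]) * sw[None, :]
    return np.linalg.det(M)

E0 = fredholm_det(sine_kernel, 0.0, 0.1)   # gap probability, close to 1 - 0.1
```

For smooth kernels this quadrature converges spectrally fast, so even a modest number of nodes gives machine-precision determinants; the same routine would apply to the transformed kernels $\tilde{\mathbf{K}}$ once their functions $\tilde{\varphi}, \tilde{\psi}$ are available.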
For a shell model of fully developed turbulence and the incompressible Navier-Stokes equations in Fourier space, when Gaussian white noise is artificially added to the equation of each mode, we derive an expression for the mean linear response function in terms of the velocity correlation functions by applying the method developed for nonequilibrium Langevin systems [Harada and Sasa, Phys. Rev. Lett. 95, 130602 (2005)]. We verify numerically for the shell model that, as the noise tends to zero, the derived expression for the response function converges to the response function of the noiseless shell model.
We formulate a new two-variable river environmental restoration problem based on jump stochastic differential equations (SDEs) governing the sediment storage and nuisance benthic algae population dynamics in a dam-downstream river. The dynamics are controlled through impulsive sediment replenishment with discrete and random observations/interventions, so as to avoid both sediment depletion and thick algae growth. We consider a cost-efficient management problem for the SDEs to achieve these objectives; its resolution reduces to solving a Hamilton-Jacobi-Bellman (HJB) equation. We also consider a Fokker-Planck (FP) equation governing the probability density function of the controlled dynamics. The HJB equation has a discontinuous solution, while the FP equation has a Dirac delta along boundaries. We show that the value function, i.e., the optimized objective function, is governed by the HJB equation in a simplified case, and further that a threshold-type control is optimal. We demonstrate that simple numerical schemes can handle these equations. Finally, we numerically analyze the optimal controls and the resulting probability density functions.
Recently there has been great interest in operator learning, where networks learn operators between function spaces from an essentially infinite-dimensional perspective. In this work we present results on when the operators learned by these networks are injective and surjective. As a warmup, we combine prior work in the finite-dimensional ReLU setting and the operator learning setting by giving sharp conditions under which ReLU layers with linear neural operators are injective. We then consider the case when the activation function is pointwise bijective and obtain sufficient conditions for the layer to be injective. We remark that this question, while trivial in the finite-rank case, is subtler in the infinite-rank case and is proved using tools from Fredholm theory. Next, we prove that our injective neural operators are universal approximators and that their implementation with finite-rank neural networks is still injective. This ensures that injectivity is not `lost' in the transcription from analytical operators to their finite-rank implementation with networks. Finally, we conclude with an increase in abstraction and consider general conditions under which subnetworks, which may be many layers deep, are injective and surjective, and provide an exact inversion from a `linearization.' This section uses general arguments from Fredholm theory and Leray-Schauder degree theory for nonlinear integral equations to analyze the mapping properties of neural operators in function spaces. These results apply to subnetworks formed from the layers considered in this work, under natural conditions. We believe that our work has applications in Bayesian uncertainty quantification, where injectivity enables likelihood estimation, and in inverse problems, where surjectivity and injectivity correspond to existence and uniqueness of solutions, respectively.
The diversity of programming languages is growing, making the language extensibility of code clone detectors crucial. However, this is challenging for most existing clone detectors because the source code handler needs modifications, which require specialist-level knowledge of the targeted language and are time-consuming. Multilingual code clone detectors make it easier to add support for a new language by requiring only the syntax information of the target language. To address the shortcomings of existing multilingual detectors in language scalability and detection performance, we propose a multilingual code block extraction method based on ANTLR parser generation, and implement a multilingual code clone detector (MSCCD), which supports the largest number of languages among currently available tools and is able to detect Type-3 code clones. We follow the methodology of previous studies to evaluate detection performance on Java. Compared to ten state-of-the-art detectors, MSCCD performs at an average level while supporting a significantly larger number of languages. Furthermore, we propose the first multilingual syntactic code clone evaluation benchmark, based on the CodeNet database. Our results reveal that even when applying the same detection approach, performance can vary markedly depending on the language of the source code under investigation. Overall, MSCCD is the most balanced of the evaluated tools when considering both detection performance and language extensibility.
We redefine a multiplicative group structure on the set of equivalence classes of rational sequences satisfying a fixed linear recurrence of degree two, which was originally defined by R. R. Laxton in his paper "On groups of linear recurrences I", Duke Math. J. 36, 721--736 (1969). In that article, he also defined some natural subgroups of the group and determined the structures of their quotient groups; however, he did not study the whole group itself, and nothing has been known about its structure or interpretation. The aims of this paper are to redefine Laxton's group in a natural way and to determine the structure of the whole group, which clarifies Laxton's results on the quotient groups. Our formulation, based on methods of algebraic number theory, simplifies the proofs of Laxton's results. It also gives a natural interpretation of those results and makes it possible to use the group to establish various properties of such sequences.
Forward-backward stochastic differential equations (FBSDEs) are central in optimal control, game theory, economics, and mathematical finance. Unfortunately, the available FBSDE solvers operate on \textit{individual} FBSDEs, meaning that they cannot provide a computationally feasible strategy for solving large families of FBSDEs, as these solvers must be re-run several times. \textit{Neural operators} (NOs) offer an alternative approach for \textit{simultaneously solving} large families of FBSDEs by directly approximating the solution operator mapping \textit{inputs:} terminal conditions and dynamics of the backward process to \textit{outputs:} solutions of the associated FBSDE. Though universal approximation theorems (UATs) guarantee the existence of such NOs, these NOs are unrealistically large. We confirm that ``small'' NOs can uniformly approximate the solution operator of structured families of FBSDEs with random terminal time, uniformly on suitable compact sets determined by Sobolev norms, to any prescribed error $\varepsilon>0$ using a depth of $\mathcal{O}(\log(1/\varepsilon))$, a width of $\mathcal{O}(1)$, and a sub-linear rank, i.e. $\mathcal{O}(1/\varepsilon^r)$ for some $r<1$. This result is rooted in our second main contribution, which shows that convolutional NOs of similar depth, width, and rank can approximate the solution operator of a broad class of elliptic PDEs. A key insight here is that the convolutional layers of our NO can efficiently encode the Green's function associated with the elliptic PDEs linked to our FBSDEs. A byproduct of our analysis is the first theoretical justification for the benefit of lifting channels in NOs: they exponentially decelerate the growth rate of the NO's rank.
We explore a model based on the classically scale-invariant standard model (SM) with strongly coupled vector-like dynamics, called hypercolor (HC). The scale symmetry is dynamically broken by the vector-like condensation at the TeV scale, so that the SM Higgs acquires a negative mass-squared through the bosonic seesaw mechanism, realizing electroweak symmetry breaking. An elementary pseudoscalar $S$ is introduced to give masses to the composite Nambu-Goldstone bosons (HC pions); the HC pion can be a good target to explore through the diphoton channel at the LHC. As a consequence of the bosonic seesaw, the fluctuating mode of $S$, which we call $s$, develops tiny couplings to the SM particles and is predicted to be very light. The $s$ predominantly decays to diphotons and can behave as an invisible axion-like dark matter. The mass of the $s$ dark matter is constrained by currently available cosmological and astrophysical limits to $10^{-4}\,{\rm eV} \lesssim m_s \lesssim 1\,{\rm eV}$. We find that a sufficient relic abundance of the $s$ dark matter can be accumulated via coherent oscillation. The detection potential in microwave cavity experiments is also addressed.
Frictional phenomena of two-dimensional elastic lattices are studied numerically based on a two-dimensional Frenkel-Kontorova model with impurities. It is shown that impurities can assist the depinning. We also investigate anisotropic ordering and transverse pinning effects of sliding lattices, which are characteristic of the moving Bragg glass state and/or the transverse glass state. A peculiar velocity dependence of the transverse pinning is observed in the presence of both periodic and random potentials, and is discussed in relation to growing order and discommensurate structures.
Nondeterministic circuits are a nondeterministic computation model in circuit complexity theory. In this paper, we prove a $3(n-1)$ lower bound on the size of nondeterministic $U_2$-circuits computing the parity function. It is known that the minimum size of (deterministic) $U_2$-circuits computing the parity function exactly equals $3(n-1)$. Thus, our result means that nondeterminism is useless for computing the parity function by $U_2$-circuits and cannot reduce the size below $3(n-1)$. To the best of our knowledge, this is the first nontrivial lower bound on the size of nondeterministic circuits (including formulas, constant-depth circuits, and so on) with unlimited nondeterminism for an explicit Boolean function. We also discuss an approach to proving lower bounds on the size of deterministic circuits via lower bounds on the size of nondeterministic restricted circuits.
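The matching deterministic upper bound, $3(n-1)$ gates, comes from building each XOR out of three $U_2$ gates, since $x \oplus y = (x \vee y) \wedge \neg(x \wedge y)$ and OR, AND, and "$a$ AND NOT $b$" all belong to $U_2$ (which excludes only XOR and XNOR). A quick sketch (ours, for illustration) verifying the construction and the gate count:

```python
from itertools import product

# U2 gates are all 2-input Boolean gates except XOR and XNOR.
# Parity of n bits via (n-1) XORs, each built from 3 U2 gates:
#   x XOR y = (x OR y) AND NOT (x AND y)
def parity_u2(bits):
    gate_count = 0
    acc = bits[0]
    for b in bits[1:]:
        g1 = acc | b          # OR gate          (in U2)
        g2 = acc & b          # AND gate         (in U2)
        acc = g1 & (1 - g2)   # "a AND NOT b"    (in U2) -> acc XOR b
        gate_count += 3
    return acc, gate_count

n = 5
# exhaustively check correctness on all 2^n inputs
ok = all(parity_u2(bits)[0] == sum(bits) % 2
         for bits in product([0, 1], repeat=n))
count = parity_u2([0] * n)[1]   # 3(n-1) gates
```

The paper's point is that, by its lower bound, no amount of nondeterminism lets a $U_2$-circuit beat this count.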
Marine protected areas (MPAs) have attracted much attention as a tool for sustainable fisheries management, restoring depleted fishery stocks and maintaining ecosystems. However, even with the total exclusion of fishing effort, depleted stocks sometimes show little or no recovery over long time periods. Here, using a mathematical model, we show that multiple stable states may hold the key to understanding whether fishery stocks recover after MPA establishment. We find that MPAs can have either a positive effect or almost no effect on the recovery of depleted fishing stocks, depending on the fish migration patterns and the fishing policies. MPAs also reinforce ecological resilience, particularly for migratory species. In contrast to previous reports, our results show that MPAs have small or sometimes negative effects on the recovery of sedentary species. Unsuitable MPA planning might result in low effectiveness or even a deterioration of the existing condition.
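A minimal toy illustration (our own, not the paper's model) of how multiple stable states can block recovery: with an Allee effect, the dynamics $dN/dt = rN(N/A - 1)(1 - N/K)$ have two stable equilibria, $N=0$ and $N=K$, separated by the unstable threshold $N=A$, so a stock driven below the threshold stays collapsed even after fishing stops.

```python
# Forward-Euler integration of the bistable stock dynamics
#   dN/dt = r N (N/A - 1)(1 - N/K)
# with illustrative (hypothetical) parameters r, A, K.
def simulate(N0, r=1.0, A=0.2, K=1.0, dt=0.01, steps=5000):
    N = N0
    for _ in range(steps):
        N += dt * r * N * (N / A - 1.0) * (1.0 - N / K)
    return N

recovered = simulate(N0=0.3)   # above threshold A: converges toward K = 1
collapsed = simulate(N0=0.1)   # below threshold A: decays toward 0
```

The two trajectories start close together but end at different stable states, which is the qualitative mechanism the abstract invokes for stocks that fail to recover despite total fishing closure.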