Moscow Institute of Physics and Technology (MIPT)
Variational Autoencoders (VAEs) remain a cornerstone of generative computer vision, yet their training is often plagued by artifacts that degrade reconstruction and generation quality. This paper introduces VIVAT, a systematic approach to mitigating common artifacts in KL-VAE training without requiring radical architectural changes. We present a detailed taxonomy of five prevalent artifacts - color shift, grid patterns, blur, corner and droplet artifacts - and analyze their root causes. Through straightforward modifications, including adjustments to loss weights, padding strategies, and the integration of Spatially Conditional Normalization, we demonstrate significant improvements in VAE performance. Our method achieves state-of-the-art results in image reconstruction metrics (PSNR and SSIM) across multiple benchmarks and enhances text-to-image generation quality, as evidenced by superior CLIP scores. By preserving the simplicity of the KL-VAE framework while addressing its practical challenges, VIVAT offers actionable insights for researchers and practitioners aiming to optimize VAE training.
Segmenting objects with complex shapes, such as wires, bicycles, or structural grids, remains a significant challenge for current segmentation models, including the Segment Anything Model (SAM) and its high-quality variant SAM-HQ. These models often struggle with thin structures and fine boundaries, leading to poor segmentation quality. We propose Talk2SAM, a novel approach that integrates textual guidance to improve segmentation of such challenging objects. The method uses CLIP-based embeddings derived from user-provided text prompts to identify relevant semantic regions, which are then projected into the DINO feature space. These features serve as additional prompts for SAM-HQ, enhancing its ability to focus on the target object. Beyond improving segmentation accuracy, Talk2SAM allows user-controllable segmentation, enabling disambiguation of objects within a single bounding box based on textual input. We evaluate our approach on three benchmarks: BIG, ThinObject5K, and DIS5K. Talk2SAM consistently outperforms SAM-HQ, achieving up to +5.9% IoU and +8.3% boundary IoU improvements. Our results demonstrate that incorporating natural language guidance provides a flexible and effective means for precise object segmentation, particularly in cases where traditional prompt-based methods fail. The source code is available on GitHub: this https URL
We present multiband observations and analysis of EP240801a, a low-energy, extremely soft gamma-ray burst (GRB) discovered on August 1, 2024 by the Einstein Probe (EP) satellite, with a weak contemporaneous signal also detected by Fermi/GBM. Optical spectroscopy of the afterglow, obtained by GTC and Keck, identified the redshift of $z = 1.6734$. EP240801a exhibits a burst duration of 148 s in X-rays and 22.3 s in gamma-rays, with X-rays leading by 80.61 s. Spectral lag analysis indicates the gamma-ray signal arrived 8.3 s earlier than the X-rays. Joint spectral fitting of EP/WXT and Fermi/GBM data yields an isotropic energy $E_{\gamma,\rm{iso}} = (5.57^{+0.54}_{-0.50})\times 10^{51}\,\rm{erg}$, a peak energy $E_{\rm{peak}} = 14.90^{+7.08}_{-4.71}\,\rm{keV}$, and a fluence ratio $S(25-50\,\rm{keV})/S(50-100\,\rm{keV}) = 1.67^{+0.74}_{-0.46}$, classifying EP240801a as an X-ray flash (XRF). The host-galaxy continuum spectrum, inferred using Prospector, was used to subtract the host contribution from the observed optical data of the outburst. Unusual early $R$-band behavior and EP/FXT observations suggest multiple components in the afterglow. Three models are considered: a two-component jet model, a forward-reverse shock model, and a forward-shock model with energy injection. All three provide reasonable explanations. The two-component jet model and the energy injection model imply a relatively small initial energy and velocity of the jet in the line of sight, while the forward-reverse shock model remains typical. Under the two-component jet model, EP240801a may resemble GRB 221009A (BOAT) if the bright narrow beam is viewed on-axis. Therefore, EP240801a can be interpreted as an off-beam (narrow) jet or an intrinsically weak GRB jet. Our findings provide crucial clues for uncovering the origin of XRFs.
The 3D scene graph models spatial relationships between objects, enabling the agent to navigate efficiently in a partially observable environment and predict the location of the target object. This paper proposes an original framework named SGN-CIRL (3D Scene Graph-Based Reinforcement Learning Navigation) for mapless reinforcement learning-based robot navigation with a learnable representation of an open-vocabulary 3D scene graph. To accelerate and stabilize the training of reinforcement learning-based algorithms, the framework also employs imitation learning and curriculum learning. The former enables the agent to learn from demonstrations, while the latter structures the training process by gradually increasing task complexity from simple to more advanced scenarios. Numerical experiments conducted in the Isaac Sim environment showed that using a 3D scene graph for reinforcement learning significantly increased the success rate in difficult navigation cases. The code is open-sourced and available at: this https URL
Ferrimagnets containing several partially compensated magnetic sublattices are considered the most promising materials for all-optical data storage and for ultrafast communications based on spin waves. Ferrimagnets have two magnetic phases: collinear and noncollinear. Up to now, spin dynamics in ferrimagnets has been studied mostly in the collinear state, without much attention to the kind of magnetic phase. Here we investigate laser-induced ultrafast spin dynamics in a rare-earth iron garnet film in the noncollinear phase as well. We identify a crucial influence of the magnetic phase on the excited spin modes, which allowed us to discover several prominent effects previously overlooked. In particular, the noncollinearity makes the quasi-antiferromagnetic mode sensitive to the external magnetic field and brings its frequency close to the frequency of the quasi-ferromagnetic mode. The latter maximizes near the magnetization compensation point and vanishes towards the collinear phase. Spectacularly, at the phase transition the quasi-ferromagnetic mode becomes soft and its amplitude significantly increases, reaching 7°. This opens new opportunities for the ultrafast control of spins in ferrimagnets for nonthermal data storage and data processing.
We present the most extensive sample of 45 type I (short) and 275 type II (long) gamma-ray bursts (GRBs) with known redshift to investigate the correlation between the rest-frame peak energy, $E_{p,i}$, and the total isotropic equivalent energy, $E_{\rm iso}$, of the prompt emission (Amati relation). The $E_{p,i}$ - $E_{\rm iso}$ correlation for type I bursts is found to be well-distinguished from the one constructed for type II bursts and has a similar power-law index value, $a = 0.4$, which possibly indicates the same emission mechanism for both GRB types. We show that the initial pulse complex (IPC) of type I bursts with an extended emission and regular type I bursts follow the same correlation. We obtain similar results for type II bursts associated with Ic supernovae and for regular type II bursts. Three possible outliers from the $E_{p,i}$ - $E_{\rm iso}$ correlation for the type II subsample are detected. Significant evolution of the $E_{p,i}$ - $E_{\rm iso}$ correlation with redshift for type II bursts is not found. We suggest a new classification method based on the $E_{p,i}$ - $E_{\rm iso}$ correlation and introduce two parameters, EH and EHD. EHD is found to be the most reliable parameter for the blind type I - type II classification, which can be used to classify GRBs with no redshift.
The exact time-dependent solution is obtained for magnetic field growth during spherically symmetric accretion into a black hole (BH) with a Schwarzschild metric. The magnetic field increases with time, changing from an initially uniform field into a quasi-radial one. Equipartition between magnetic and kinetic energies in the falling gas is established in the developed stages of the flow. Estimates of the synchrotron radiation intensity are presented for the stationary flow. The main part of the radiation is formed in the region $r \leq 7 r_g$, where $r_g$ is the BH gravitational radius. A two-dimensional stationary self-similar magnetohydrodynamic solution is obtained for matter accretion into a BH in the presence of a large-scale magnetic field, when the magnetic field far from the BH is homogeneous and does not influence the flow. At the symmetry plane perpendicular to the direction of the distant magnetic field, a quasi-stationary disk is formed around the BH, whose structure is determined by dissipation processes. Parameters of the shock forming due to matter infall onto the disk are obtained. The radiation spectra of the disk and the shock are obtained for a $10\,M_\odot$ BH. The luminosity of such an object is about the solar one for a characteristic galactic gas density, with a possibility of observation at distances less than 1 kpc. The spectra of laminar and turbulent disk structures around a BH are very different. The turbulent disk emits a large part of its flux in the infrared. It may be that some of the galactic infrared star-like sources are single BHs in the turbulent accretion state. The radiative efficiency of the magnetized disk is very high, reaching $\sim 0.5\,\dot M\,c^2$, so it was recently named a magnetically arrested disk (MAD). Numerical simulations of MAD and its appearance during accretion onto neutron stars are considered and discussed.
Mapping is one of the crucial tasks enabling autonomous navigation of a mobile robot. Conventional mapping methods output a dense geometric map representation, e.g. an occupancy grid, which is not trivial to keep consistent for prolonged runs covering large environments. Meanwhile, capturing the topological structure of the workspace enables fast path planning, is typically less prone to odometry error accumulation, and does not consume much memory. Following this idea, this paper introduces PRISM-TopoMap -- a topological mapping method that maintains a graph of locally aligned locations without relying on global metric coordinates. The proposed method involves original learnable multimodal place recognition paired with a scan matching pipeline for localization and loop closure in the graph of locations. The graph is updated online, and the robot is localized in a proper node at each time step. We conduct a broad experimental evaluation of the suggested approach in a range of photo-realistic environments and on a real robot, and compare it to the state of the art. The results of the empirical evaluation confirm that PRISM-TopoMap consistently outperforms competitors computationally, achieves high mapping quality, and performs well on a real robot. The code of PRISM-TopoMap is open-sourced and is available at: this https URL
We investigate the internal structure of the pion using generalized transverse momentum-dependent parton distributions (GTMDs) within the light-cone quark model. By solving the quark-quark correlator, we derive the twist-2, 3, and 4 quark GTMDs in terms of light-front wave functions (LFWFs). Out of the 16 possible GTMDs, 12 are found to be nonzero. Furthermore, we extract the valence quark transverse momentum-dependent parton distributions (TMDs) and generalized parton distributions (GPDs) from their corresponding GTMDs. Additionally, we compute the valence quark electromagnetic form factors (FFs) and parton distribution functions (PDFs) up to twist-4. The elastic charge radius of the pion is determined to be 0.558 fm. Our results exhibit qualitative agreement with predictions from other theoretical models, such as the Nambu-Jona-Lasinio model, the light-front holographic model, and the spectator model, at the leading twist. This study provides a comprehensive insight into the internal structure of the pion.
Virtually all federated learning (FL) methods, including FedAvg, operate in the following manner: i) an orchestrating server sends the current model parameters to a cohort of clients selected via a certain rule, ii) these clients then independently perform a local training procedure (e.g., via SGD or Adam) using their own training data, and iii) the resulting models are shipped to the server for aggregation. This process is repeated until a model of suitable quality is found. A notable feature of these methods is that each cohort is involved in only a single communication round with the server. In this work we challenge this algorithmic design primitive and investigate whether it is possible to "squeeze more juice" out of each cohort than what is possible in a single communication round. Surprisingly, we find that this is indeed the case, and our approach leads to a reduction of up to 74% in the total communication cost needed to train an FL model in the cross-device setting. Our method is based on a novel variant of the stochastic proximal point method (SPPM-AS), which supports a large collection of client sampling procedures, some of which lead to further gains when compared to classical client selection approaches.
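The three-step loop described above can be sketched as follows. This is a minimal toy illustration of the generic FedAvg-style round, not the SPPM-AS method of the paper: the scalar model, the quadratic objective, and the names `local_sgd`, `fedavg_round`, and `train` are all assumptions made for the example.

```python
import random

def local_sgd(w, data, lr=0.1, steps=5):
    """Step ii: refine the received model on one client's private data.

    Toy objective (an assumption): minimise the mean of (w - x)^2 over x in data.
    """
    for _ in range(steps):
        x = random.choice(data)        # one stochastic sample
        w -= lr * 2.0 * (w - x)        # gradient of (w - x)^2 is 2 (w - x)
    return w

def fedavg_round(w, clients, cohort_size):
    """One communication round: select a cohort, train locally, average."""
    cohort = random.sample(clients, cohort_size)              # step i
    local_models = [local_sgd(w, data) for data in cohort]    # step ii
    return sum(local_models) / len(local_models)              # step iii

def train(clients, cohort_size, rounds, w0=0.0):
    """Repeat rounds; a fresh cohort is drawn for every round."""
    w = w0
    for _ in range(rounds):
        w = fedavg_round(w, clients, cohort_size)
    return w

if __name__ == "__main__":
    clients = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # hypothetical client datasets
    print(train(clients, cohort_size=5, rounds=20))
```

In this sketch each sampled cohort takes part in exactly one call to `fedavg_round`, which is the design primitive the abstract calls into question.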
The paper explores combinatorial properties of Fibonacci words and their generalizations within the framework of combinatorics on words. Subword complexity measures the diversity of subwords in Fibonacci words and shows non-decreasing growth for infinite sequences. The paper extends factor analysis to arithmetic progressions of symbols, highlighting generalized pattern distributions. Recent results link Sturmian sequences (including Fibonacci words) to unbounded binomial complexity and gap inequivalence, with implications for formal language theory and automata. This work underscores the interplay between substitution rules, algebraic number theory, and combinatorial complexity in infinite words, providing tools for applications in fractal geometry and theoretical computer science.
We propose a methodology for implementing Grover's algorithm in the digital quantum simulation of disordered Ising models. The core concept revolves around using the evolution operator for the Ising model as the quantum oracle within Grover's search. This operator induces phase shifts for the eigenstates of the Ising Hamiltonian, with the most pronounced shifts occurring for the lowest and highest energy states. Determining these states for a disordered Ising Hamiltonian using classical methods presents an exponentially complex challenge with respect to the number of spins (or qubits) involved. Within our proposed approach, we determine the optimal evolution time by ensuring a phase flip for the target states. This method yields a quadratic speedup compared to classical computation methods and enables the identification of the lowest and highest energy states (or neighboring states) with a high probability $\lesssim 1$.
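The phase-flip condition mentioned above can be written out in a few lines. This is only a sketch under stated assumptions: the generic form of the Hamiltonian, the symbols $J_{ij}$, $h_i$, $s^*$, and the convention $\hbar = 1$ are not given in the abstract, and the paper may fix the energy offset or reference phase differently.

```latex
% Assumed generic disordered Ising Hamiltonian, diagonal in the computational basis (\hbar = 1)
H = \sum_{i<j} J_{ij}\,\sigma^z_i \sigma^z_j + \sum_i h_i\,\sigma^z_i ,
\qquad H\,|s\rangle = E_s\,|s\rangle .

% Evolution for a time t multiplies each eigenstate by an energy-dependent phase
U(t) = e^{-iHt}, \qquad U(t)\,|s\rangle = e^{-iE_s t}\,|s\rangle .

% Oracle-like phase flip of a target state s^*
e^{-iE_{s^*} t} = -1
\;\Longleftrightarrow\;
E_{s^*}\,t = (2k+1)\pi, \quad k \in \mathbb{Z}.
```

Under these assumptions, choosing $t$ so that the extremal energies acquire a phase close to $\pi$ lets $U(t)$ play the role that the sign-flipping oracle plays in standard Grover search.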
Dualities between quantum field theories have proven to be a powerful tool in various areas of physics. In this paper, we introduce a new perspective for obtaining strong coupling expansions based on a well-known technique -- the field-space Fourier transform. We discuss the advantages of this approach for a class of Euclidean quantum field theories on a general lattice, with a focus on a self-interacting $\phi^4$ scalar field theory defined on a cubic lattice of arbitrary dimension. We establish a duality between the strong coupling regime of this theory and the weak coupling regime of a corresponding dual theory. Without loss of generality, we choose the original theory to be local and show that its dual action becomes nonlocal. Using standard diagrammatic techniques, we derive expansions for the two-point correlator and the free energy per site in the regime of large and intermediate coupling constants $g$. The obtained expansions remain regular in the limit $g \to 0$ and exhibit rapid numerical convergence in the considered regions. Numerical analysis in dimensions $d = 2$ and $d = 3$ demonstrates good agreement between our analytical results and Monte Carlo simulations. Furthermore, we show that the strong coupling expansions are consistent with traditional weak coupling expansions.
Phase estimation is known to be a robust method for single-qubit gate calibration in quantum computers, while Bayesian estimation is widely used in devising optimal methods for learning in quantum systems. We present Bayesian phase estimation methods that adaptively choose a control phase and the time of coherent evolution based on prior phase knowledge. In the presence of noise, we find near-optimal performance with respect to known theoretical bounds, and demonstrate some robustness of the estimates to noise that is not accounted for in the model of the estimator, making the methods suitable for calibrating operations in quantum computers. We determine the utility of control parameter values using functions of the prior probability of the phase that quantify expected knowledge gain either in terms of expected narrowing of the posterior or expected information gain. In particular, we find that by maximising the rate of expected gain we obtain phase estimates having standard deviation a factor of 1.43 larger than the Heisenberg limit using a classical sequential strategy. The methods provide optimal solutions accounting for available prior knowledge and experimental imperfections with minimal effort from the user. The effect of many types of noise can be specified in the model of the measurement probabilities, and the rate of knowledge gain can easily be adjusted to account for times included in the measurement sequence other than the coherent evolution leading to the unknown phase, such as times required for state preparation or readout.
Methods with adaptive scaling of different features play a key role in solving saddle point problems (SPPs), primarily due to Adam's popularity for solving adversarial machine learning problems, including GAN training. This paper carries out a theoretical analysis of the following scaling techniques for solving SPPs: the well-known Adam and RMSProp scaling and the newer AdaHessian and OASIS, based on the Hutchinson approximation. We use the Extra Gradient method and its improved version with negative momentum as the base method. Experimental studies on GANs show good applicability not only for Adam, but also for other less popular methods.
24 Mar 2025
We present a novel type of tunable magneto-optical metasurface performing Faraday rotation, the sign and value of which are not fixed after the structure fabrication but can be tuned in a wide range via heating of the metasurface. We demonstrate both experimentally and theoretically that the Faraday rotation angle is enhanced in the vicinity of the magnetodipole and electrodipole Mie resonances and can be changed in a wide range from -0.3 degrees to +0.1 degrees for the same metasurface at a fixed wavelength of incident light when the temperature changes from 294 K to 488 K. Such heating can be performed by an external control laser. As laser radiation can be focused to spots of $\sim 1\,\mu$m diameter, the magneto-optical response can also be tuned locally. Thus one may obtain inhomogeneous magneto-optically induced polarization rotation distributions across the metasurface by creating laser beam patterns with the desired intensity profiles. Another possibility opened by the proposed metasurface is self-modulation of the polarization of laser light depending on its intensity.
A brief illustrative discussion of the shadows of black holes at local and cosmological distances is presented. Starting from the definition of the term and a discussion of recent observations, we then investigate shadows at large, cosmological distances. On a cosmological scale, the size of the shadow observed by a comoving observer is expected to be affected by cosmic expansion. An exact analytical solution for the shadow angular size of a Schwarzschild black hole in a de Sitter universe was found. Additionally, an approximate method was presented, based on the angular size-redshift relation. This approach is appropriate for the general case of any multicomponent universe (with matter, radiation and dark energy). It was shown that supermassive black holes at cosmological distances in a universe with matter may give a shadow size comparable with the shadow size in M87 and in the center of our Galaxy.
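For a sense of scale in the local (non-cosmological) case, the sketch below evaluates the standard Schwarzschild result for a distant static observer, in which the shadow has angular radius of about $3\sqrt{3}\,GM/(c^2 D)$. It is not the paper's de Sitter solution or its redshift-based approximation. The function name and the masses and distances used for M87 and the Galactic center are assumed round values, not taken from the text.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
MPC = 3.0857e22      # one megaparsec, m
RAD_TO_MUAS = 180.0 / math.pi * 3600.0 * 1e6   # radians -> microarcseconds

def shadow_angular_diameter_muas(mass_msun, distance_mpc):
    """Angular diameter of a Schwarzschild shadow seen from a large distance.

    Uses the shadow radius 3*sqrt(3)*GM/c^2 and the small-angle approximation.
    """
    r_shadow = 3.0 * math.sqrt(3.0) * G * mass_msun * M_SUN / C**2
    return 2.0 * r_shadow / (distance_mpc * MPC) * RAD_TO_MUAS

if __name__ == "__main__":
    # Assumed illustrative values: M87* ~ 6.5e9 M_sun at ~16.8 Mpc,
    # Sgr A* ~ 4.3e6 M_sun at ~8.3 kpc.
    print("M87*  :", shadow_angular_diameter_muas(6.5e9, 16.8))
    print("Sgr A*:", shadow_angular_diameter_muas(4.3e6, 0.0083))
```

Both cases come out at a few tens of microarcseconds. For a source at cosmological distance, the approximate method named in the abstract would replace the distance here by the angular diameter distance at the source redshift; that step is not implemented in this sketch.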
Despite the fact that popular text-to-image generation models cope well with international and general cultural queries, they have a significant knowledge gap regarding individual cultures. This is due to the content of existing large training datasets collected on the Internet, which are predominantly based on Western European or American popular culture. Meanwhile, the lack of cultural adaptation of the model can lead to incorrect results, a decrease in the generation quality, and the spread of stereotypes and offensive content. In an effort to address this issue, we examine the concept of cultural code and recognize the critical importance of its understanding by modern image generation models, an issue that has not been sufficiently addressed in the research community to date. We propose the methodology for collecting and processing the data necessary to form a dataset based on the cultural code, in particular the Russian one. We explore how the collected data affects the quality of generations in the national domain and analyze the effectiveness of our approach using the Kandinsky 3.1 text-to-image model. Human evaluation results demonstrate an increase in the level of awareness of Russian culture in the model.
We present the IGLU Gridworld: a reinforcement learning environment for building and evaluating language-conditioned embodied agents in a scalable way. The environment features visual agent embodiment, interactive learning through collaboration, language-conditioned RL, and a combinatorially hard task space (3D block building).
The StochAstic Recursive grAdient algoritHm (SARAH) is a variance-reduced variant of the Stochastic Gradient Descent (SGD) algorithm that periodically needs a full gradient of the objective function. In this paper, we remove the necessity of a full gradient computation. This is achieved by using a randomized reshuffling strategy and aggregating stochastic gradients obtained in each epoch. The aggregated stochastic gradients serve as an estimate of a full gradient in the SARAH algorithm. We provide a theoretical analysis of the proposed approach and conclude the paper with numerical experiments that demonstrate the efficiency of this approach.
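To show where the full gradient enters, here is a minimal sketch of the classical SARAH recursion on a toy problem. It is the baseline algorithm, not the reshuffling variant proposed in the paper, whose aggregation rule is not specified in this abstract. The scalar least-squares objective and the names `grad_i`, `full_grad`, and `sarah` are assumptions made for the example.

```python
import random

def grad_i(w, a, b):
    """Gradient of the toy component f_i(w) = 0.5 * (a*w - b)^2."""
    return a * (a * w - b)

def full_grad(w, data):
    """Full gradient: the average of all component gradients."""
    return sum(grad_i(w, a, b) for a, b in data) / len(data)

def sarah(data, w0=0.0, lr=0.1, epochs=30, inner_steps=None, seed=0):
    """Classical SARAH: one full gradient per epoch, then recursive updates."""
    rng = random.Random(seed)
    m = inner_steps or len(data)
    w = w0
    for _ in range(epochs):
        v = full_grad(w, data)           # the full-gradient step the paper removes
        w_prev, w = w, w - lr * v
        for _ in range(m):
            a, b = rng.choice(data)      # sample one component
            # recursive estimator: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}
            v = grad_i(w, a, b) - grad_i(w_prev, a, b) + v
            w_prev, w = w, w - lr * v
    return w

if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)]   # every pair is consistent with w = 2
    print(sarah(data))
```

In the approach described above, the call to `full_grad` would be replaced by an aggregate of the stochastic gradients collected while passing through a reshuffled epoch.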