Stochastic Weight Averaging (SWA) is a simple optimization technique that improves the generalization of deep neural networks by averaging weights collected along the SGD trajectory, which steers the solution toward wider, flatter optima. It consistently outperforms conventional SGD across a range of architectures and datasets, achieving ensemble-like accuracy with a single model.
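As a concrete illustration, here is a minimal sketch of the SWA recipe using PyTorch's built-in `torch.optim.swa_utils`; the tiny model, synthetic data, epoch counts, and learning rates are placeholders, not settings from the paper.

```python
# Minimal SWA sketch with PyTorch's official utilities. Model, data, and
# hyperparameters are illustrative stand-ins.
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

model = torch.nn.Linear(10, 2)            # stand-in for any network
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(20)]
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

swa_model = AveragedModel(model)          # keeps a running average of weights
swa_start = 5                             # epoch at which averaging begins
swa_scheduler = SWALR(optimizer, swa_lr=0.05)  # constant LR while averaging

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)  # fold current weights into average
        swa_scheduler.step()

# Recompute BatchNorm statistics for the averaged weights (a no-op for this
# toy model, but required for networks that use BatchNorm).
update_bn(loader, swa_model)
```

At test time, predictions come from `swa_model` rather than the last SGD iterate; the averaging costs almost nothing on top of standard training.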
Researchers from HSE University, Constructor University, University of Amsterdam, and SberDevices propose TEncDM, a Text Encoding Diffusion Model that leverages pre-trained language model encodings as a rich latent space for non-autoregressive text generation. The model achieves state-of-the-art results among non-autoregressive diffusion models, significantly outperforming prior embedding-based approaches, and demonstrates competitive performance with strong autoregressive baselines on conditional generation tasks such as paraphrasing, summarization, and text simplification.
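To make the core idea concrete, the schematic sketch below runs one diffusion training step in the encoding space of a frozen encoder. Everything here is illustrative: the random embedding stands in for a real pre-trained LM encoder, and the linear noise schedule and clean-latent prediction objective are common diffusion choices, not necessarily the paper's exact configuration.

```python
# Schematic: diffusion over frozen language-model encodings (TEncDM idea).
# The encoder is a random stand-in for a frozen pre-trained LM encoder.
import torch
import torch.nn as nn

vocab, seq_len, dim, T = 100, 16, 32, 1000

encoder = nn.Embedding(vocab, dim)        # stand-in for a frozen pretrained encoder
for p in encoder.parameters():
    p.requires_grad_(False)

denoiser = nn.TransformerEncoder(         # trainable denoising network
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
t_embed = nn.Embedding(T, dim)            # timestep conditioning

betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule (assumed)
alpha_bar = torch.cumprod(1 - betas, dim=0)

tokens = torch.randint(0, vocab, (8, seq_len))       # toy batch of token ids
z0 = encoder(tokens)                                 # clean LM encodings
t = torch.randint(0, T, (8,))
noise = torch.randn_like(z0)
a = alpha_bar[t].view(-1, 1, 1)
zt = a.sqrt() * z0 + (1 - a).sqrt() * noise          # forward diffusion step

z0_hat = denoiser(zt + t_embed(t).unsqueeze(1))      # predict clean encoding
loss = nn.functional.mse_loss(z0_hat, z0)            # train denoiser toward z0
loss.backward()
```

At generation time, the denoiser iteratively refines pure noise into an encoding, and a separate decoder (omitted here) maps that encoding back to tokens.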
Researchers empirically demonstrate that high-accuracy local optima in deep neural networks are connected by low-loss paths, challenging the notion of isolated optima. Leveraging this insight, they introduce Fast Geometric Ensembling (FGE), an efficient method that outperforms state-of-the-art ensembling techniques within a single model's training budget across various architectures and datasets like CIFAR-100 and ImageNet.
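Below is a minimal sketch of the FGE recipe, assuming a model that has already been pretrained: a cyclical learning rate moves the weights along low-loss paths, and snapshots collected at the low point of each cycle are averaged as an ensemble. The model, data, cycle length, and learning-rate bounds are placeholders, and the schedule is stepped per epoch here for brevity (the paper varies it per iteration).

```python
# Minimal FGE sketch: cyclical LR + snapshots at each cycle's low point,
# ensembled by averaging predicted probabilities. All settings illustrative.
import copy
import torch

model = torch.nn.Linear(10, 2)            # assume this is already pretrained
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(20)]
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

lr_max, lr_min, cycle = 5e-2, 5e-4, 4     # cycle length in epochs
snapshots = []

for epoch in range(12):
    # Piecewise-linear cyclical LR: high -> low within each cycle.
    pos = (epoch % cycle) / (cycle - 1)
    lr = lr_max - pos * (lr_max - lr_min)
    for g in optimizer.param_groups:
        g["lr"] = lr
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    if epoch % cycle == cycle - 1:        # low point of the cycle: snapshot
        snapshots.append(copy.deepcopy(model).eval())

# Ensemble by averaging the snapshots' predicted probabilities.
x_test = torch.randn(4, 10)
with torch.no_grad():
    probs = torch.stack([m(x_test).softmax(-1) for m in snapshots]).mean(0)
```

Because the snapshots sit in distinct but connected low-loss regions, their averaged predictions behave like a traditional ensemble at a fraction of the training cost.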
This paper provides a historiographical review of two decades of algorithmic advancements in Feynman integral reduction, focusing on the development of computer codes that implement Integration-by-Parts relations. It highlights how competitive innovation and engineering efforts have yielded increasingly efficient and scalable software crucial for high-precision calculations in perturbative Quantum Field Theory.
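For context, the identity these codes exploit is the standard one due to Chetyrkin and Tkachov: in dimensional regularization the integral of a total derivative vanishes, so for any loop or external momentum $v^\mu$ and inverse propagators $D_i$,

```latex
% Generic Integration-by-Parts identity in dimensional regularization
\int \frac{\mathrm{d}^D k}{(2\pi)^D}\,
  \frac{\partial}{\partial k^\mu}
  \left[ \frac{v^\mu}{D_1^{a_1} \cdots D_n^{a_n}} \right] = 0 .
```

Expanding the derivative turns this into linear relations among integrals with shifted exponents $a_i$; solving the resulting very large linear systems, typically via Laporta's algorithm, reduces all integrals in an amplitude to a small basis of master integrals, and it is this reduction step that the surveyed codes automate.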