Stochastic Weight Averaging (SWA) is a simple optimization technique that improves the generalization of deep neural networks by averaging weights collected along the SGD trajectory, which steers the solution toward wider, flatter optima. It consistently outperforms conventional SGD across a range of architectures and datasets, achieving ensemble-like accuracy with a single model.
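As a concrete illustration, here is a minimal sketch of the SWA recipe using PyTorch's built-in `torch.optim.swa_utils`; the tiny model, synthetic data, epoch counts, and learning rates are placeholders, not settings from the paper.

```python
# Minimal SWA sketch with PyTorch's official utilities. Model, data, and
# hyperparameters are illustrative stand-ins.
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

model = torch.nn.Linear(10, 2)            # stand-in for any network
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(20)]
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

swa_model = AveragedModel(model)          # keeps a running average of weights
swa_start = 5                             # epoch at which averaging begins
swa_scheduler = SWALR(optimizer, swa_lr=0.05)  # constant LR while averaging

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)  # fold current weights into average
        swa_scheduler.step()

# Recompute BatchNorm statistics for the averaged weights (a no-op for this
# toy model, but required for networks that use BatchNorm).
update_bn(loader, swa_model)
```

At test time, predictions come from `swa_model` rather than the last SGD iterate; the averaging costs almost nothing on top of standard training.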
Researchers from HSE University, Constructor University, University of Amsterdam, and SberDevices propose TEncDM, a Text Encoding Diffusion Model that leverages pre-trained language model encodings as a rich latent space for non-autoregressive text generation. The model achieves state-of-the-art results among non-autoregressive diffusion models, significantly outperforming prior embedding-based approaches, and demonstrates competitive performance with strong autoregressive baselines on conditional generation tasks such as paraphrasing, summarization, and text simplification.
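To make the core idea concrete, the schematic sketch below runs one diffusion training step in the encoding space of a frozen encoder. Everything here is illustrative: the random embedding stands in for a real pre-trained LM encoder, and the linear noise schedule and clean-latent prediction objective are common diffusion choices, not necessarily the paper's exact configuration.

```python
# Schematic: diffusion over frozen language-model encodings (TEncDM idea).
# The encoder is a random stand-in for a frozen pre-trained LM encoder.
import torch
import torch.nn as nn

vocab, seq_len, dim, T = 100, 16, 32, 1000

encoder = nn.Embedding(vocab, dim)        # stand-in for a frozen pretrained encoder
for p in encoder.parameters():
    p.requires_grad_(False)

denoiser = nn.TransformerEncoder(         # trainable denoising network
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
t_embed = nn.Embedding(T, dim)            # timestep conditioning

betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule (assumed)
alpha_bar = torch.cumprod(1 - betas, dim=0)

tokens = torch.randint(0, vocab, (8, seq_len))       # toy batch of token ids
z0 = encoder(tokens)                                 # clean LM encodings
t = torch.randint(0, T, (8,))
noise = torch.randn_like(z0)
a = alpha_bar[t].view(-1, 1, 1)
zt = a.sqrt() * z0 + (1 - a).sqrt() * noise          # forward diffusion step

z0_hat = denoiser(zt + t_embed(t).unsqueeze(1))      # predict clean encoding
loss = nn.functional.mse_loss(z0_hat, z0)            # train denoiser toward z0
loss.backward()
```

At generation time, the denoiser iteratively refines pure noise into an encoding, and a separate decoder (omitted here) maps that encoding back to tokens.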
Researchers empirically demonstrate that high-accuracy local optima in deep neural networks are connected by low-loss paths, challenging the notion of isolated optima. Leveraging this insight, they introduce Fast Geometric Ensembling (FGE), an efficient method that outperforms state-of-the-art ensembling techniques within a single model's training budget across various architectures and datasets like CIFAR-100 and ImageNet.
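Below is a minimal sketch of the FGE recipe, assuming a model that has already been pretrained: a cyclical learning rate moves the weights along low-loss paths, and snapshots collected at the low point of each cycle are averaged as an ensemble. The model, data, cycle length, and learning-rate bounds are placeholders, and the schedule is stepped per epoch here for brevity (the paper varies it per iteration).

```python
# Minimal FGE sketch: cyclical LR + snapshots at each cycle's low point,
# ensembled by averaging predicted probabilities. All settings illustrative.
import copy
import torch

model = torch.nn.Linear(10, 2)            # assume this is already pretrained
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(20)]
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

lr_max, lr_min, cycle = 5e-2, 5e-4, 4     # cycle length in epochs
snapshots = []

for epoch in range(12):
    # Piecewise-linear cyclical LR: high -> low within each cycle.
    pos = (epoch % cycle) / (cycle - 1)
    lr = lr_max - pos * (lr_max - lr_min)
    for g in optimizer.param_groups:
        g["lr"] = lr
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    if epoch % cycle == cycle - 1:        # low point of the cycle: snapshot
        snapshots.append(copy.deepcopy(model).eval())

# Ensemble by averaging the snapshots' predicted probabilities.
x_test = torch.randn(4, 10)
with torch.no_grad():
    probs = torch.stack([m(x_test).softmax(-1) for m in snapshots]).mean(0)
```

Because the snapshots sit in distinct but connected low-loss regions, their averaged predictions behave like a traditional ensemble at a fraction of the training cost.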
This paper provides a historiographical review of two decades of algorithmic advancements in Feynman integral reduction, focusing on the development of computer codes that implement Integration-by-Parts relations. It highlights how competitive innovation and engineering efforts have yielded increasingly efficient and scalable software crucial for high-precision calculations in perturbative Quantum Field Theory.
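For context, the identity these codes exploit is the standard one due to Chetyrkin and Tkachov: in dimensional regularization the integral of a total derivative vanishes, so for any loop or external momentum $v^\mu$ and inverse propagators $D_i$,

```latex
% Generic Integration-by-Parts identity in dimensional regularization
\int \frac{\mathrm{d}^D k}{(2\pi)^D}\,
  \frac{\partial}{\partial k^\mu}
  \left[ \frac{v^\mu}{D_1^{a_1} \cdots D_n^{a_n}} \right] = 0 .
```

Expanding the derivative turns this into linear relations among integrals with shifted exponents $a_i$; solving the resulting very large linear systems, typically via Laporta's algorithm, reduces all integrals in an amplitude to a small basis of master integrals, and it is this reduction step that the surveyed codes automate.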