Universidad de Buenos Aires (UBA)
The goal of universal audio representation learning is to obtain foundational models that can be used for a variety of downstream tasks involving speech, music and environmental sounds. To approach this problem, methods inspired by works on self-supervised learning for NLP, like BERT, or computer vision, like masked autoencoders (MAE), are often adapted to the audio domain. In this work, we propose masking representations of the audio signal, and training a MAE to reconstruct the masked segments. The reconstruction is done by predicting the discrete units generated by EnCodec, a neural audio codec, from the unmasked inputs. We evaluate this approach, which we call EnCodecMAE, on a wide range of tasks involving speech, music and environmental sounds. Our best model outperforms various state-of-the-art audio representation models in terms of global performance. Additionally, we evaluate the resulting representations in the challenging task of automatic speech recognition (ASR), obtaining decent results and paving the way for a universal audio representation.
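The training objective described above — predicting discrete codec units, but only at masked positions — can be sketched as follows. This is a toy illustration with random features and random logits standing in for the encoder/decoder, not the actual EnCodecMAE architecture; the shapes, vocabulary size, and 75% mask ratio are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D, V = 100, 64, 1024   # frames, feature dim, codec vocabulary size (assumed)
mask_ratio = 0.75         # fraction of frames to mask (assumed)

features = rng.normal(size=(T, D))    # stand-in for audio-frame embeddings
targets = rng.integers(0, V, size=T)  # stand-in for EnCodec discrete units

# Randomly select frames to mask; the model only sees the unmasked frames.
mask = rng.random(T) < mask_ratio

# A real model would encode the unmasked frames and decode logits over the
# codec vocabulary for every position; random logits here show where the
# loss is computed, not how it is predicted.
logits = rng.normal(size=(T, V))

# Cross-entropy (log-softmax + negative log-likelihood) at masked positions only.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -log_probs[mask, targets[mask]].mean()
print(f"masked frames: {mask.sum()}, loss: {loss:.3f}")
```

Restricting the loss to masked positions is what forces the model to infer the missing segments from the surrounding context.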
Self-supervised representations of speech are currently being widely used for a large number of applications. Recently, some efforts have been made in trying to analyze the type of information present in each of these representations. Most such work uses downstream models to test whether the representations can be successfully used for a specific task. The downstream models, though, typically perform nonlinear operations on the representation extracting information that may not have been readily available in the original representation. In this work, we analyze the spatial organization of phone and speaker information in several state-of-the-art speech representations using methods that do not require a downstream model. We measure how different layers encode basic acoustic parameters such as formants and pitch using representation similarity analysis. Further, we study the extent to which each representation clusters the speech samples by phone or speaker classes using non-parametric statistical testing. Our results indicate that models represent these speech attributes differently depending on the target task used during pretraining.
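Representation similarity analysis, as used above, can be sketched as follows: build a dissimilarity matrix over the same set of samples for a representation and for an acoustic parameter (here pitch), then rank-correlate their upper triangles. The data is synthetic and the dimensions are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50  # number of speech samples (illustrative)
pitch = rng.uniform(80, 300, size=n)  # stand-in acoustic parameter (Hz)
# A layer that partially encodes pitch: one noisy pitch dimension plus noise dims.
layer = np.column_stack([pitch + rng.normal(0, 20, size=n),
                         rng.normal(size=(n, 15))])

def rdm(x):
    # Representational dissimilarity: condensed pairwise Euclidean distances.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d[np.triu_indices(len(x), k=1)]

def spearman(a, b):
    # Spearman correlation: Pearson correlation of the rank-transformed vectors.
    ranks = lambda v: v.argsort().argsort()
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

# RSA score: do the two dissimilarity structures agree?
rho = spearman(rdm(layer), rdm(pitch[:, None]))
print(f"RSA score (Spearman rho): {rho:.3f}")
```

Because the score is computed directly between distance structures, no downstream model is trained, which is exactly the point of the analysis described above.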
Higher-derivative interactions and transformation rules of the fields in the effective field theories of the massless string states are strongly constrained by space-time symmetries and dualities. Here we use an exact formulation of ten-dimensional N=1 supergravity coupled to Yang-Mills with manifest T-duality symmetry to construct the first order α'-corrections of the heterotic string effective action. The theory contains a supersymmetric and T-duality covariant generalization of the Green-Schwarz mechanism that determines the modifications to the leading order supersymmetry transformation rules of the fields. We compute the resulting field-dependent deformations of the coefficients in the supersymmetry algebra and construct the invariant action, with up to and including four-derivative terms of all the massless bosonic and fermionic fields of the heterotic string spectrum.
We investigate a shape optimization problem for a heat-conducting fluid governed by a Boussinesq system. The main goal is to determine an optimal domain shape that yields a temperature distribution as uniform as possible. Initially, we analyze the state problem, prove its well-posedness and establish a local boundary regularity result for the weak solution. We then demonstrate the existence of an optimal shape and derive a first-order optimality condition. This requires the derivation and analysis of the adjoint system associated with the Boussinesq model, as well as a rigorous treatment of the directional derivatives of the objective functional under appropriate domain perturbations. Finally, we present numerical experiments that illustrate and support the theoretical findings.
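For reference, a standard stationary form of the Boussinesq system referred to above couples Navier-Stokes flow with heat transport; the paper's exact formulation, boundary conditions, and coefficients may differ from this common textbook version:

```latex
\begin{aligned}
-\nu \Delta u + (u \cdot \nabla)u + \nabla p &= \beta \theta \, g && \text{in } \Omega,\\
\operatorname{div} u &= 0 && \text{in } \Omega,\\
-\kappa \Delta \theta + u \cdot \nabla \theta &= f && \text{in } \Omega,
\end{aligned}
```

where $u$ is the velocity, $p$ the pressure, $\theta$ the temperature, $\nu$ the viscosity, $\kappa$ the thermal diffusivity, $\beta$ the thermal expansion coefficient, $g$ the gravity direction, and $f$ a heat source. The buoyancy term $\beta \theta g$ is what couples the temperature back into the flow, which is why the adjoint system mentioned above involves both the fluid and heat equations.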
We use a moduli space exploration algorithm to produce a complete list of maximally enhanced gauge groups that are realized in the heterotic string in 7d, encompassing the usual Narain component, and five other components with rank reduction realized via nontrivial holonomy triples. Using lattice embedding techniques we find an explicit match with the mechanism of singularity freezing in M-theory on K3. The complete global data for each gauge group is explicitly given.
Recent observations have revealed remarkable insights into the gas reservoir in the circumgalactic medium (CGM) of galaxy haloes. In this paper, we characterise the gas in the vicinity of Milky Way and Andromeda analogues in the HESTIA (High resolution Environmental Simulations of The Immediate Area) suite of constrained Local Group (LG) simulations. The HESTIA suite comprises three high-resolution AREPO-based simulations of the LG, run using the Auriga galaxy formation model. For this paper, we focus only on the z = 0 simulation datasets and generate mock skymaps along with a power spectrum analysis to show that the distributions of ions tracing low-temperature gas (HI and SiIII) are clumpier than those of warmer gas tracers (OVI, OVII and OVIII). We compare to the spectroscopic CGM observations of M31 and low-redshift galaxies. HESTIA under-produces the column densities of the M31 observations, but the simulations are consistent with the observations of low-redshift galaxies. A possible explanation for these findings is that the spectroscopic observations of M31 are contaminated by gas residing in the CGM of the Milky Way.
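The power-spectrum comparison of clumpiness described above can be sketched as follows: take the 2D FFT of a mock column-density map and average the squared modulus in radial wavenumber bins; a clumpier tracer retains relatively more power at high wavenumbers. The maps here are Gaussian-smoothed white noise at two scales standing in for the ion maps; all sizes and smoothing scales are illustrative, not HESTIA parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 128  # map size in pixels (illustrative)

def radial_power_spectrum(field):
    # 2D power spectrum, azimuthally averaged into integer radial k bins.
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    ky, kx = np.indices(field.shape) - N // 2
    k = np.hypot(kx, ky).astype(int)
    ps = np.bincount(k.ravel(), weights=power.ravel()) / np.bincount(k.ravel())
    return ps[1:N // 2]  # drop the k=0 mean; keep modes up to Nyquist

def smoothed_noise(sigma):
    # White noise smoothed with a Gaussian of width sigma pixels (in Fourier space).
    noise = rng.normal(size=(N, N))
    ky, kx = np.indices((N, N)) - N // 2
    kernel = np.exp(-0.5 * (np.hypot(kx, ky) * sigma * 2 * np.pi / N) ** 2)
    return np.real(np.fft.ifft2(np.fft.fft2(noise) * np.fft.ifftshift(kernel)))

clumpy = smoothed_noise(sigma=1.0)  # small-scale structure (HI-like, say)
smooth = smoothed_noise(sigma=4.0)  # large-scale structure (OVI-like, say)

ps_clumpy = radial_power_spectrum(clumpy)
ps_smooth = radial_power_spectrum(smooth)
# The clumpier map keeps relatively more power at high wavenumbers.
ratio_high_k = ps_clumpy[-10:].mean() / ps_smooth[-10:].mean()
print(f"high-k power ratio (clumpy/smooth): {ratio_high_k:.3g}")
```

The same diagnostic applied to the simulated ion maps is what distinguishes the clumpy low-temperature tracers (HI, SiIII) from the smoother warm-gas tracers (OVI, OVII, OVIII).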
In recent years, self-supervised learning (SSL) models have produced promising results in a variety of speech-processing tasks, especially in contexts of data scarcity. In this paper, we study the use of SSL models for the task of mispronunciation detection for second language learners. We compare two downstream approaches: 1) training the model for phone recognition (PR) using native English data, and 2) training a model directly for the target task using non-native English data. We compare the performance of these two approaches for various SSL representations as well as a representation extracted from a traditional DNN-based speech recognition model. We evaluate the models on L2Arctic and EpaDB, two datasets of non-native speech annotated with pronunciation labels at the phone level. Overall, we find that using a downstream model trained for the target task gives the best performance and that most upstream models perform similarly for the task.
Argentina has a large yet little-known Indigenous linguistic diversity, encompassing at least 40 different languages. The majority of these languages are at risk of disappearing, resulting in a significant loss of world heritage and cultural knowledge. Currently, unified information on speakers and computational tools is lacking for these languages. In this work, we present a systematization of the Indigenous languages spoken in Argentina, classifying them into seven language families: Mapuche, Tupí-Guaraní, Guaycurú, Quechua, Mataco-Mataguaya, Aymara, and Chon. For each one, we present an estimation of the national Indigenous population size, based on the most recent Argentinian census. We discuss potential reasons why the census questionnaire design may underestimate the actual number of speakers. We also provide a concise survey of computational resources available for these languages, whether or not they were specifically developed for Argentinian varieties.