Institut national de la recherche scientifique
Real-world speech recordings suffer from degradations such as background noise and reverberation. Speech enhancement aims to mitigate these issues by generating clean high-fidelity signals. While recent generative approaches for speech enhancement have shown promising results, they still face two major challenges: (1) content hallucination, where the generated phonemes are plausible but differ from the original utterance; and (2) inconsistency, a failure to preserve the speaker's identity and paralinguistic features from the input speech. In this work, we introduce DiTSE (Diffusion Transformer for Speech Enhancement), which addresses quality issues of degraded speech in full bandwidth. Our approach employs a latent diffusion transformer model together with robust conditioning features, effectively addressing these challenges while remaining computationally efficient. Experimental results from both subjective and objective evaluations demonstrate that DiTSE achieves state-of-the-art audio quality that, for the first time, matches real studio-quality audio from the DAPS dataset. Furthermore, DiTSE significantly improves the preservation of speaker identity and content fidelity, reducing hallucinations across datasets compared to state-of-the-art enhancers. Audio samples are available at: this http URL
With the prevalence of artificial intelligence (AI)-generated content, such as audio deepfakes, a large body of recent work has focused on developing deepfake detection techniques. However, most models are evaluated on a narrow set of datasets, leaving their generalization to real-world conditions uncertain. In this paper, we systematically review 28 existing audio deepfake datasets and present an open-source benchmarking toolkit called AUDDT (this https URL). The goal of this toolkit is to automate the evaluation of pretrained detectors across these 28 datasets, giving users direct feedback on the advantages and shortcomings of their deepfake detectors. We start by showcasing the usage of the developed toolkit, the composition of our benchmark, and the breakdown of different deepfake subgroups. Next, using a widely adopted pretrained deepfake detector, we present in- and out-of-domain detection results, revealing notable differences across conditions and audio manipulation types. Lastly, we also analyze the limitations of these existing datasets and their gap relative to practical deployment scenarios.
Researchers from Université du Québec en Outaouais and Institut national de la recherche scientifique conducted an empirical analysis of ChatGPT's code generation from a security perspective, finding that only 24% of initially generated programs were secure. The study revealed that while ChatGPT can identify and explain vulnerabilities when prompted, it frequently generates insecure code and struggles to produce consistently secure corrections.
To this day, high-temperature cuprate superconductors remain an unparalleled platform for studying the competition and coexistence of emergent, static and dynamic, quantum phases of matter exhibiting high transition temperature non-s-wave superconductivity, non-Fermi liquid transport and a still enigmatic pseudogap regime. However, how superconductivity emerges alongside and competes with the pseudogap regime remains an open question. Here, we present a high-resolution, time- and angle-resolved photoemission study of the near-antinodal region of optimally-doped Bi$_2$Sr$_2$CaCu$_2$O$_{8+\delta}$. For a sufficiently high excitation fluence, we disrupt superconductivity and drive a transient change from a symmetric superconducting-like to an asymmetric pseudogap-like density of states, for electronic temperatures well below the equilibrium superconducting critical temperature. Conversely, when the superconductivity is fully restored, the pseudogap is suppressed, as signaled by a fully particle-hole symmetric density of states. A unique aspect of our experiments is that the pseudogap coexists with superconducting features at intermediate times or at intermediate fluence. Our findings challenge the paradigm that superconductivity emerges by establishing phase coherence in the pseudogap. Instead, our experimental results, supported by phenomenological theory, demonstrate that the two states compete, and that the low-temperature ground state of the cuprates originates from a competition between superconducting and pseudogap states.
Multimodal high-dimensional data are increasingly prevalent in biomedical research, yet they are often compromised by block-wise missingness and measurement errors, posing significant challenges for statistical inference and prediction. We propose AdapDISCOM, a novel adaptive direct sparse regression method that simultaneously addresses these two pervasive issues. Building on the DISCOM framework, AdapDISCOM introduces modality-specific weighting schemes to account for heterogeneity in data structures and error magnitudes across modalities. We establish the theoretical properties of AdapDISCOM, including model selection consistency and convergence rates under sub-Gaussian and heavy-tailed settings, and develop robust and computationally efficient variants (AdapDISCOM-Huber and Fast-AdapDISCOM). Extensive simulations demonstrate that AdapDISCOM consistently outperforms existing methods such as DISCOM, SCOM, and CoCoLasso, particularly under heterogeneous contamination and heavy-tailed distributions. Finally, we apply AdapDISCOM to Alzheimer's Disease Neuroimaging Initiative (ADNI) data, demonstrating improved prediction of cognitive scores and reliable selection of established biomarkers, even with substantial missingness and measurement errors. AdapDISCOM provides a flexible, robust, and scalable framework for high-dimensional multimodal data analysis under realistic data imperfections.
Compressed streak imaging (CSI), introduced in 2014, has proven to be a powerful imaging technology for recording ultrafast phenomena such as light propagation and fluorescence lifetimes at over 150 trillion frames per second. Despite these achievements, CSI has faced challenges in detecting subtle intensity fluctuations in slow-moving, continuously illuminated objects. This limitation, largely attributable to high streak compression and motion blur, has curtailed broader adoption of CSI in applications such as cellular fluorescence microscopy. To address these issues and expand the utility of CSI, we present a novel encoding strategy, termed two-axis compressed streak imaging (TACSI), that significantly improves the reconstructed image fidelity. TACSI introduces a second scanning axis which shuttles a conjugate image of the object with respect to the coded aperture. The moving image decreases the streak compression ratio and produces a flash and shutter phenomenon that reduces coded aperture motion blur, overcoming the limitations of current CSI technologies. We support this approach with an analytical model describing the two-axis streak compression ratio, along with both simulated and empirical measurements. As proof of concept, we demonstrate the ability of TACSI to measure rapid variations in cell membrane potentials using voltage-sensitive dye, which were previously unattainable with conventional CSI. This method has broad implications for high-speed photography, including the visualization of action potentials, muscle contractions, and enzymatic reactions that occur on microsecond and faster timescales using fluorescence microscopy.
The physics of strongly correlated materials is deeply rooted in electron interactions and their coupling to low-energy excitations. Unraveling the competing and cooperative nature of these interactions is crucial for connecting microscopic mechanisms to the emergence of exotic macroscopic behavior, such as high-temperature superconductivity. Here we show that polarization-resolved multidimensional coherent spectroscopy (MDCS) is able to selectively drive and measure coherent Raman excitations in different parts of the Fermi surface, where the superconducting gap vanishes or is the largest (called the nodal and antinodal regions, respectively) in underdoped Bi-2212. Our evidence reveals that in the superconducting phase, the energy of Raman excitations in the nodal region is anti-correlated with the energy of electronic excitations at $\sim$1.6 eV, and both maintain coherence for over 44 fs. In contrast, excitations in the antinodal region show significantly faster decoherence (<18 fs) and no measurable correlations. Importantly, this long-lived coherence is specific to the superconducting phase and vanishes in the pseudogap and normal phases. This anti-correlation reveals a coherent link between the transition energy associated with the many-body Cu-O bands and the energy of electronic Raman modes that map to the near-nodal superconducting gap. The different coherent dynamics of the nodal and antinodal excitations in the superconducting phase suggest that nodal fluctuations are protected from dissipation associated with scattering from antiferromagnetic fluctuations and may be relevant to sustaining the quantum coherent behaviour associated with high-temperature superconductivity.
The near-infrared (NIR) emission of rare-earth doped nanoparticles (RENPs), known as downshifting luminescence, has been extensively investigated in diverse applications from information technology to biomedicine. In promoting brightness and enriching the functionalities of the downshifting luminescence of RENPs, numerous studies have exploited an inert shell to protect rare-earth dopants from surface quenchers. However, internal concentration quenching remains an unsolved puzzle when using higher dopant concentrations of rare-earth ions in an attempt to obtain brighter emission. Following a plethora of research involving core-shell structures, the interface has been shown to be controllable, ranging from a well-defined, abrupt boundary to an obscure one with cation intermixing. By utilizing this intermixed core-shell property for the first time, we design a new architecture to create a homogeneous double-layer core-shell interface to extend the active layer, allowing more luminescent centers without severe concentration quenching. By systematically deploying the crystallinity of the starting core, shell growth dynamics, and dopant concentrations, the downshifting luminescence intensity of the new architecture achieves a 12-fold enhancement, surpassing the traditional core-shell structure. These results provide deeper insight into the potential benefits of the intermixed core-shell structure, offering an effective approach to tackling the internal concentration quenching effect for highly boosted NIR optical performance.
Phase fluctuations are widely accepted to play a primary role in the quench of the long-range superconducting order in cuprates. However, an experimental probe capable of unambiguously assessing their impact on the superconducting order parameter with momentum and time resolutions is still lacking. Here, we performed a high-resolution time- and angle-resolved photoemission study of optimally-doped Bi$_2$Sr$_2$CaCu$_2$O$_{8+\delta}$ and demonstrated a new experimental strategy to directly probe light-induced changes in the order parameter's phase with momentum resolution. To do this, we tracked the ultrafast response of a phase-sensitive hybridization gap that appears at the crossing between two bands with opposite superconducting gap signs. Supported by theoretical modeling, we established phase fluctuations as the dominant factor defining the non-thermal response of the unconventional superconducting phase in cuprates.
In a number of environmental studies, relationships between natural processes are often assessed through regression analyses, using time series data. Such data are often multi-scale and non-stationary, leading to poor accuracy of the resulting regression models and therefore to results with moderate reliability. To deal with this issue, the present paper introduces the EMD-regression methodology, which consists of applying the empirical mode decomposition (EMD) algorithm to the data series and then using the resulting components in regression models. The proposed methodology presents a number of advantages. First, it accounts for the non-stationarity of the data series. Second, this approach acts as a scan for the relationship between a response variable and the predictors at different time scales, providing new insights about this relationship. To illustrate the proposed methodology, it is applied to study the relationship between weather and cardiovascular mortality in Montreal, Canada. The results provide new knowledge concerning the studied relationship. For instance, they show that humidity can cause excess mortality at the monthly time scale, which is a scale not visible in classical models. A comparison is also conducted with state-of-the-art methods, namely generalized additive models and distributed lag models, both widely used in weather-related health studies. The comparison shows that EMD-regression achieves better prediction performance and provides more details than classical models concerning the relationship.
Angle-resolved photoemission spectroscopy (ARPES) -- with its exceptional sensitivity to both the binding energy and momentum of valence electrons in solids -- provides unparalleled insights into the electronic structure of quantum materials. Over the last two decades, the advent of femtosecond lasers, which can deliver ultrashort and coherent light pulses, has ushered the ARPES technique into the time domain. Now, time-resolved ARPES (TR-ARPES) can probe ultrafast electron dynamics and the out-of-equilibrium electronic structure, providing a wealth of information otherwise unattainable in conventional ARPES experiments. This paper begins with an introduction to the theoretical underpinnings of TR-ARPES followed by a description of recent advances in state-of-the-art ultrafast sources and optical excitation schemes. It then reviews paradigmatic phenomena investigated by TR-ARPES thus far, such as out-of-equilibrium electronic states and their spin dynamics, Floquet-Volkov states, photoinduced phase transitions, electron-phonon coupling, and surface photovoltage effects. Each section highlights TR-ARPES data from diverse classes of quantum materials, including semiconductors, charge-ordered systems, topological materials, excitonic insulators, van der Waals materials, and unconventional superconductors. These examples demonstrate how TR-ARPES has played a critical role in unraveling the complex dynamical properties of quantum materials. The conclusion outlines possible future directions and opportunities for this powerful technique.
Driven by the emergence of new compute-intensive applications and the vision of the Internet of Things (IoT), it is foreseen that the emerging 5G network will face an unprecedented increase in traffic volume and computation demands. However, end users mostly have limited storage capacities and finite processing capabilities; thus, how to run compute-intensive applications on resource-constrained users has recently become a natural concern. Mobile edge computing (MEC), a key technology in the emerging fifth generation (5G) network, can optimize mobile resources by hosting compute-intensive applications, process large data before sending it to the cloud, provide cloud computing capabilities within the radio access network (RAN) in close proximity to mobile users, and offer context-aware services with the help of RAN information. Therefore, MEC enables a wide variety of applications where real-time response is strictly required, e.g., driverless vehicles, augmented reality, robotics, and immersive media. Indeed, the paradigm shift from 4G to 5G could become a reality with the advent of new technological concepts. The successful realization of MEC in the 5G network is still in its infancy and demands constant effort from both the academic and industry communities. In this survey, we first provide a holistic overview of MEC technology and its potential use cases and applications. Then, we outline up-to-date research on the integration of MEC with the new technologies that will be deployed in 5G and beyond. We also summarize testbeds, experimental evaluations, and open source activities for edge computing. We further summarize lessons learned from state-of-the-art research works and discuss challenges and potential future directions for MEC research.
Audio deepfake detection is crucial to combat the malicious use of AI-synthesized speech. Among many efforts undertaken by the community, the ASVspoof challenge has become one of the benchmarks to evaluate the generalizability and robustness of detection models. In this paper, we present Reality Defender's submission to the ASVspoof5 challenge, highlighting a novel pretraining strategy which significantly improves generalizability while maintaining low computational cost during training. Our system SLIM learns the style-linguistics dependency embeddings from various types of bonafide speech using self-supervised contrastive learning. The learned embeddings help to discriminate spoof from bonafide speech by focusing on the relationship between the style and linguistics aspects. We evaluated our system on ASVspoof5, ASV2019, and In-the-wild. Our submission achieved minDCF of 0.1499 and EER of 5.5% on ASVspoof5 Track 1, and EER of 7.4% and 10.8% on ASV2019 and In-the-wild respectively.
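For readers unfamiliar with the equal error rate (EER) reported above, the following is a minimal sketch of how it can be computed from detector scores. The function name `equal_error_rate`, the score convention (a higher score means more bonafide-like), and the simple threshold sweep are illustrative assumptions; this is not the ASVspoof scoring code and it does not reproduce the reported numbers.

```python
import numpy as np

def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a hypothetical detector.

    Assumed convention: higher score = more likely bonafide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every distinct observed score, in ascending order.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))
    # False rejection rate: bonafide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoof trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

# Toy scores (made up for illustration only).
eer_separated, _ = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
eer_overlap, _ = equal_error_rate([0.1, 0.9], [0.1, 0.9])
```

With perfectly separated scores the two error rates meet at zero, while fully overlapping scores give an EER of one half, which is why a lower EER indicates a better detector.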
Recent developments in quantum photonics have started the process of bringing photonic-quantum-based systems out of the lab and into real-world applications. As an example, devices for the exchange of a cryptographic key secured by the laws of quantum mechanics are currently commercially available. In order to further boost this process, the next step is to migrate the results achieved by means of bulky and expensive setups to miniaturized and affordable devices. Integrated quantum photonics addresses exactly this issue. In this paper we briefly review the most recent advancements in the on-chip generation of quantum states of light (at the core of quantum cryptography and computing). In particular, we focus on optical microcavities, as they can offer a solution to the issue of low efficiency (low number of photons generated) typical of the materials mostly used in integrated platforms. In addition, we show that specifically designed microcavities can also offer further advantages, such as compatibility with existing telecom standards (thus allowing the existing fiber network to be exploited) and quantum memories (necessary, in turn, to extend the communication distance), as well as longitudinal multimode character. This last property (i.e. the increased dimensionality necessary for describing the quantum state of a photon) is achieved thanks to the generation of multiple photon pairs on a frequency comb corresponding to the microcavity resonances. Further achievements include the possibility to fully exploit the polarization degree of freedom also for integrated devices. These results pave the way to the generation of integrated quantum frequency combs, which in turn may find application as a quantum computing platform.
In recent years, Generative Adversarial Networks (GANs) have shown substantial progress in modeling complex distributions of data. These networks have received tremendous attention since they can generate implicit probabilistic models that produce realistic data using a stochastic procedure. While such models have proven highly effective in diverse scenarios, they require a large set of fully-observed training samples. In many applications, access to such samples is difficult or even impractical, and only noisy or partial observations of the desired distribution are available. Recent research has tried to address the problem of incompletely observed samples to recover the distribution of the data. \citep{zhu2017unpaired} and \citep{yeh2016semantic} proposed methods to solve ill-posed inverse problems using cycle-consistency and latent-space mappings in adversarial networks, respectively. \citep{bora2017compressed} and \citep{kabkab2018task} have applied similar adversarial approaches to the problem of compressed sensing. In this work, we focus on a new variant of GAN models called AmbientGAN, which incorporates a measurement process (e.g. adding noise, data removal and projection) into the GAN training. While in the standard GAN the discriminator distinguishes a generated image from a real image, in the AmbientGAN model the discriminator has to separate a real measurement from a simulated measurement of a generated image. The results shown by \citep{bora2018ambientgan} are quite promising for the problem of incomplete data, and have potentially important implications for generative approaches to compressed sensing and ill-posed problems.
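The measurement idea described above can be illustrated with a small sketch. The function `measure`, its parameters `drop_prob` and `noise_std`, and the random arrays standing in for images are all hypothetical choices made for illustration; the generator and discriminator networks are omitted, and nothing here is taken from the AmbientGAN code.

```python
import numpy as np

def measure(image, rng, drop_prob=0.5, noise_std=0.1):
    """Hypothetical lossy measurement: drop random pixels, then add Gaussian noise."""
    keep_mask = rng.random(image.shape) >= drop_prob   # True where a pixel survives
    noise = rng.normal(0.0, noise_std, size=image.shape)
    return image * keep_mask + noise

rng = np.random.default_rng(0)
clean_image = rng.random((8, 8))       # stands in for a clean sample that is never observed
generated_image = rng.random((8, 8))   # stands in for a generator output

# The training set would hold only measurements of real images.
real_measurement = measure(clean_image, rng)
# The generated image goes through the same simulated measurement before
# the discriminator sees it, so both inputs live in measurement space.
simulated_measurement = measure(generated_image, rng)
```

Because the discriminator only ever compares measurements, the generator is pushed toward producing images whose measurements are indistinguishable from the observed ones, without a clean image ever being required.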
We present a generic procedure for quantifying the interplay of electronic and lattice degrees of freedom in photo-doped insulators through a comparative analysis of theoretical many-body simulations and time- and angle-resolved photoemission spectroscopy (TR-ARPES) of the transient response of the candidate excitonic insulator Ta$_2$NiSe$_5$. Our analysis demonstrates that the electron-electron interactions dominate the electron-phonon ones. In particular, a detailed analysis of the TR-ARPES spectrum enables a clear separation of the dominant broadening (electronic lifetime) effects from the much smaller bandgap renormalization. Theoretical calculations show that the observed strong spectral broadening arises from the electronic scattering of the photo-excited particle-hole pairs and cannot be accounted for in a model in which electron-phonon interactions are dominant. We demonstrate that the magnitude of the weaker subdominant bandgap renormalization sensitively depends on the distance from the semiconductor/semimetal transition in the high-temperature state, which could explain apparent contradictions between various TR-ARPES experiments. The analysis presented here indicates that electron-electron interactions play a vital role (although not necessarily the sole one) in stabilizing the insulating state.
We investigate a strongly coupled finite-density anisotropic fluid in $2+1$ dimensions dual to an asymptotically AdS black brane that is a solution of Einstein-Maxwell-Axion theory in $3+1$ dimensions. Despite the anisotropy, the fluid thermodynamic properties align with those of a conformal fluid. Moreover, we show that the fluid is stable under the increase of the anisotropy parameter. Additionally, we analyse the DC conductivity of the anisotropic fluid, showing its compatibility with momentum dissipation due to translational symmetry breaking. In the limit of very large anisotropy we find that the DC conductivity vanishes as a consequence of dimensionality reduction. We also find that a metal-insulator transition arises driven by the anisotropy.
We present a Python implementation for RS-HDMR-GPR (Random Sampling High Dimensional Model Representation Gaussian Process Regression). The method builds representations of multivariate functions with lower-dimensional terms, either as an expansion over orders of coupling or using terms of only a given dimensionality. This facilitates, in particular, recovering functional dependence from sparse data. The code also allows for imputation of missing values of the variables and for a significant pruning of the useful number of HDMR terms. The code can also be used for estimating relative importance of different combinations of input variables, thereby adding an element of insight to a general machine learning method. The capabilities of this regression tool are demonstrated on test cases involving synthetic analytic functions, the potential energy surface of the water molecule, kinetic energy densities of materials (crystalline magnesium, aluminum, and silicon), and financial market data.
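The two kinds of representation mentioned above can be written in the standard HDMR notation. The symbols below ($D$ input variables, truncation order $d$, component functions $f_0$, $f_i$, $f_{ij}$) are generic HDMR notation assumed for illustration, not taken from the code's documentation. The expansion over orders of coupling, truncated at order $d$, reads:

```latex
f(x_1, \dots, x_D) \approx f_0
  + \sum_{i=1}^{D} f_i(x_i)
  + \sum_{1 \le i < j \le D} f_{ij}(x_i, x_j)
  + \dots
  + \sum_{1 \le i_1 < \dots < i_d \le D} f_{i_1 \dots i_d}(x_{i_1}, \dots, x_{i_d})
```

The variant using terms of only a given dimensionality $d$ keeps the last sum alone:

```latex
f(x_1, \dots, x_D) \approx \sum_{1 \le i_1 < \dots < i_d \le D} f_{i_1 \dots i_d}(x_{i_1}, \dots, x_{i_d})
```

In both forms each component function depends on at most $d < D$ variables, which is what makes recovery from sparse data more tractable than fitting the full $D$-dimensional function directly.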