Nagoya University
The driving forces of chiral active particles and of deforming cells are often modeled as spatially inhomogeneous but temporally periodic. Such inhomogeneous oscillatory driving forces have only recently been proposed in the context of active matter, and their effects on these systems are not yet fully understood. In this work, we theoretically study the impact of spatially inhomogeneous oscillatory driving forces on continuous symmetry breaking. We first analyze a linear model for the soft modes in the ordered phase to derive the lower critical dimension, and then analyze the spherical model to investigate the phase behavior in more detail. Interestingly, our analysis reveals that symmetry breaking occurs even in one and two dimensions, where the Hohenberg-Mermin-Wagner theorem prohibits continuous symmetry breaking in equilibrium. Furthermore, fluctuations of conserved quantities, such as density, are anomalously suppressed in the long-wavelength limit; that is, the system is hyperuniform.
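Hyperuniformity is commonly diagnosed with the static structure factor $S(k) = |\sum_j e^{-ikx_j}|^2/N$, which tends to zero as $k \to 0$ for a hyperuniform configuration but stays of order one for an uncorrelated (Poisson) one. The sketch below is a toy illustration with 1D point patterns in a periodic box, not the driven model analyzed in the paper: a lattice with small random displacements stands in for a hyperuniform state, and `structure_factor` and `small_k_average` are hypothetical helper names.

```python
import cmath
import math
import random


def structure_factor(positions, box, m):
    """S(k) = |sum_j exp(-i k x_j)|^2 / N at the allowed wavevector k = 2*pi*m/box."""
    k = 2.0 * math.pi * m / box
    rho_k = sum(cmath.exp(-1j * k * x) for x in positions)
    return abs(rho_k) ** 2 / len(positions)


def small_k_average(positions, box, modes=range(1, 11)):
    """Average S(k) over the smallest few wavevectors (long-wavelength limit)."""
    return sum(structure_factor(positions, box, m) for m in modes) / len(modes)


rng = random.Random(0)
n = 1000
box = float(n)  # unit mean spacing
# Perturbed lattice: each particle jitters around its own site (hyperuniform).
lattice = [(j + rng.uniform(-0.1, 0.1)) % box for j in range(n)]
# Poisson pattern: independent uniform positions (not hyperuniform).
poisson = [rng.uniform(0.0, box) for _ in range(n)]

print(small_k_average(lattice, box))  # close to zero
print(small_k_average(poisson, box))  # of order one
```

For the perturbed lattice, the small-k value scales roughly as $k^2$ times the displacement variance, which is why it sits orders of magnitude below the Poisson value.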
This research introduces a Smart Agent-Based Modeling (SABM) framework leveraging GPT-4 to simulate firm competition and collusion, demonstrating that GPT-4 agents can achieve sophisticated tacit collusion and higher-level collusion with communication, mirroring human economic behavior. The study shows prices stabilizing around 7 in tacit collusion without communication and close to the cartel price of 8 with explicit communication, which also accelerates collusion formation.
Slip-spring models are valuable tools for simulating entangled polymers, bridging the gap between bead-spring models with excluded volume and network models with presumed reptation motion. This study focuses on the DPD-SS (Dissipative Particle Dynamics - Slip-Spring) model, which introduces slip-springs into the standard DPD polymer model with soft-core interactions. The density of slip-springs in the system is varied by systematically adjusting the slip-spring fugacity. Simulation results show that models with different slip-spring densities are compatible in terms of diffusion and linear relaxation modulus when the average number of slip-springs per chain is the same. Length and time units are converted between DPD-SS models through Rouse scaling, using the average number of DPD beads between consecutive slip-spring anchoring points. The modulus is converted through the plateau modulus, taking into account fluctuations around entanglements. Notably, the diffusion and relaxation modulus obtained from the DPD-SS model agree with those reported for standard Kremer-Grest and DPD models with strong repulsive interactions.
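Under the Rouse model, a strand of N beads has size R ∝ b N^{1/2} and relaxation time τ ∝ N². The sketch below applies those two scaling laws to the average number of beads between consecutive anchoring points. It assumes bead size and friction are identical in both models, so only the bead-count ratio matters; the helper name `rouse_conversion` and the example numbers are hypothetical, and the plateau-modulus conversion is not included.

```python
import math


def rouse_conversion(n_anchor_ref, n_anchor_target):
    """Length and time factors mapping the target DPD-SS model onto the reference one.

    Assumes Rouse statistics for the strand between consecutive slip-spring
    anchoring points: size R ~ b * N**0.5 and relaxation time tau ~ N**2,
    with bead size b and friction identical in both models.
    """
    ratio = n_anchor_ref / n_anchor_target
    length_factor = math.sqrt(ratio)  # one target length unit, in reference units
    time_factor = ratio ** 2          # one target time unit, in reference units
    return length_factor, time_factor


# Hypothetical example: 8 beads between anchors in the reference model,
# 2 beads in a target model with denser slip-springs.
length_factor, time_factor = rouse_conversion(8.0, 2.0)
print(length_factor, time_factor)  # 2.0 16.0
```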
We study the halo mass function in the presence of the kurtosis type of primordial non-Gaussianity. The kurtosis corresponds to the trispectrum defined in Fourier space. The primordial trispectrum is commonly characterized by two parameters, $\tau_{\rm NL}$ and $g_{\rm NL}$. As applications of the derived non-Gaussian mass function, we consider its effects on the abundance of voids, on early star formation, and on the formation of the most massive object at high redshift. We show that by comparing the effects of primordial non-Gaussianity on cluster abundance with those on void abundance, we can distinguish between the skewness and kurtosis types of primordial non-Gaussianity. As for early star formation, we show that the kurtosis type of primordial non-Gaussianity does not appear to affect the reionization history of the Universe on average. However, at high redshifts (up to $z \simeq 20$), such non-Gaussianity does somewhat affect the early stages of reionization.
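A toy way to see why clusters and voids separate skewness from kurtosis is a first-order Edgeworth expansion of the one-point density PDF, taking cluster abundance to track the upper tail P(δ > νσ) and void abundance the lower tail P(δ < -νσ). The sketch below assumes this simplified picture, with normalized skewness `skew` and excess kurtosis `kurt`; it does not implement the paper's mass function or the mapping from $\tau_{\rm NL}$ and $g_{\rm NL}$ to the kurtosis.

```python
import math


def phi(nu):
    """Standard normal density."""
    return math.exp(-0.5 * nu * nu) / math.sqrt(2.0 * math.pi)


def edgeworth_tails(nu, skew=0.0, kurt=0.0):
    """Upper P(x > nu) and lower P(x < -nu) tails of a unit-variance PDF.

    First-order Edgeworth expansion p(x) = phi(x) [1 + skew/6 He3 + kurt/24 He4],
    integrated with  int_nu^inf He_n(x) phi(x) dx = He_{n-1}(nu) phi(nu).
    """
    gauss = 0.5 * math.erfc(nu / math.sqrt(2.0))
    he2 = nu * nu - 1.0         # Hermite He_2 (even in nu)
    he3 = nu ** 3 - 3.0 * nu    # Hermite He_3 (odd in nu)
    upper = gauss + phi(nu) * (skew / 6.0 * he2 + kurt / 24.0 * he3)
    lower = gauss + phi(nu) * (-skew / 6.0 * he2 + kurt / 24.0 * he3)
    return upper, lower


g_up, g_lo = edgeworth_tails(3.0)             # Gaussian reference
k_up, k_lo = edgeworth_tails(3.0, kurt=0.5)   # kurtosis only
s_up, s_lo = edgeworth_tails(3.0, skew=0.2)   # skewness only
```

At ν = 3, positive skewness raises the upper tail and lowers the lower one, while positive kurtosis raises both (He₃(ν) > 0 for ν > √3), so the two types leave different joint signatures in cluster and void counts.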
We propose Llama-Mimi, a speech language model that uses a unified tokenizer and a single Transformer decoder to jointly model sequences of interleaved semantic and acoustic tokens. Comprehensive evaluation shows that Llama-Mimi achieves state-of-the-art acoustic consistency and preserves speaker identity. Our analysis further shows that increasing the number of quantizers improves acoustic fidelity but degrades linguistic performance, highlighting the inherent challenge of maintaining long-term coherence. We additionally introduce an LLM-as-a-Judge evaluation to assess the spoken content quality of generated outputs. Our models, code, and speech samples are publicly available.
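One common way to feed residual-vector-quantized (RVQ) audio codes to a single decoder is frame-major interleaving: at each frame, emit the code from every quantizer level in order, offsetting each level into its own vocabulary range. The sketch below assumes that layout, treats level 0 as the semantic level, and uses a codebook size of 2048; these are illustrative assumptions, not a description of Llama-Mimi's actual tokenizer interface.

```python
def interleave(codes, codebook_size):
    """Flatten per-frame RVQ codes into one token sequence, frame by frame.

    codes[t][q] is the index from quantizer q at frame t (q = 0 assumed to be
    the semantic level). Level q is shifted by q * codebook_size so that one
    decoder vocabulary covers all levels without collisions.
    """
    seq = []
    for frame in codes:
        for q, code in enumerate(frame):
            seq.append(q * codebook_size + code)
    return seq


def deinterleave(seq, num_quantizers, codebook_size):
    """Inverse of interleave: recover codes[t][q] from the flat sequence."""
    frames = []
    for start in range(0, len(seq), num_quantizers):
        chunk = seq[start:start + num_quantizers]
        frames.append([tok - q * codebook_size for q, tok in enumerate(chunk)])
    return frames


codes = [[5, 17, 3], [9, 0, 2]]  # 2 frames, 3 quantizers (toy values)
flat = interleave(codes, codebook_size=2048)
print(flat)  # [5, 2065, 4099, 9, 2048, 4098]
```

The flat sequence has T × Q tokens for T frames and Q quantizers, so adding quantizers lengthens it proportionally; this is one plausible contributor to the coherence trade-off described above.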
Leptoquark models are prime candidates for new physics (NP) explanations of the long-standing anomalies in semi-leptonic $B$ decays: $b\to c\tau\bar\nu$ (encoded in $R(D^{(\ast)})$) and $b\to s\ell\bar\ell$ ($\ell=e,\mu$) transitions. Furthermore, Belle II and NA62 reported weaker-than-expected limits on $B^+\to K^+\nu\bar\nu$ and $K^+\to\pi^+\nu\bar\nu$, respectively. While the $R(D^{(\ast)})$ and $b\to s\ell\bar\ell$ measurements can be explained with NP contributions at the $O(10\%)$ level, the neutrino channels suggest that the NP effect could be comparable in size to the Standard Model one. In this context, we consider two types of leptoquark models with minimal sets of couplings that best describe the semi-leptonic $B$ anomalies while also producing effects in the neutrino modes: the singlet-triplet scalar leptoquark model ($S_1+S_3$) and the singlet vector leptoquark model ($U_1$). The neutrino channels impose non-trivial constraints on the parameter space, and we find that large effects (i.e., accounting for the current central value) in $B\to K^{(*)}\nu\bar\nu$ are possible only in the $S_1+S_3$ setup, while both models can account for the central value of $K^+\to\pi^+\nu\bar\nu$.
We define entropic marginally outer trapped surfaces (E-MOTSs) as a generalization of apparent horizons. We then show that, under first-order perturbations around a stationary black hole, the dynamical black hole entropy proposed by Hollands, Wald, and Zhang, defined on a background Killing horizon, can be expressed as the Wall entropy evaluated on an E-MOTS associated with it. Our result ensures that the Hollands-Wald-Zhang entropy reduces to the standard Wald entropy in each stationary regime of a dynamical black hole, thereby reinforcing the robustness of the dynamical entropy formulation.
This is the summary paper for the AudioMOS Challenge 2025, the first challenge on automatic prediction of subjective quality for synthetic audio. The challenge consists of three tracks. The first track assesses text-to-music samples in terms of overall quality and textual alignment. The second track is based on the four evaluation dimensions of Meta Audiobox Aesthetics, and its test set consists of text-to-speech, text-to-audio, and text-to-music samples. The third track focuses on synthetic speech quality assessment at different sampling rates. The challenge attracted 24 unique teams from both academia and industry, and improvements over the baselines were confirmed. The outcome of this challenge is expected to facilitate progress in the automatic evaluation of audio generation systems.
SoccerTrack v2 is a new public dataset for advancing multi-object tracking (MOT), game state reconstruction (GSR), and ball action spotting (BAS) in soccer analytics. Unlike prior datasets that use broadcast views or limited scenarios, SoccerTrack v2 provides 10 full-length, panoramic 4K recordings of university-level matches, captured with BePro cameras for complete player visibility. Each video is annotated with GSR labels (2D pitch coordinates, jersey-based player IDs, roles, teams) and BAS labels for 12 action classes (e.g., Pass, Drive, Shot). This technical report outlines the dataset's structure, collection pipeline, and annotation process. SoccerTrack v2 is designed to advance research in computer vision and soccer analytics, enabling new benchmarks and practical applications in tactical analysis and automated tools.
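The per-frame annotations listed above map naturally onto small record types. The schema below is a hypothetical sketch: the class names, field names, units, and example role and team strings are assumptions, not the dataset's published format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GSRLabel:
    """Hypothetical game-state record: one player's pitch position in one frame."""
    frame: int
    x_m: float   # pitch x-coordinate in metres (assumed unit)
    y_m: float   # pitch y-coordinate in metres (assumed unit)
    jersey: int  # jersey number, used as the player ID
    role: str    # e.g. "player" or "goalkeeper" (illustrative values)
    team: str    # team identifier (illustrative values)


@dataclass(frozen=True)
class BASLabel:
    """Hypothetical ball-action record, e.g. action="Pass", "Drive", or "Shot"."""
    frame: int
    action: str


def positions_by_team(labels, frame):
    """Group (jersey, x, y) tuples by team for a single frame."""
    grouped = defaultdict(list)
    for label in labels:
        if label.frame == frame:
            grouped[label.team].append((label.jersey, label.x_m, label.y_m))
    return dict(grouped)


labels = [
    GSRLabel(0, 10.0, 5.0, 7, "player", "home"),
    GSRLabel(0, 50.0, 30.0, 1, "goalkeeper", "away"),
    GSRLabel(1, 11.0, 5.5, 7, "player", "home"),
]
print(positions_by_team(labels, 0))
```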
The 21cm forest, the narrow absorption features in the spectra of high-redshift radio sources caused by intervening neutral hydrogen, offers a unique probe of the intergalactic medium and small-scale structures during reionization. While power spectrum methods have been widely used to analyze the 21cm forest, they are limited in capturing the non-Gaussian nature of the signal. In this work, we introduce the Wavelet Scattering Transform (WST) as a novel diagnostic tool for the 21cm forest, which extracts higher-order statistical features that power spectrum methods cannot easily capture. By decomposing simulated brightness temperature spectra into a hierarchy of scattering coefficients, the WST isolates both local intensity fluctuations (first-order coefficients) and scale-scale correlations (second-order coefficients), revealing the complex, multi-scale non-Gaussian interactions inherent in the 21cm forest. This approach enhances the power of the 21cm forest to distinguish between different cosmological models, such as Cold Dark Matter (CDM) and Warm Dark Matter (WDM), as well as scenarios with enhanced X-ray heating. Unlike traditional methods, which focus primarily on Gaussian statistics, the WST captures richer astrophysical and cosmological information. Our analysis shows that the WST can significantly improve constraints on key parameters, such as the X-ray heating efficiency and the WDM particle mass, providing deeper insights into the early stages of cosmic structure formation.
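A minimal 1D scattering transform computes first-order coefficients S1(j) = ⟨|x ∗ ψ_j|⟩ and second-order coefficients S2(j1, j2) = ⟨||x ∗ ψ_j1| ∗ ψ_j2|⟩ for j2 > j1, using a bank of zero-mean wavelets ψ_j at dyadic scales. The pure-Python sketch below uses Morlet filters, circular convolution, and a global average in place of the low-pass filter used by full implementations such as Kymatio; the filter parameters (`xi0`, `sigma0`, `J`) and the random stand-in spectrum are illustrative assumptions, not the settings or data used in the paper.

```python
import cmath
import math
import random


def morlet(j, xi0=3.0 * math.pi / 4.0, sigma0=0.8):
    """Zero-mean Morlet filter at dyadic scale 2**j, as a dict {offset: tap}."""
    sigma = sigma0 * 2 ** j
    xi = xi0 / 2 ** j
    half = int(math.ceil(4.0 * sigma))
    offsets = list(range(-half, half + 1))
    gauss = [math.exp(-t * t / (2.0 * sigma * sigma)) for t in offsets]
    wave = [cmath.exp(1j * xi * t) * g for t, g in zip(offsets, gauss)]
    # Subtract a scaled Gaussian so the taps sum to zero (no response to a constant).
    beta = sum(wave) / sum(gauss)
    return {t: w - beta * g for t, w, g in zip(offsets, wave, gauss)}


def circ_conv(signal, taps):
    """Circular convolution y[n] = sum_t taps[t] * x[n - t]."""
    n = len(signal)
    return [sum(c * signal[(i - t) % n] for t, c in taps.items()) for i in range(n)]


def scattering_1d(x, J=4):
    """Return first-order S1[j] and second-order S2[(j1, j2)] coefficients."""
    filters = [morlet(j) for j in range(J)]
    u1 = [[abs(v) for v in circ_conv(x, f)] for f in filters]  # modulus |x * psi_j|
    s1 = [sum(u) / len(u) for u in u1]
    s2 = {}
    for j1 in range(J):
        for j2 in range(j1 + 1, J):  # j2 > j1: finer second scales carry little energy
            u2 = circ_conv(u1[j1], filters[j2])
            s2[(j1, j2)] = sum(abs(v) for v in u2) / len(u2)
    return s1, s2


rng = random.Random(1)
spectrum = [rng.gauss(0.0, 1.0) for _ in range(128)]  # stand-in for a simulated spectrum
s1, s2 = scattering_1d(spectrum)
```

Because the second-order coefficients depend on how fluctuations at one scale are modulated at a coarser scale, two signals with the same power spectrum can still yield different S2 values, which is the property the abstract exploits.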
Vision-Language Foundation Models (VLMs), trained on large-scale multimodal datasets, have driven significant advances in Artificial Intelligence (AI) by enabling rich cross-modal reasoning. Despite their success in general domains, applying these models to medical imaging remains challenging due to the limited availability of diverse imaging modalities and multilingual clinical data. Most existing medical VLMs are trained on a subset of imaging modalities and focus primarily on high-resource languages, limiting their generalizability and clinical utility. To address these limitations, we introduce a novel Vietnamese-language multimodal medical dataset consisting of 2,757 whole-body PET/CT volumes from independent patients and their corresponding full-length clinical reports. This dataset is designed to fill two pressing gaps in medical AI development: (1) the lack of PET/CT imaging data in existing VLM training corpora, which hinders the development of models capable of handling functional imaging tasks; and (2) the underrepresentation of low-resource languages, particularly Vietnamese, in medical vision-language research. To the best of our knowledge, this is the first dataset to provide comprehensive PET/CT-report pairs in Vietnamese. We further introduce a training framework to enhance VLM learning, including data augmentation and expert-validated test sets. We conduct comprehensive experiments benchmarking state-of-the-art VLMs on downstream tasks. The experimental results show that incorporating our dataset significantly improves the performance of existing VLMs. We believe this dataset and benchmark will serve as a pivotal step toward more robust VLMs for medical imaging, especially for low-resource languages and clinical use in Vietnamese healthcare. The source code is available at this https URL.
Nagoya University researchers develop J-Moshi, the first publicly available full-duplex spoken dialogue model for Japanese, by adapting the English Moshi architecture through two-stage training: pre-training on 60,000 hours of J-CHAT corpus data, followed by fine-tuning on 344 hours of high-quality stereo dialogue data. The extended variant, J-Moshi-ext, additionally incorporates 602 hours of synthetic dialogue generated via multi-stream TTS and achieves significant improvements over the Japanese-trained dGSLM baseline (a 100-point PPL reduction at τ=0.8), with human evaluation scores of 2.67 for naturalness and 2.30 for meaningfulness. The model also acquires Japanese-specific turn-taking characteristics, including more speech overlap (4.2 seconds per minute vs. 1.2 for English Moshi), in line with established linguistic patterns in Japanese conversation. These results demonstrate effective cross-lingual adaptation of full-duplex dialogue systems, although quality gaps relative to ground-truth audio remain, pointing to the RQ-Transformer component as an area for future improvement.
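An overlap statistic like the seconds-per-minute figure quoted above can be computed from per-channel voice-activity segments. The sketch below assumes overlap means the total time both channels are active simultaneously, normalized per minute of dialogue; the paper's exact definition and voice-activity settings are not given here, and the segment values are made up.

```python
def overlap_seconds(a_segments, b_segments):
    """Total time (s) during which both speakers are active.

    Each argument is a list of (start, end) voice-activity segments in seconds;
    segments within one speaker are assumed not to overlap each other.
    """
    total = 0.0
    for a_start, a_end in a_segments:
        for b_start, b_end in b_segments:
            # Length of the intersection of the two intervals (0 if disjoint).
            total += max(0.0, min(a_end, b_end) - max(a_start, b_start))
    return total


def overlap_per_minute(a_segments, b_segments, duration_s):
    """Overlap normalized to seconds per minute of dialogue."""
    return overlap_seconds(a_segments, b_segments) / (duration_s / 60.0)


# Toy two-channel dialogue lasting 30 s.
speaker_a = [(0.0, 5.0), (10.0, 14.0)]
speaker_b = [(4.0, 8.0), (13.5, 20.0)]
print(overlap_per_minute(speaker_a, speaker_b, 30.0))  # 3.0
```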
This research introduces a space-efficient Quantum Singular Value Transformation (QSVT) to characterize the computational power of quantum computers with limited qubits, presenting the first natural complete problems for coRQUL and new complete problems for BQL. It demonstrates uniform hardness for space-bounded quantum state testing across various distance measures, showing that such testing is as easy as space-efficiently preparing the states.
One of the most remarkable discoveries of JWST is a population of compact, red sources at z > 4, commonly referred to as Little Red Dots (LRDs). Spectroscopic follow-up has shown that most LRDs are active galactic nuclei (AGNs), which are preferentially found around z ~ 6 and could mark a key phase in the formation and growth of black holes (BHs) in the early universe. Photometric surveys at lower redshift have recently been carried out to trace their evolution across cosmic time, and a small number of LRDs have been spectroscopically identified both at Cosmic Noon and in the local universe. Here we report the discovery of one of the lowest-z analogs of LRDs, J204837.26-002437.2 (hereafter J2048) at z = 0.4332, using new Gemini-N/GMOS IFU observations combined with archival multi-band photometric SED data. The GMOS data reveal extended blue emission from a starburst with a star formation rate of 400 M⊙ yr⁻¹, together with an extended, very fast ionized outflow. This is the first spectroscopic confirmation of extended host emission and an outflow in an LRD-like galaxy, providing a unique laboratory for understanding the nature of their high-redshift counterparts. Moreover, J2048 appears to host an extremely overmassive BH, with a BH-to-stellar mass ratio of 0.6; the BH mass and host stellar mass are estimated to be 10^10.2 M⊙ and 10^10.4 M⊙, respectively. We discuss the origin and evolutionary fate of J2048, and the implications that such low-z analogs have for interpreting the properties of high-z LRDs.
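Because both masses are quoted as base-10 logarithms, the BH-to-stellar mass ratio follows from their difference: 10^(10.2 - 10.4) = 10^(-0.2) ≈ 0.63, which rounds to the stated 0.6. A quick check using only the numbers in the text:

```python
log_m_bh = 10.2     # log10(M_BH / M_sun), from the text
log_m_star = 10.4   # log10(M_star / M_sun), from the text

# A ratio of masses is a difference of their logarithms.
ratio = 10 ** (log_m_bh - log_m_star)
print(round(ratio, 2))  # 0.63
```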
This paper argues that Large Language Model (LLM)-based social simulations require clear boundaries to reliably contribute to social science, proposing a framework to address their inherent limitations. The work identifies that LLMs tend to generate an "average persona," leading to insufficient behavioral heterogeneity, and outlines three key boundary problems: alignment, consistency, and robustness, offering heuristic guidelines for researchers.
To understand the emergence of macroscopic irreversibility from microscopic reversible dynamics, the idea of coarse-graining plays a fundamental role. In this work, we develop a unified inferential framework for macroscopic states, that is, coarse descriptions of microscopic quantum systems that can be inferred from macroscopic measurements. Building on quantum statistical sufficiency and Bayesian retrodiction, we characterize macroscopic states through equivalent abstract (algebraic) and explicit (constructive) formulations. Central to our approach is the notion of observational deficit, which quantifies the degree of irretrodictability of a state relative to a prior and a measurement. This leads to a general definition of macroscopic entropy as an inferentially grounded measure of asymmetry under Bayesian inversion. We formalize this structure in terms of inferential reference frames, defined by the pair consisting of a prior and a measurement, which encapsulate the observer's informational perspective. We then formulate a resource theory of microscopicity, treating macroscopic states as free states and introducing a hierarchy of microscopicity-non-generating operations. This theory unifies and extends existing resource theories of coherence, athermality, and asymmetry. Finally, we apply the framework to study quantum correlations under observational constraints, introducing the notion of observational discord and deriving necessary and sufficient conditions for it to vanish in terms of information recoverability. This work is dedicated to Professor Ryszard Horodecki on the occasion of his 80th birthday, in deep admiration and gratitude for his pioneering contributions to quantum information theory.
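For commuting (classical) states and measurements, Bayesian retrodiction reduces to Bayes' rule: given a prior π(x) and a channel P(y|x), the reverse channel is R(x|y) = P(y|x)π(x)/q(y) with q(y) = Σ_x P(y|x)π(x). The sketch below covers only this classical special case with illustrative numbers; it does not implement the paper's observational deficit, macroscopic entropy, or quantum construction.

```python
def push_forward(channel, prior):
    """Output distribution q(y) = sum_x P(y|x) prior(x); channel[x][y] = P(y|x)."""
    n_out = len(channel[0])
    return [sum(prior[x] * channel[x][y] for x in range(len(prior))) for y in range(n_out)]


def bayes_reverse(channel, prior):
    """Bayesian inverse R(x|y) = P(y|x) prior(x) / q(y), returned as rev[y][x].

    This is the classical (commuting) special case of the Petz recovery map;
    it is defined only for outputs with q(y) > 0.
    """
    q = push_forward(channel, prior)
    return [[channel[x][y] * prior[x] / q[y] for x in range(len(prior))] for y in range(len(q))]


# Toy binary measurement with a non-uniform prior (illustrative numbers).
prior = [0.75, 0.25]
channel = [[0.9, 0.1],   # P(y | x = 0)
           [0.2, 0.8]]   # P(y | x = 1)
reverse = bayes_reverse(channel, prior)
# Retrodicting the forward image of the prior gives the prior back.
recovered = push_forward(reverse, push_forward(channel, prior))
```

The prior itself is always perfectly retrodicted; information loss shows up only for states other than the prior, which is what a deficit relative to a prior and a measurement can quantify.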
A rigorous proof of the generalized quantum Stein's lemma repairs a critical logical flaw in an earlier argument, re-establishing the foundational second law of quantum resource theories. This work, by Masahito Hayashi and Hayata Yamasaki, extends the framework to classical-quantum channels, providing a universal characterization of quantum resource convertibility for both states and dynamical processes.
This paper provides a comprehensive analysis of modality bias in Multimodal Large Language Models (MLLMs), where models disproportionately rely on textual information over visual inputs. The research identifies key contributing factors such as dataset imbalances and architectural choices, offering a systematic experimental validation and a roadmap for addressing this fundamental challenge.
We present the findings of the latest iteration of the Singing Voice Conversion Challenge, a scientific event aiming to compare and understand different voice conversion systems in a controlled environment. Whereas previous iterations focused solely on converting singer identity, this year we also focused on converting the singer's singing style. To create a controlled environment and enable thorough evaluation, we developed a new challenge database, introduced two tasks, open-sourced baselines, and conducted large-scale crowd-sourced listening tests and objective evaluations. The challenge ran for two months, and in total we evaluated 26 different systems. The results of the large-scale crowd-sourced listening test showed that the top systems achieved singer identity scores comparable to those of ground-truth samples. However, modeling singing style, and consequently achieving high naturalness, remains a challenge in this task, primarily due to the difficulty of modeling dynamic information in breathy, glissando, and vibrato singing styles.
Cold dark matter (CDM) can be thought of as a 2D (or 3D) sheet of particles in 4D (or 6D) phase space due to its negligible velocity dispersion. The large-scale structure, also called the cosmic web, is thus a result of the topology of the CDM manifold. Initial crossing of particle trajectories occurs at the critical points of this manifold, forming singularities that seed most of the collapsed structures. The cosmic web can thus be characterized using these singular points. In this context, we employ catastrophe theory in 2D to study the motion around such singularities and analytically model the shape of the emerging structures, particularly the pancakes, which later evolve into halos and filaments, the building blocks of the 2D web. We compute higher-order corrections to the shape of the pancakes, including properties such as their curvature and the scale of transition from a C to an S shape. Using Gaussian statistics (assuming Zeldovich flow) for our model parameters, we also compute the distributions of observable features related to the shape of pancakes and their variation across halo and filament populations in 2D cosmologies. We find that a larger fraction of pancakes evolve into filaments than into halos; that pancakes destined to become halos are more curved; that pancakes are predominantly C-shaped; and that shell-crossing is highly anisotropic. Extending this work to 3D will allow its predictions to be tested against observations of the cosmic web and enable searches for signatures of non-Gaussianity at the corresponding scales.
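In the Zeldovich approximation, particles move as x = q + D ψ(q), and the Lagrangian-to-Eulerian map first folds, creating the caustic that seeds a pancake, where ∂x/∂q = 1 + D ψ'(q) first vanishes. The sketch below is a 1D simplification of the 2D setting studied here (in 2D the condition involves the largest eigenvalue of the deformation tensor), using a toy sinusoidal displacement; `first_shell_crossing` is a hypothetical helper name.

```python
import math


def first_shell_crossing(displacement_grad, qs):
    """Return (D_sc, q_sc) for the 1D Zeldovich map x(q) = q + D * psi(q).

    The map folds where dx/dq = 1 + D * psi'(q) = 0, which first happens at
    D = -1 / min_q psi'(q), located at the most compressive point q_sc.
    """
    q_sc, slope_min = min(zip(qs, map(displacement_grad, qs)), key=lambda p: p[1])
    if slope_min >= 0.0:
        return math.inf, None  # no compressive region: streams never cross
    return -1.0 / slope_min, q_sc


# Toy displacement psi(q) = -A sin(k q) / k, so psi'(q) = -A cos(k q).
A, k = 0.5, 2.0 * math.pi
qs = [i / 100.0 for i in range(100)]
d_sc, q_sc = first_shell_crossing(lambda q: -A * math.cos(k * q), qs)
print(d_sc, q_sc)  # 2.0 0.0
```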