Researchers from National Taiwan Normal University and National Sun Yat-sen University developed GazeNLQ, a framework that integrates estimated gaze information as a third modality into the egocentric video Natural Language Queries (NLQ) task. The framework achieves competitive performance on the Ego4D NLQ dataset, demonstrating the utility of gaze for enhancing temporal localization in first-person videos.
The convergence of quantum-inspired neural networks and deep reinforcement learning offers a promising avenue for financial trading. We implemented a trading agent for USD/TWD by integrating Quantum Long Short-Term Memory (QLSTM) for short-term trend prediction with Quantum Asynchronous Advantage Actor-Critic (QA3C), a quantum-enhanced variant of the classical A3C. Trained on data from 2000-01-01 to 2025-04-30 (80% training, 20% testing), the long-only agent achieves an 11.87% return over roughly 5 years with a 0.92% maximum drawdown, outperforming several currency ETFs. We detail the state design (QLSTM features and technical indicators), a reward function geared toward trend-following and risk control, and multi-core training. Results show that hybrid models yield competitive FX trading performance. Implications include QLSTM's effectiveness for small-profit trades with tight risk limits, along with avenues for future enhancement. Key hyperparameters: QLSTM sequence length = 4, QA3C workers = 8. Limitations: classical simulation of the quantum circuits and a simplified trading strategy. (Disclaimer: The views expressed in this article are those of the authors and do not represent the views of Wells Fargo. This article is for informational purposes only. Nothing contained in this article should be construed as investment advice. Wells Fargo makes no express or implied warranties and expressly disclaims all legal, tax, and accounting implications related to this article.)
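The two headline performance metrics reported above, total return and maximum drawdown, can be computed from an agent's equity curve with a few lines of code. A minimal sketch follows; the sample equity values are illustrative and not taken from the paper:

```python
# Sketch: computing total return and maximum drawdown for an equity
# curve (the two headline metrics reported for the long-only agent).
# The sample curve below is made up for illustration.

def total_return(equity):
    """Fractional return from first to last equity value."""
    return equity[-1] / equity[0] - 1.0

def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

curve = [100.0, 103.0, 101.5, 106.0, 104.9, 111.0]
print(f"return: {total_return(curve):.4f}")        # 0.1100
print(f"max drawdown: {max_drawdown(curve):.4f}")  # 0.0146
```

Drawdown is tracked against the running peak rather than the starting value, which is why the 101.5 dip is measured against the 103.0 high that preceded it.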
A comprehensive review systematically categorizes and analyzes Uncertainty Quantification (UQ) techniques in Artificial Intelligence, distinguishing between aleatoric and epistemic uncertainties and exploring their applications in high-risk domains like healthcare and autonomous driving to foster trustworthy AI.
We report the detection of HCN (J = 3−2) rotational emission from comet 3I/ATLAS at a heliocentric distance of 2.13 AU with the James Clerk Maxwell Telescope (JCMT). Observations were conducted from 07 August 2025 (UT) using the ʻŪʻū heterodyne receiver and ACSIS spectroscopic backend. The HCN line was detected at >5σ on 14 Sep 2025 (UT), and a production rate of Q(HCN) = (4.0 ± 1.7) × 10²⁵ s⁻¹ was derived by non-LTE radiative transfer modelling. Preliminary estimates of the HCN/H₂O and CN/HCN abundance ratios suggest values similar to Solar System comets.
Voice activity detection (VAD) is essential for speech-driven applications, but remains far from perfect in noisy and resource-limited environments. Existing methods often lack robustness to noise, and their frame-wise classification losses are only loosely coupled with the evaluation metric of VAD. To address these challenges, we propose SincQDR-VAD, a compact and robust framework that combines a Sinc-extractor front-end with a novel quadratic disparity ranking loss. The Sinc-extractor uses learnable bandpass filters to capture noise-resistant spectral features, while the ranking loss optimizes the pairwise score order between speech and non-speech frames to improve the area under the receiver operating characteristic curve (AUROC). A series of experiments conducted on representative benchmark datasets shows that our framework considerably improves both AUROC and F2-Score while using only 69% of the parameters of prior models, confirming its efficiency and practical viability.
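The key idea of optimizing pairwise score order for AUROC can be sketched as a pairwise squared-hinge surrogate. The exact form of the paper's quadratic disparity ranking loss is not given in the abstract, so the function below is an illustrative variant, not the authors' implementation:

```python
# Illustrative pairwise quadratic ranking surrogate for AUROC, in the
# spirit of a disparity ranking loss. A pair contributes zero loss once
# the speech frame outscores the non-speech frame by at least `margin`;
# violations are penalized quadratically.

def quadratic_ranking_loss(speech_scores, nonspeech_scores, margin=1.0):
    total, pairs = 0.0, 0
    for s in speech_scores:
        for n in nonspeech_scores:
            total += max(0.0, margin - (s - n)) ** 2
            pairs += 1
    return total / pairs

# Well-separated scores: every pair satisfies the margin, loss is 0.
print(quadratic_ranking_loss([2.0, 3.0], [0.0]))  # 0.0
# Overlapping scores: the violation (0.5 below margin) is squared.
print(quadratic_ranking_loss([0.5], [0.0]))       # 0.25
```

Because AUROC is exactly the fraction of correctly ordered speech/non-speech pairs, a loss over the same pairs is far more tightly coupled to the metric than a frame-wise cross-entropy.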
We report the discovery of a dense molecular ring-like structure in a dense (10⁵ cm⁻³), cold (pc-scale CO depletion by a factor of 5), and young (10⁴ yr) star-forming region, G34.74-0.12, revealed by C¹⁸O (2-1), HNC (1-0), and N₂H⁺ (1-0) observations with the Atacama Large Millimeter/submillimeter Array (ALMA). The ring-like structure is redshifted with respect to the clump, spanning from V_sys,lsr + 0.9 to V_sys,lsr + 2.9 km s⁻¹, with a total mass of 109 M☉. It is spatially coincident with 1.3 mm and 3.0 mm dust continuum emission from cores, and with several protostellar outflows. However, no free-free emission or H II region is detected in association with this structure. With a slow expansion speed indicated by the position-velocity diagram, this ring structure differs from rings previously identified in more evolved star-forming regions. Possible explanations for the ring-like structure include a relic wind-blown bubble produced by a deeply embedded young stellar object, a hollow cavity formed by cloud-cloud interactions, a gas ring resulting from a temperature gradient, or a line-of-sight superposition of multiple outflows or dense clouds. This discovery offers a rare observational glimpse into the earliest dynamical processes involved in massive star formation.
With the increasing application of large language models (LLMs) in the medical domain, evaluating these models' performance using benchmark datasets has become crucial. This paper presents a comprehensive survey of various benchmark datasets employed in medical LLM tasks. These datasets span multiple modalities including text, image, and multimodal benchmarks, focusing on different aspects of medical knowledge such as electronic health records (EHRs), doctor-patient dialogues, medical question-answering, and medical image captioning. The survey categorizes the datasets by modality, discussing their significance, data structure, and impact on the development of LLMs for clinical tasks such as diagnosis, report generation, and predictive decision support. Key benchmarks include MIMIC-III, MIMIC-IV, BioASQ, PubMedQA, and CheXpert, which have facilitated advancements in tasks like medical report generation, clinical summarization, and synthetic data generation. The paper summarizes the challenges and opportunities in leveraging these benchmarks for advancing multimodal medical intelligence, emphasizing the need for datasets with a greater degree of language diversity, structured omics data, and innovative approaches to synthesis. This work also provides a foundation for future research in the application of LLMs in medicine, contributing to the evolving field of medical artificial intelligence.
The rise of Multimodal Large Language Models (MLLMs) has become a transformative force in the field of artificial intelligence, enabling machines to process and generate content across multiple modalities, such as text, images, audio, and video. These models represent a significant advancement over traditional unimodal systems, opening new frontiers in diverse applications ranging from autonomous agents to medical diagnostics. By integrating multiple modalities, MLLMs achieve a more holistic understanding of information, closely mimicking human perception. As the capabilities of MLLMs expand, the need for comprehensive and accurate performance evaluation has become increasingly critical. This survey aims to provide a systematic review of benchmark tests and evaluation methods for MLLMs, covering key topics such as foundational concepts, applications, evaluation methodologies, ethical concerns, security, efficiency, and domain-specific applications. Through the classification and analysis of existing literature, we summarize the main contributions and methodologies of various surveys, conduct a detailed comparative analysis, and examine their impact within the academic community. Additionally, we identify emerging trends and underexplored areas in MLLM research, proposing potential directions for future studies. This survey is intended to offer researchers and practitioners a comprehensive understanding of the current state of MLLM evaluation, thereby facilitating further progress in this rapidly evolving field.
This comprehensive review explores the intersection of Large Language Models (LLMs) and cognitive science, examining similarities and differences between LLMs and human cognitive processes. We analyze methods for evaluating LLMs cognitive abilities and discuss their potential as cognitive models. The review covers applications of LLMs in various cognitive fields, highlighting insights gained for cognitive science research. We assess cognitive biases and limitations of LLMs, along with proposed methods for improving their performance. The integration of LLMs with cognitive architectures is examined, revealing promising avenues for enhancing artificial intelligence (AI) capabilities. Key challenges and future research directions are identified, emphasizing the need for continued refinement of LLMs to better align with human cognition. This review provides a balanced perspective on the current state and future potential of LLMs in advancing our understanding of both artificial and human intelligence.
The ever-increasing number of detections of gravitational waves (GWs) from compact binaries by the Advanced LIGO and Advanced Virgo detectors allows us to perform ever-more sensitive tests of general relativity (GR) in the dynamical and strong-field regime of gravity. We perform a suite of tests of GR using the compact binary signals observed during the second half of the third observing run of those detectors. We restrict our analysis to the 15 confident signals that have false alarm rates ≤ 10⁻³ yr⁻¹. In addition to signals consistent with binary black hole (BH) mergers, the new events include GW200115_042309, a signal consistent with a neutron star–BH merger. We find the residual power, after subtracting the best-fit waveform from the data for each event, to be consistent with the detector noise. Additionally, we find all the post-Newtonian deformation coefficients to be consistent with the predictions of GR, with an improvement by a factor of ~2 in the −1PN parameter. We also find that the spin-induced quadrupole moments of the binary BH constituents are consistent with those of Kerr BHs in GR. We find no evidence for dispersion of GWs, non-GR modes of polarization, or post-merger echoes in the events that were analyzed. We update the bound on the mass of the graviton, at 90% credibility, to m_g ≤ 2.42 × 10⁻²³ eV/c². The final mass and final spin as inferred from the pre-merger and post-merger parts of the waveform are consistent with each other. The studies of the properties of the remnant BHs, including deviations of the quasi-normal-mode frequencies and damping times, show consistency with the predictions of GR. In addition to considering signals individually, we also combine results from the catalog of GW signals to calculate more precise population constraints. We find no evidence in support of physics beyond GR.
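The graviton-mass bound quoted above rests on the standard massive-graviton dispersion relation, under which the GW group velocity becomes frequency dependent and lower-frequency components arrive slightly later; the relevant relations are:

```latex
% Massive-graviton dispersion relation and the resulting
% (subluminal, frequency-dependent) group velocity:
E^2 = p^2 c^2 + m_g^2 c^4
\quad\Longrightarrow\quad
\frac{v_g^2}{c^2} = 1 - \frac{m_g^2 c^4}{E^2},
% The accumulated dephasing over the propagation distance is
% governed by the graviton Compton wavelength:
\lambda_g = \frac{h}{m_g c}.
```

A nonzero m_g would thus imprint a characteristic frequency-dependent phase shift on the inspiral signal; its absence across the catalog is what tightens the bound.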
Large Language Models (LLMs) have rapidly evolved from text-based systems to multimodal platforms, significantly impacting various sectors including healthcare. This comprehensive review explores the progression of LLMs to Multimodal Large Language Models (MLLMs) and their growing influence in medical practice. We examine the current landscape of MLLMs in healthcare, analyzing their applications across clinical decision support, medical imaging, patient engagement, and research. The review highlights the unique capabilities of MLLMs in integrating diverse data types, such as text, images, and audio, to provide more comprehensive insights into patient health. We also address the challenges facing MLLM implementation, including data limitations, technical hurdles, and ethical considerations. By identifying key research gaps, this paper aims to guide future investigations in areas such as dataset development, modality alignment methods, and the establishment of ethical guidelines. As MLLMs continue to shape the future of healthcare, understanding their potential and limitations is crucial for their responsible and effective integration into medical practice.
While pre-trained automatic speech recognition (ASR) systems demonstrate impressive performance on matched domains, their performance often degrades when confronted with channel mismatch stemming from unseen recording environments and conditions. To mitigate this issue, we propose a novel channel-aware data simulation method for robust ASR training. Our method harnesses the synergistic power of channel-extractive techniques and generative adversarial networks (GANs). We first train a channel encoder capable of extracting embeddings from arbitrary audio. On top of this, channel embeddings are extracted using a minimal amount of target-domain data and used to guide a GAN-based speech synthesizer. This synthesizer generates speech that faithfully preserves the phonetic content of the input while mimicking the channel characteristics of the target domain. We evaluate our method on the challenging Hakka Across Taiwan (HAT) and Taiwanese Across Taiwan (TAT) corpora, achieving relative character error rate (CER) reductions of 20.02% and 9.64%, respectively, compared to the baselines. These results highlight the efficacy of our channel-aware data simulation method for bridging the gap between source- and target-domain acoustics.
Observations with the Atacama Large Millimeter/submillimeter Array (ALMA) and the Jansky Very Large Array (JVLA) have revealed many dust rings in protoplanetary disks, often interpreted as dust traps at gas pressure bumps. Previous studies have typically modeled these rings by assuming a single dust species in drift-diffusion equilibrium, neglecting dust size evolution resulting from coagulation and fragmentation. In this work, we perform numerical simulations that incorporate both dust-gas dynamics (drift and diffusion) and dust size evolution. Our results show that the radial distributions of different dust species (up to the fragmentation limit) are nearly identical in the dust ring, as dust growth dominates over drift and diffusion (e.g., with a typical dust-to-gas ratio of ε ∼ 10⁻²). Building on this finding, we develop a comprehensive, self-consistent analytical theory that describes the dust ring structure while explicitly accounting for size evolution effects. Our model provides a unified framework for interpreting multi-wavelength observations by linking the physical dust distribution to the observed ring properties, thus laying the foundation for future observational modeling.
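The fragmentation limit mentioned above is commonly estimated by equating the turbulence-driven collision velocity with the fragmentation threshold; a widely used order-of-magnitude form (the order-unity prefactor varies between treatments, and the paper may use a different one) is:

```latex
% Fragmentation-limited maximum Stokes number of the dust:
% grains stop growing when turbulent collision speeds reach v_frag.
\mathrm{St}_{\rm frag} \simeq \frac{1}{3\alpha}\,\frac{v_{\rm frag}^2}{c_s^2},
% with \alpha the turbulence parameter, v_{\rm frag} the material
% fragmentation threshold velocity, and c_s the local sound speed.
```

Inside a pressure bump, where drift stalls and densities are high, growth proceeds quickly up to this limit, which is why species below it share nearly identical radial distributions.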
This research proposes that Extreme Mass Ratio Binary (EMRB) black hole systems can serve as unique astrophysical laboratories for definitively probing the composition of relativistic jets. The work models multi-wavelength emissions from episodic jet-disk collisions, demonstrating that the ratio of gamma-ray-to-UV emission provides a distinct signature capable of differentiating between leptonic and baryonic jet compositions, with potential for detectable neutrino fluxes.
After AlphaFold's developers won the Nobel Prize, protein structure prediction with deep learning once again became a hot topic. This work comprehensively explores advanced deep learning methods applied to protein structure prediction and design. It begins by examining recent innovations in prediction architectures, with detailed discussions of improvements such as diffusion-based frameworks and novel pairwise attention modules. It then analyses key components, including structure generation, evaluation metrics, multiple sequence alignment processing, and network architecture, illustrating the current state of the art in computational protein modelling. Subsequent chapters focus on practical applications, presenting case studies that range from individual protein predictions to complex biomolecular interactions. Strategies for enhancing prediction accuracy and for integrating deep learning techniques with experimental validation are thoroughly explored. The later sections review the industry landscape of protein design, highlighting the transformative role of artificial intelligence in biotechnology and discussing emerging market trends and future challenges. Supplementary appendices provide essential resources such as databases and open-source tools, making this volume a valuable reference for researchers and students.
Evaluating audio generation systems, including text-to-music (TTM), text-to-speech (TTS), and text-to-audio (TTA), remains challenging due to the subjective and multi-dimensional nature of human perception. Existing methods treat mean opinion score (MOS) prediction as a regression problem, but standard regression losses overlook the relativity of perceptual judgments. To address this limitation, we introduce QAMRO, a novel Quality-aware Adaptive Margin Ranking Optimization framework that seamlessly integrates regression objectives from different perspectives, aiming to highlight perceptual differences and prioritize accurate ratings. Our framework leverages pre-trained audio-text models such as CLAP and Audiobox-Aesthetics, and is trained exclusively on the official AudioMOS Challenge 2025 dataset. It demonstrates superior alignment with human evaluations across all dimensions, significantly outperforming robust baseline models.
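The core idea of a quality-aware adaptive margin, requiring a larger gap between predicted scores when the ground-truth MOS gap is larger, can be sketched in a few lines. The abstract does not specify QAMRO's exact formulation, so the scaling scheme and the `alpha` hyperparameter below are assumptions for illustration:

```python
# Illustrative ranking loss with a quality-aware adaptive margin, in the
# spirit of QAMRO. For each pair where item i has a truly higher MOS than
# item j, the required margin between predictions grows in proportion to
# the ground-truth gap (alpha is an assumed scaling hyperparameter).

def adaptive_margin_ranking_loss(pred, target, alpha=0.5):
    total, pairs = 0.0, 0
    for i in range(len(pred)):
        for j in range(len(pred)):
            gap = target[i] - target[j]
            if gap <= 0:  # only rank pairs where item i is truly better
                continue
            margin = alpha * gap
            total += max(0.0, margin - (pred[i] - pred[j]))
            pairs += 1
    return total / pairs if pairs else 0.0

# Predictions ordered correctly but too close for the 1.5 margin: loss 0.5.
print(adaptive_margin_ranking_loss([4.0, 3.0], [5.0, 2.0]))  # 0.5
# Predictions separated by more than the required margin: loss 0.
print(adaptive_margin_ranking_loss([5.0, 1.0], [5.0, 2.0]))  # 0.0
```

Unlike a plain regression loss, this objective is unaffected by a constant offset in the predictions and penalizes near-ties on pairs that humans rated very differently, which matches the relativity of perceptual judgments the abstract highlights.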
Large Language Models (LLMs) have transformed artificial intelligence by advancing natural language understanding and generation, enabling applications across fields such as healthcare, software engineering, and conversational systems. Despite these advancements in the past few years, LLMs have shown considerable vulnerabilities, particularly to prompt injection and jailbreaking attacks. This review analyzes the state of research on these vulnerabilities and presents available defense strategies. We broadly categorize attack approaches into prompt-based, model-based, multimodal, and multilingual, covering techniques such as adversarial prompting, backdoor injections, and cross-modality exploits. We also review various defense mechanisms, including prompt filtering, transformation, alignment techniques, multi-agent defenses, and self-regulation, evaluating their strengths and shortcomings. We also discuss key metrics and benchmarks used to assess LLM safety and robustness, noting challenges like the quantification of attack success in interactive contexts and biases in existing datasets. Identifying current research gaps, we suggest future directions for resilient alignment strategies, advanced defenses against evolving attacks, automation of jailbreak detection, and consideration of ethical and societal impacts. This review emphasizes the need for continued research and cooperation within the AI community to enhance LLM security and ensure their safe deployment.
This research from National Taiwan Normal University introduces an Adaptive Learning Path Navigation (ALPN) system for e-learning, which employs Attentive Knowledge Tracing (AKT) to model student knowledge and Entropy-enhanced Proximal Policy Optimization (EPPO) for dynamic content recommendations. The system improved students' final learning outcomes by 8.2% compared to existing methods and achieved higher learning path diversity.
Deep learning has transformed AI applications but faces critical security challenges, including adversarial attacks, data poisoning, model theft, and privacy leakage. This survey examines these vulnerabilities, detailing their mechanisms and impact on model integrity and confidentiality. Practical implementations, including adversarial examples, label flipping, and backdoor attacks, are explored alongside defenses such as adversarial training, differential privacy, and federated learning, highlighting their strengths and limitations. Advanced methods like contrastive and self-supervised learning are presented for enhancing robustness. The survey concludes with future directions, emphasizing automated defenses, zero-trust architectures, and the security challenges of large AI models. A balanced approach to performance and security is essential for developing reliable deep learning systems.