We present Denario, an AI multi-agent system designed to serve as a scientific research assistant. Denario can perform many different tasks, such as generating ideas, checking the literature, developing research plans, writing and executing code, making plots, and drafting and reviewing a scientific paper. The system has a modular architecture, allowing it to handle specific tasks, such as generating an idea, or to carry out end-to-end scientific analysis using Cmbagent as a deep-research backend. In this work, we describe Denario and its modules in detail, and illustrate its capabilities by presenting multiple papers it generated across many different scientific disciplines, such as astrophysics, biology, biophysics, biomedical informatics, chemistry, materials science, mathematical physics, medicine, neuroscience, and planetary science. Denario also excels at combining ideas from different disciplines, which we illustrate with a paper that applies methods from quantum physics and machine learning to astrophysical data. We report the evaluations performed on these papers by domain experts, who provided both numerical scores and review-like feedback. We then highlight the strengths, weaknesses, and limitations of the current system. Finally, we discuss the ethical implications of AI-driven research and reflect on how such technology relates to the philosophy of science. We publicly release the code at this https URL. A Denario demo can also be run directly on the web at this https URL, and the full app will be deployed on the cloud.
This paper presents a comprehensive survey and conceptual framework for Explainable Artificial Intelligence (XAI). It introduces a novel definition of explainability centered on the audience and purpose, provides detailed taxonomies for XAI techniques including a dedicated one for Deep Learning, and integrates XAI within the broader paradigm of Responsible Artificial Intelligence, addressing its ethical implications.
In this work, using the time-dependent density functional theory, we address the electron tunneling triggered by short (single-cycle and several-cycle) optical pulses in narrow metallic gaps under conditions relevant for actual experiments. We identify photon-assisted tunneling with one-photon, two-photon, and higher-order photon absorption, and we discuss the effect of the tunneling barrier, applied bias, and strength of the optical field on the transition from photon-assisted tunneling (weak optical fields) to optical field emission at strong optical fields. The numerical single-electron calculations and an analytical strong-field theory model are used to gain deeper insights into the results of the time-dependent density functional theory calculations. Additionally, our parameter-free calculations allow us to retrieve and explain recent experimental results on optically induced transport in narrow metallic gaps.
Wildfires pose a critical environmental issue to ecosystems, economies, and public safety, particularly in Mediterranean regions such as Spain. Accurate predictive models rely on high-resolution spatio-temporal data to capture the complex interplay of environmental and anthropogenic factors. To address the lack of localised and fine-grained datasets in Spain, this work introduces IberFire, a spatio-temporal datacube at 1 km x 1 km x 1-day resolution covering mainland Spain and the Balearic Islands from December 2007 to December 2024. IberFire integrates 260 features across eight main categories: auxiliary features, fire history, geography, topography, meteorology, vegetation indices, human activity, and land cover. All features are derived from open-access sources, ensuring transparency and real-time applicability. The data processing pipeline was implemented entirely using open-source tools, and the codebase has been made publicly available. This work not only enhances spatio-temporal granularity and feature diversity compared to existing European datacubes but also provides a reproducible methodology for constructing similar datasets. IberFire supports advanced wildfire risk modelling through Machine Learning (ML) and Deep Learning (DL) techniques, enables climate pattern analysis and informs strategic planning in fire prevention and land management. The dataset is publicly available on Zenodo to promote open research and collaboration.
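A datacube like the one described above can be thought of as a mapping from a spatial cell and a day to a feature record. The sketch below illustrates that indexing pattern with plain dictionaries; the class name, cell coordinates, and toy features (`ndvi`, `temp_max_c`, `fire`) are assumptions for illustration, not the actual IberFire schema.

```python
# Toy sketch of 1 km x 1 km x 1-day datacube indexing: a (row, col, date)
# key maps to a dict of per-cell, per-day features. Feature names here are
# illustrative stand-ins, not the real IberFire variables.
from datetime import date, timedelta

class DataCube:
    def __init__(self):
        self._cells = {}  # (row, col, date) -> {feature_name: value}

    def set(self, row, col, day, **features):
        """Attach (or update) feature values for one cell on one day."""
        self._cells.setdefault((row, col, day), {}).update(features)

    def get(self, row, col, day):
        """Return the feature record for a cell-day, or {} if absent."""
        return self._cells.get((row, col, day), {})

cube = DataCube()
d = date(2024, 8, 1)
cube.set(120, 455, d, ndvi=0.34, temp_max_c=39.5, fire=0)
cube.set(120, 455, d + timedelta(days=1), ndvi=0.33, temp_max_c=41.0, fire=1)

print(cube.get(120, 455, d)["temp_max_c"])  # 39.5
```

Real implementations would back this with array storage (e.g. Zarr or NetCDF) rather than a dictionary, but the access pattern is the same.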
False information poses a significant global challenge, and manually verifying claims is a time-consuming and resource-intensive process. In this research paper, we experiment with different approaches to investigate the effectiveness of large language models (LLMs) in classifying factual claims by their veracity and generating justifications in English and Telugu. The key contributions of this work include the creation of a bilingual English-Telugu dataset and the benchmarking of different veracity classification approaches based on LLMs.
Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. Comparison feedback conveys limited information about human preferences per human evaluation. Here, we propose to learn from natural language feedback, which conveys more information per human evaluation. We learn from language feedback on model outputs using a three-step learning algorithm. First, we condition the language model on the initial output and feedback to generate many refinements. Second, we choose the refinement with the highest similarity to the feedback. Third, we finetune a language model to maximize the likelihood of the chosen refinement given the input. In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements, finding that only large language models (175B parameters) do so. Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization ability.
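The three-step algorithm above can be sketched in a few lines. Step 1 (conditioning a large LM on the output and feedback to generate refinements) is stubbed with a fixed candidate list, and the similarity scorer is a toy token-overlap measure rather than the scorer used in the actual system; both are illustrative assumptions.

```python
# Sketch of the three-step learning-from-language-feedback loop.
# Step 1 is stubbed; the similarity function is a toy stand-in.

def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercased, punctuation-stripped tokens."""
    ta = {w.strip(".,!?") for w in a.lower().split()}
    tb = {w.strip(".,!?") for w in b.lower().split()}
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def select_refinement(refinements: list[str], feedback: str) -> str:
    """Step 2: keep the candidate refinement most similar to the feedback."""
    return max(refinements, key=lambda r: token_overlap(r, feedback))

feedback = "the summary should mention the budget deficit"
candidates = [  # Step 1 (stubbed): refinements an LM might generate
    "The council met on Tuesday.",
    "The council met and discussed the budget deficit.",
    "A summary of unrelated events.",
]
best = select_refinement(candidates, feedback)
print(best)  # the second candidate: it alone mentions the budget deficit
# Step 3 would finetune the model to map the original input to `best`.
```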
The star-forming main sequence (SFMS) is a tight relation observed between stellar masses and star formation rates (SFRs) in a population of galaxies. This relation is observed at different redshifts and in various morphological and environmental domains, and is key to understanding the underlying relations between a galaxy's budget of cold gas and its stellar content. Euclid Quick Data Release 1 (Q1) gives us the opportunity to investigate this fundamental relation in galaxy formation and evolution. We complement the Euclid release with public IRAC observations of the Euclid Deep Fields, improving the quality of recovered photometric redshifts, stellar masses, and SFRs, as is shown both with simulations and a comparison with available spectroscopic redshifts. From Q1 data alone, we recover more than $\sim 30\,\mathrm{k}$ galaxies with $\log_{10}(M_\ast/M_\odot) > 11$, giving a precise constraint of the SFMS at the high-mass end. We investigated the SFMS in a redshift interval between $0.2$ and $3.0$, comparing our results with the existing literature and fitting them with a parameterisation that accounts for a bending of the relation at the high-mass end, characterised by the bending mass, $M_0$. We find good agreement with previous results in terms of $M_0$ values, and an increasing trend for the relation scatter at higher stellar masses. We also investigate the distribution of physical (e.g. dust absorption, $A_V$, and formation age) and morphological properties (e.g. Sérsic index and radius) in the SFR--stellar mass plane, and their relation with the SFMS. These results highlight the potential of Euclid in studying the fundamental scaling relations that regulate galaxy formation and evolution, in anticipation of the forthcoming Data Release 1.
In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the need for human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for those dialogue systems and then presenting the evaluation methods for that class.
We perform a detailed comparison of the dynamics of cosmic string loops obtained in cosmological field theory simulations with their expected motion according to the Nambu-Goto action. We demonstrate that these loops follow the trajectories predicted within the NG effective theory except in regions of high curvature where energy is emitted from the loop in the form of massive radiation. This energy loss continues for all the loops studied in this simulation until they self-intersect or become small enough that they annihilate and disappear well before they complete a single oscillation. We comment on the relevance of this investigation to the interpretation of the results from cosmological field theory simulations as well as their extrapolation to a cosmological context.
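For context, the Nambu-Goto (NG) effective action against which the loop trajectories are compared takes the standard textbook form (quoted here for reference, not from the paper itself):

```latex
% Nambu-Goto action for a string worldsheet X^\mu(\tau, \sigma),
% with string tension \mu and induced worldsheet metric \gamma_{ab}.
S_{\rm NG} = -\mu \int d\tau \, d\sigma \, \sqrt{-\det \gamma_{ab}},
\qquad
\gamma_{ab} = g_{\mu\nu}\,\partial_a X^{\mu}\,\partial_b X^{\nu}
```

The field theory simulations resolve the string core and thus capture massive-radiation channels that this zero-width effective description omits, which is precisely where the abstract reports deviations.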
Manual peripheral blood smear (PBS) analysis is labor-intensive and subjective. While deep learning offers a promising alternative, a systematic evaluation of state-of-the-art models such as YOLOv11 for fine-grained PBS detection is still lacking. In this work, we make two key contributions. First, we curate a large-scale annotated dataset for blood cell detection and classification, comprising 16,891 images across 12 peripheral blood cell (PBC) classes, along with the red blood cell class, all carefully re-annotated for object detection tasks. In total, the dataset contains 298,850 annotated cells. Second, we leverage this dataset to conduct a comprehensive evaluation of five YOLOv11 variants (ranging from Nano to XLarge). These models are rigorously benchmarked under two data-splitting strategies (70:20:10 and 80:10:10) and systematically assessed using multiple performance criteria, including mean Average Precision (mAP), precision, recall, F1 score, and computational efficiency. Our experiments show that the YOLOv11 Medium variant achieves the best trade-off, reaching a mAP@0.5 of 0.934 under the 80:10:10 split. Larger models (Large and XLarge) provide only marginal accuracy gains at substantially higher computational cost. Moreover, the 80:10:10 split consistently outperforms the 70:20:10 split across all models. These findings highlight YOLOv11, particularly the Medium variant, as a highly effective framework for automated, fine-grained PBS detection. Beyond benchmarking, our publicly released dataset (this http URL) offers a valuable resource to advance research on blood cell detection and classification in hematology.
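Two pieces of the evaluation protocol above are easy to make concrete: the 80:10:10 split and the intersection-over-union (IoU) overlap underlying mAP@0.5. The sketch below is illustrative (the box format `(x1, y1, x2, y2)` and the seed are assumptions, not details from the paper).

```python
# Minimal sketch: an 80:10:10 train/val/test split and the IoU measure
# used by mAP@0.5. A detection counts as a true positive when its IoU
# with a ground-truth box is at least 0.5.
import random

def split_80_10_10(items, seed=0):
    """Shuffle and split items 80% / 10% / 10%."""
    rng = random.Random(seed)
    items = items[:]
    rng.shuffle(items)
    n = len(items)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

train, val, test = split_80_10_10(list(range(100)))
print(len(train), len(val), len(test))      # 80 10 10
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0 (identical boxes)
```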
Gravitational waves offer an unprecedented opportunity to look into the violent high-energy processes happening during the reheating phase of our Universe. We consider a Hubble-induced phase transition scenario as a source of a post-inflationary stochastic background of gravitational waves and analyse the main characteristics of its spectrum for the first time via numerical methods. The output of a large number of fully-fledged classical lattice simulations is condensed in a set of parametric formulas that describe key features of the gravitational wave spectrum, such as its peak amplitude and characteristic frequency, and avoid the need for further time-consuming simulations. The signal from such stochastic background is compared to the prospective sensitivity of future gravitational-wave detectors.
We show that characteristics of the electron's form factor in two-dimensional materials are observable in quasiparticle interference (QPI) spectrum. We study QPI in twisted bilayer graphene using real-space tight-binding calculations combined with the kernel polynomial method, which agrees excellently with the form factor norm obtained from the continuum Hamiltonian. The QPI signals, displaying a chiral structure, reveal all distinct interference processes between states near the Dirac points. We propose pseudospin textures of twisted bilayer graphene to explain all the interference mechanisms. Our results provide microscopic insights into electronic eigenstates of twisted bilayer graphene and suggest QPI as a potential method for probing the form factor, which governs the material's quantum geometry and many-body states.
We study the violation of Bell-Mermin-Klyshko (BMK) inequalities in initial quantum states of scalar fields in inflation. We show that the Bell inequality is maximally violated by the Bunch-Davies vacuum which is a two-mode squeezed state of a scalar field. However, we find that the violation of the BMK inequalities does not increase with the number of modes to measure. We then consider a non-Bunch-Davies vacuum expressed by a four-mode squeezed state of two scalar fields. Remarkably, we find that the violation of the BMK inequalities increases exponentially with the number of modes to measure. This indicates that some evidence that our universe has a quantum mechanical origin may survive in CMB data even if quantum entanglement decays exponentially afterward due to decoherence.
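For reference, the Bell-Mermin-Klyshko operators can be built recursively from dichotomic observables; sign and normalization conventions vary between papers, so the following is a common textbook form rather than the paper's exact definition:

```latex
% Recursive Mermin-Klyshko construction for n parties, with dichotomic
% observables a_k, a_k' (the primed operator B_{n-1}' swaps a_k <-> a_k').
B_1 = a_1, \qquad
B_n = \tfrac{1}{2}\, B_{n-1}\,(a_n + a_n') + \tfrac{1}{2}\, B_{n-1}'\,(a_n - a_n')
% Local hidden-variable theories obey |\langle B_n \rangle| \le 2, while
% quantum mechanics allows up to 2^{(n+1)/2}; for n = 2 this reduces to
% the CHSH bound 2\sqrt{2}.
```

The exponential growth of the quantum bound with $n$ is what makes the mode-number scaling reported in the abstract meaningful.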
Lung infections, particularly pneumonia, pose serious health risks that can escalate rapidly, especially during pandemics. Accurate AI-based severity prediction from medical imaging is essential to support timely clinical decisions and optimize patient outcomes. In this work, we present a novel method applicable to both CT scans and chest X-rays for assessing lung infection severity. Our contributions are twofold: (i) QCross-Att-PVT, a Transformer-based architecture that integrates parallel encoders, a cross-gated attention mechanism, and a feature aggregator to capture rich multi-scale features; and (ii) Conditional Online TransMix, a custom data augmentation strategy designed to address dataset imbalance by generating mixed-label image patches during training. Evaluated on two benchmark datasets, RALO CXR and Per-COVID-19 CT, our method consistently outperforms several state-of-the-art deep learning models. The results emphasize the critical role of data augmentation and gated attention in improving both robustness and predictive accuracy. This approach offers a reliable, adaptable tool to support clinical diagnosis, disease monitoring, and personalized treatment planning. The source code of this work is available at this https URL.
The Euclid mission of the European Space Agency will deliver weak gravitational lensing and galaxy clustering surveys that can be used to constrain the standard cosmological model and extensions thereof. We present forecasts from the combination of these surveys on the sensitivity to cosmological parameters including the summed neutrino mass $M_\nu$ and the effective number of relativistic species $N_{\rm eff}$ in the standard $\Lambda$CDM scenario and in a scenario with dynamical dark energy ($w_0 w_a$CDM). We compare the accuracy of different algorithms predicting the nonlinear matter power spectrum for such models. We then validate several pipelines for Fisher matrix and MCMC forecasts, using different theory codes, algorithms for numerical derivatives, and assumptions concerning the non-linear cut-off scale. The Euclid primary probes alone will reach a sensitivity of $\sigma(M_\nu) = 56$ meV in the $\Lambda$CDM+$M_\nu$ model, whereas the combination with CMB data from Planck is expected to achieve $\sigma(M_\nu) = 23$ meV and raise the evidence for a non-zero neutrino mass to at least the $2.6\sigma$ level. This can be pushed to a $4\sigma$ detection if future CMB data from LiteBIRD and CMB Stage-IV are included. In combination with Planck, Euclid will also deliver tight constraints on $\Delta N_{\rm eff} < 0.144$ (95% CL) in the $\Lambda$CDM+$M_\nu$+$N_{\rm eff}$ model, or $\Delta N_{\rm eff} < 0.063$ when future CMB data are included. When floating $(w_0, w_a)$, we find that the sensitivity to $N_{\rm eff}$ remains stable, while that to $M_\nu$ degrades at most by a factor 2. This work illustrates the complementarity between the Euclid spectroscopic and imaging/photometric surveys and between Euclid and CMB constraints. Euclid will have a great potential for measuring the neutrino mass and excluding well-motivated scenarios with additional relativistic particles.
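The Fisher-matrix forecasts mentioned above follow the standard Gaussian formalism; schematically, for observables with parameter-dependent means $\mu_a(\theta)$ and a fixed data covariance $C$, the forecast error is (a generic textbook expression, not a formula quoted from the paper):

```latex
% Gaussian Fisher matrix with fixed covariance, and the resulting
% marginalized 1-sigma forecast on parameter \theta_i.
F_{ij} = \sum_{a,b} \frac{\partial \mu_a}{\partial \theta_i}\,
         (C^{-1})_{ab}\,
         \frac{\partial \mu_b}{\partial \theta_j},
\qquad
\sigma(\theta_i) = \sqrt{(F^{-1})_{ii}}
```

The pipeline validation described in the abstract amounts to checking that different theory codes and numerical-derivative schemes yield consistent $F_{ij}$, and that the resulting contours agree with MCMC posteriors.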
While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only. In this paper, we identify and address several deficiencies of existing unsupervised SMT approaches by exploiting subword information, developing a theoretically well founded unsupervised tuning method, and incorporating a joint refinement procedure. Moreover, we use our improved SMT system to initialize a dual NMT model, which is further fine-tuned through on-the-fly back-translation. Together, we obtain large improvements over the previous state-of-the-art in unsupervised machine translation. For instance, we get 22.5 BLEU points in English-to-German WMT 2014, 5.5 points more than the previous best unsupervised system, and 0.5 points more than the (supervised) shared task winner back in 2014.
Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, dialog and conversational systems. The STS shared task is a venue for assessing the current state-of-the-art. The 2017 task focuses on multilingual and cross-lingual pairs with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).
In the context of transient constant-roll inflation near a local maximum, we derive the non-perturbative field redefinition that relates a Gaussian random field with the true non-Gaussian curvature perturbation. Our analysis shows the emergence of a new critical amplitude $\zeta_*$, corresponding to perturbations that prevent the inflaton from overshooting the local maximum, leaving it trapped in the false minimum of the potential. For potentials with a mild curvature at the local maximum (and thus small non-Gaussianity), we recover the known perturbative field redefinition. We apply these results to the formation of primordial black holes, and discuss the cases for which $\zeta_*$ is smaller than or of the same order as the critical value for collapse of spherically symmetric overdensities. In the latter case, we present a simple potential for which the power spectrum needs an amplitude 10 times smaller than in the Gaussian case to produce a sizeable amount of primordial black holes.
Edge Artificial Intelligence (Edge AI) embeds intelligence directly into devices at the network edge, enabling real-time processing with improved privacy and reduced latency by processing data close to its source. This review systematically examines the evolution, current landscape, and future directions of Edge AI through a multi-dimensional taxonomy including deployment location, processing capabilities such as TinyML and federated learning, application domains, and hardware types. Following PRISMA guidelines, the analysis traces the field from early content delivery networks and fog computing to modern on-device intelligence. Core enabling technologies such as specialized hardware accelerators, optimized software, and communication protocols are explored. Challenges including resource limitations, security, model management, power consumption, and connectivity are critically assessed. Emerging opportunities in neuromorphic hardware, continual learning algorithms, edge-cloud collaboration, and trustworthiness integration are highlighted, providing a comprehensive framework for researchers and practitioners.
Recent highlights from the HERA experiments, HERMES, H1, and ZEUS, are reviewed and ideas for future analyses to fully exploit this unique data set are proposed. This document is a summary of a workshop on future physics with HERA data held at DESY, Hamburg at the end of 2014. All areas of HERA physics are covered and contributions from both experimentalists and theorists are included. The document outlines areas where HERA physics can still make a significant contribution, principally in a deeper understanding of QCD, and its relevance to other facilities. Within the framework of the Data Preservation in High Energy Physics, the HERA data have been preserved for analyses to take place over a timescale of 10 years and more. Therefore, although an extensive list of possibilities is presented here, safe storage of the data ensures that it can also be used in the far future should new ideas and analyses be proposed.