We present Denario, an AI multi-agent system designed to serve as a scientific research assistant. Denario can perform many different tasks, such as generating ideas, checking the literature, developing research plans, writing and executing code, making plots, and drafting and reviewing a scientific paper. The system has a modular architecture, allowing it to handle specific tasks, such as generating an idea, or to carry out end-to-end scientific analysis using Cmbagent as a deep-research backend. In this work, we describe Denario and its modules in detail, and illustrate its capabilities with multiple papers it generated across many different scientific disciplines, such as astrophysics, biology, biophysics, biomedical informatics, chemistry, materials science, mathematical physics, medicine, neuroscience, and planetary science. Denario also excels at combining ideas from different disciplines, and we illustrate this with a paper that applies methods from quantum physics and machine learning to astrophysical data. We report the evaluations performed on these papers by domain experts, who provided both numerical scores and review-like feedback. We then highlight the strengths, weaknesses, and limitations of the current system. Finally, we discuss the ethical implications of AI-driven research and reflect on how such technology relates to the philosophy of science. We publicly release the code at this https URL. A Denario demo can also be run directly on the web at this https URL, and the full app will be deployed on the cloud.
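As an illustration of the modular architecture described above, here is a toy sketch of a pipeline of independent modules passing shared state from idea generation through drafting. All module names and behaviors here are hypothetical stand-ins, not the actual Denario API.

```python
# Toy sketch of a modular research-assistant pipeline in the spirit of the
# system described above. Every module name and behavior is illustrative.

class Module:
    """One pipeline stage: reads the shared state, writes one new entry."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def run(self, state):
        state[self.name] = self.fn(state)
        return state

def run_pipeline(modules, topic):
    state = {"topic": topic}
    for m in modules:
        state = m.run(state)
    return state

# Hypothetical stand-ins for idea generation, planning, and drafting.
modules = [
    Module("idea",  lambda s: f"scaling relations in {s['topic']}"),
    Module("plan",  lambda s: ["review literature", "write analysis code",
                               "draft paper"]),
    Module("draft", lambda s: f"Paper on {s['idea']} ({len(s['plan'])} steps)"),
]

result = run_pipeline(modules, "dwarf galaxies")
```

Because each stage only touches the shared state dictionary, individual modules can be run in isolation, mirroring the "specific tasks or end-to-end" design the abstract describes.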
This paper presents a comprehensive survey and conceptual framework for Explainable Artificial Intelligence (XAI). It introduces a novel definition of explainability centered on the audience and purpose, provides detailed taxonomies for XAI techniques including a dedicated one for Deep Learning, and integrates XAI within the broader paradigm of Responsible Artificial Intelligence, addressing its ethical implications.
False information poses a significant global challenge, and manually verifying claims is a time-consuming and resource-intensive process. In this research paper, we experiment with different approaches to investigate the effectiveness of large language models (LLMs) in classifying factual claims by their veracity and generating justifications in English and Telugu. The key contributions of this work include the creation of a bilingual English-Telugu dataset and the benchmarking of different veracity classification approaches based on LLMs.
Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of human evaluation: comparisons between pairs of model-generated task outputs. Comparison feedback conveys limited information about human preferences per human evaluation. Here, we propose to learn from natural language feedback, which conveys more information per human evaluation. We learn from language feedback on model outputs using a three-step learning algorithm. First, we condition the language model on the initial output and feedback to generate many refinements. Second, we choose the refinement with the highest similarity to the feedback. Third, we finetune a language model to maximize the likelihood of the chosen refinement given the input. In synthetic experiments, we first evaluate whether language models accurately incorporate feedback to produce refinements, finding that only large language models (175B parameters) do so. Using only 100 samples of human-written feedback, our learning algorithm finetunes a GPT-3 model to roughly human-level summarization ability.
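The three-step algorithm above can be sketched in miniature. The candidate refinements and the word-overlap similarity below are toy stand-ins for whatever generator and similarity function the method actually uses; step 3 (finetuning) is only indicated in a comment.

```python
# Toy sketch of learning from language feedback:
# (1) sample candidate refinements conditioned on output + feedback,
# (2) pick the refinement most similar to the feedback,
# (3) finetune on the chosen refinement (not shown).
# The Jaccard word overlap here is an illustrative similarity, not the paper's.

def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def select_refinement(candidates, feedback):
    # Step 2: keep the candidate with the highest similarity to the feedback.
    return max(candidates, key=lambda c: jaccard(c, feedback))

feedback = "mention the deadline is friday"
candidates = [                                   # pretend step-1 samples
    "The report is due soon.",
    "The report deadline is Friday.",            # incorporates the feedback
    "Reports are important documents.",
]
best = select_refinement(candidates, feedback)
# Step 3 (not shown): finetune the model to maximize p(best | input).
```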
The star-forming main sequence (SFMS) is a tight relation observed between stellar masses and star formation rates (SFR) in a population of galaxies. This relation is observed at different redshifts and in various morphological and environmental domains, and is key to understanding the underlying relations between a galaxy's budget of cold gas and its stellar content. Euclid Quick Data Release 1 (Q1) gives us the opportunity to investigate this fundamental relation in galaxy formation and evolution. We complement the Euclid release with public IRAC observations of the Euclid Deep Fields, improving the quality of recovered photometric redshifts, stellar masses, and SFRs, as is shown both with simulations and a comparison with available spectroscopic redshifts. From Q1 data alone, we recover more than $\sim 30\,\mathrm{k}$ galaxies with $\log_{10}(M_\ast/M_\odot) > 11$, giving a precise constraint on the SFMS at the high-mass end. We investigated the SFMS in a redshift interval between $0.2$ and $3.0$, comparing our results with the existing literature and fitting them with a parameterisation that takes into account the presence of a bending of the relation at the high-mass end, depending on the bending mass, $M_0$. We find good agreement with previous results in terms of $M_0$ values, and an increasing trend for the relation scatter at higher stellar masses. We also investigate the distribution of physical (e.g. dust absorption, $A_V$, and formation age) and morphological properties (e.g. Sérsic index and radius) in the SFR--stellar mass plane, and their relation with the SFMS. These results highlight the potential of Euclid in studying the fundamental scaling relations that regulate galaxy formation and evolution, in anticipation of the forthcoming Data Release 1.
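For reference, one widely used functional form for an SFMS with high-mass bending is shown below. This is an illustrative parameterisation with a turnover mass $M_0$; the exact fit adopted in the paper may differ.

```latex
% Illustrative SFMS parameterisation with high-mass bending:
\log_{10} \mathrm{SFR}(M_\ast)
  = s_0 - \log_{10}\!\left[ 1 + \left( \frac{M_\ast}{M_0} \right)^{-\gamma} \right]
% where s_0 is the asymptotic log-SFR at high mass, M_0 is the bending
% (turnover) mass, and \gamma sets the low-mass slope. For M_\ast \ll M_0
% the relation is a power law of slope \gamma; for M_\ast \gg M_0 it
% flattens toward s_0.
```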
In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the need for human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for the dialogue systems and then by presenting the evaluation methods regarding that class.
Manual peripheral blood smear (PBS) analysis is labor-intensive and subjective. While deep learning offers a promising alternative, a systematic evaluation of state-of-the-art models such as YOLOv11 for fine-grained PBS detection is still lacking. In this work, we make two key contributions. First, we curate a large-scale annotated dataset for blood cell detection and classification, comprising 16,891 images across 12 peripheral blood cell (PBC) classes, along with the red blood cell class, all carefully re-annotated for object detection tasks. In total, the dataset contains 298,850 annotated cells. Second, we leverage this dataset to conduct a comprehensive evaluation of five YOLOv11 variants (ranging from Nano to XLarge). These models are rigorously benchmarked under two data splitting strategies (70:20:10 and 80:10:10) and systematically assessed using multiple performance criteria, including mean Average Precision (mAP), precision, recall, F1 score, and computational efficiency. Our experiments show that the YOLOv11 Medium variant achieves the best trade-off, reaching a mAP@0.5 of 0.934 under the 80:10:10 split. Larger models (Large and XLarge) provide only marginal accuracy gains at substantially higher computational cost. Moreover, the 80:10:10 split consistently outperforms the 70:20:10 split across all models. These findings highlight YOLOv11, particularly the Medium variant, as a highly effective framework for automated, fine-grained PBS detection. Beyond benchmarking, our publicly released dataset (this http URL) offers a valuable resource to advance research on blood cell detection and classification in hematology.
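As background for the mAP@0.5 criterion used in the benchmarks above: a predicted box counts as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. A minimal sketch with illustrative box coordinates:

```python
# IoU between two axis-aligned boxes given as (x1, y1, x2, y2).
# This is the matching criterion underlying mAP@0.5; coordinates below
# are illustrative, not from the dataset described above.

def iou(a, b):
    # Intersection rectangle corners
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred = (10, 10, 50, 50)
gt   = (20, 20, 60, 60)
# iou(pred, gt) is below 0.5, so at the mAP@0.5 threshold this prediction
# would be counted as a false positive.
```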
We study the violation of Bell-Mermin-Klyshko (BMK) inequalities in initial quantum states of scalar fields in inflation. We show that the Bell inequality is maximally violated by the Bunch-Davies vacuum which is a two-mode squeezed state of a scalar field. However, we find that the violation of the BMK inequalities does not increase with the number of modes to measure. We then consider a non-Bunch-Davies vacuum expressed by a four-mode squeezed state of two scalar fields. Remarkably, we find that the violation of the BMK inequalities increases exponentially with the number of modes to measure. This indicates that some evidence that our universe has a quantum mechanical origin may survive in CMB data even if quantum entanglement decays exponentially afterward due to decoherence.
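For context, the BMK inequalities discussed above can be written via the standard Mermin-Klyshko recursion. This is a textbook construction, not a result specific to this work.

```latex
% Mermin--Klyshko operator for n parties, built recursively from the
% dichotomic observables a_n, a'_n of the n-th party (B'_n denotes B_n
% with primed and unprimed observables exchanged):
B_n = \tfrac{1}{2}\, B_{n-1} \otimes (a_n + a'_n)
    + \tfrac{1}{2}\, B'_{n-1} \otimes (a_n - a'_n),
\qquad B_1 = a_1 .
% Local hidden-variable theories obey |\langle B_n \rangle| \le 1,
% while quantum mechanics allows
|\langle B_n \rangle|_{\mathrm{QM}} \le 2^{(n-1)/2},
% a gap that grows exponentially with n, the setting in which the
% exponential growth of violation with mode number is discussed above.
```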
Lung infections, particularly pneumonia, pose serious health risks that can escalate rapidly, especially during pandemics. Accurate AI-based severity prediction from medical imaging is essential to support timely clinical decisions and optimize patient outcomes. In this work, we present a novel method applicable to both CT scans and chest X-rays for assessing lung infection severity. Our contributions are twofold: (i) QCross-Att-PVT, a Transformer-based architecture that integrates parallel encoders, a cross-gated attention mechanism, and a feature aggregator to capture rich multi-scale features; and (ii) Conditional Online TransMix, a custom data augmentation strategy designed to address dataset imbalance by generating mixed-label image patches during training. Evaluated on two benchmark datasets, RALO CXR and Per-COVID-19 CT, our method consistently outperforms several state-of-the-art deep learning models. The results emphasize the critical role of data augmentation and gated attention in improving both robustness and predictive accuracy. This approach offers a reliable, adaptable tool to support clinical diagnosis, disease monitoring, and personalized treatment planning. The source code of this work is available at this https URL.
The Euclid mission of the European Space Agency will deliver weak gravitational lensing and galaxy clustering surveys that can be used to constrain the standard cosmological model and extensions thereof. We present forecasts from the combination of these surveys on the sensitivity to cosmological parameters including the summed neutrino mass $M_\nu$ and the effective number of relativistic species $N_{\rm eff}$ in the standard $\Lambda$CDM scenario and in a scenario with dynamical dark energy ($w_0 w_a$CDM). We compare the accuracy of different algorithms predicting the nonlinear matter power spectrum for such models. We then validate several pipelines for Fisher matrix and MCMC forecasts, using different theory codes, algorithms for numerical derivatives, and assumptions concerning the non-linear cut-off scale. The Euclid primary probes alone will reach a sensitivity of $\sigma(M_\nu) = 56\,\mathrm{meV}$ in the $\Lambda$CDM+$M_\nu$ model, whereas the combination with CMB data from Planck is expected to achieve $\sigma(M_\nu) = 23\,\mathrm{meV}$ and raise the evidence for a non-zero neutrino mass to at least the $2.6\sigma$ level. This can be pushed to a $4\sigma$ detection if future CMB data from LiteBIRD and CMB Stage-IV are included. In combination with Planck, Euclid will also deliver tight constraints on $\Delta N_{\rm eff} < 0.144$ (95% CL) in the $\Lambda$CDM+$M_\nu$+$N_{\rm eff}$ model, or $\Delta N_{\rm eff} < 0.063$ when future CMB data are included. When floating $(w_0, w_a)$, we find that the sensitivity to $N_{\rm eff}$ remains stable, while that to $M_\nu$ degrades at most by a factor of 2. This work illustrates the complementarity between the Euclid spectroscopic and imaging/photometric surveys and between Euclid and CMB constraints. Euclid will have a great potential for measuring the neutrino mass and excluding well-motivated scenarios with additional relativistic particles.
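The Fisher-matrix machinery validated in pipelines like those above can be illustrated with a toy computation: finite-difference derivatives of a model observable contracted with an inverse data covariance. Everything below (the linear model, identity covariance, parameter values) is an illustrative stand-in, not the Euclid forecast code.

```python
# Toy Fisher forecast: F_ij = (d mu / d theta_i) C^{-1} (d mu / d theta_j)
# with central finite-difference derivatives, as in generic forecast
# pipelines. Model and covariance here are deliberately simple.
import numpy as np

def fisher(model, theta0, cov, eps=1e-6):
    theta0 = np.asarray(theta0, float)
    inv_cov = np.linalg.inv(cov)
    derivs = []
    for i in range(len(theta0)):
        dp, dm = theta0.copy(), theta0.copy()
        dp[i] += eps
        dm[i] -= eps
        derivs.append((model(dp) - model(dm)) / (2 * eps))
    D = np.array(derivs)            # shape: (n_params, n_data)
    return D @ inv_cov @ D.T

x = np.linspace(0, 1, 5)
model = lambda t: t[0] + t[1] * x   # toy observable: a straight line
F = fisher(model, [1.0, 2.0], np.eye(5))
# 1-sigma marginalized errors from the inverse Fisher matrix:
errors = np.sqrt(np.diag(np.linalg.inv(F)))
```

For a linear model the finite differences are exact, so this reproduces the analytic least-squares errors; in a real forecast the model call would be a Boltzmann-code power spectrum and the covariance a survey estimate.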
While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only. In this paper, we identify and address several deficiencies of existing unsupervised SMT approaches by exploiting subword information, developing a theoretically well founded unsupervised tuning method, and incorporating a joint refinement procedure. Moreover, we use our improved SMT system to initialize a dual NMT model, which is further fine-tuned through on-the-fly back-translation. Together, we obtain large improvements over the previous state-of-the-art in unsupervised machine translation. For instance, we get 22.5 BLEU points in English-to-German WMT 2014, 5.5 points more than the previous best unsupervised system, and 0.5 points more than the (supervised) shared task winner back in 2014.
Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, dialog and conversational systems. The STS shared task is a venue for assessing the current state-of-the-art. The 2017 task focuses on multilingual and cross-lingual pairs with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).
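To make the STS evaluation setup concrete: a system assigns each sentence pair a similarity score, and the scores are then correlated with gold 0-5 ratings (typically via Pearson correlation). The bag-of-words cosine below is a deliberately simple illustrative baseline, not one of the participating systems.

```python
# Simple bag-of-words cosine similarity, an illustrative STS baseline.
# Real submissions would be correlated against gold ratings with Pearson r.
import math
from collections import Counter

def cosine_bow(s1, s2):
    c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

pairs = [
    ("a man is playing a guitar", "a man plays guitar"),   # near-paraphrase
    ("a dog runs in the park", "stock markets fell sharply"),  # unrelated
]
scores = [cosine_bow(a, b) for a, b in pairs]
```

Even this crude baseline ranks the paraphrase pair far above the unrelated pair, which is exactly the kind of ordering the Pearson-based evaluation rewards.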
Edge Artificial Intelligence (Edge AI) embeds intelligence directly into devices at the network edge, enabling real-time processing with improved privacy and reduced latency by processing data close to its source. This review systematically examines the evolution, current landscape, and future directions of Edge AI through a multi-dimensional taxonomy including deployment location, processing capabilities such as TinyML and federated learning, application domains, and hardware types. Following PRISMA guidelines, the analysis traces the field from early content delivery networks and fog computing to modern on-device intelligence. Core enabling technologies such as specialized hardware accelerators, optimized software, and communication protocols are explored. Challenges including resource limitations, security, model management, power consumption, and connectivity are critically assessed. Emerging opportunities in neuromorphic hardware, continual learning algorithms, edge-cloud collaboration, and trustworthiness integration are highlighted, providing a comprehensive framework for researchers and practitioners.
Scheduling problems pose significant challenges in resource, industry, and operational management. This paper addresses the Unrelated Parallel Machine Scheduling Problem (UPMS) with setup times and resources using a Multi-Agent Reinforcement Learning (MARL) approach. The study introduces the Reinforcement Learning environment and conducts empirical analyses, comparing MARL with Single-Agent algorithms. The experiments employ various deep neural network policies for single- and Multi-Agent approaches. Results demonstrate the efficacy of the Maskable extension of the Proximal Policy Optimization (PPO) algorithm in Single-Agent scenarios and of the Multi-Agent PPO algorithm in Multi-Agent setups. While Single-Agent algorithms perform adequately in reduced scenarios, Multi-Agent approaches face challenges in cooperative learning but offer scalable capacity. This research contributes insights into applying MARL techniques to scheduling optimization, emphasizing the need for algorithmic sophistication balanced with scalability for intelligent scheduling solutions.
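The Maskable PPO extension mentioned above relies on invalid-action masking: the logits of currently infeasible actions (for example, machines whose resources are busy) are set to negative infinity before the softmax, so the policy can never sample them. A minimal numpy sketch, with illustrative logits and mask:

```python
# Invalid-action masking as used by maskable policy-gradient methods:
# infeasible actions get probability exactly zero. Values are illustrative.
import numpy as np

def masked_softmax(logits, mask):
    # mask[i] == True  ->  action i is currently valid
    z = np.where(mask, logits, -np.inf)
    z = z - z.max()                 # shift for numerical stability
    p = np.exp(z)                   # exp(-inf) -> 0 for masked actions
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.5, 3.0])
mask   = np.array([True, False, True, False])  # actions 1 and 3 infeasible
probs  = masked_softmax(logits, mask)
```

Masked actions receive exactly zero probability while the remaining probabilities renormalize, which is why masking both speeds up learning and guarantees feasibility of sampled schedules.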
Cardiac auscultation is one of the most cost-effective techniques used to detect and identify many heart conditions. Computer-assisted decision systems based on auscultation can support physicians in their decisions. Unfortunately, the application of such systems in clinical trials is still minimal since most of them only aim to detect the presence of extra or abnormal waves in the phonocardiogram signal, i.e., only a binary ground truth variable (normal vs abnormal) is provided. This is mainly due to the lack of large publicly available datasets, where a more detailed description of such abnormal waves (e.g., cardiac murmurs) exists. To pave the way to more effective research on healthcare recommendation systems based on auscultation, our team has prepared the currently largest pediatric heart sound dataset. A total of 5282 recordings have been collected from the four main auscultation locations of 1568 patients; in the process, 215,780 heart sounds have been manually annotated. Furthermore, and for the first time, each cardiac murmur has been manually annotated by an expert annotator according to its timing, shape, pitch, grading, and quality. In addition, the auscultation locations where the murmur is present were identified, as well as the auscultation location where the murmur is detected most intensively. Such a detailed description for a relatively large number of heart sounds may pave the way for new machine learning algorithms with real-world application for the detection and analysis of murmur waves for diagnostic purposes.
The classification of topological materials is revisited using advanced computational workflows that integrate hybrid density functional theory calculations with exact Hartree-Fock exchange. Unlike previous studies, our workflow optimizes atomic configurations obtained from the Materials Project Database, followed by precise electronic structure calculations. Our results based on hybrid density functional theory calculations reveal that only 15% of materials are topologically nontrivial, which is in stark contrast to the previously reported 30% based on semi-local exchange and correlation functionals. This discrepancy underscores the critical dependence of topological classifications on accurate atomic and electronic structures, rendering the abundance of topological materials much lower than generally assumed.
We present the updated version of the HSI-Drive dataset aimed at developing automated driving systems (ADS) using hyperspectral imaging (HSI). The v2.0 version includes new annotated images from videos recorded during winter and fall in real driving scenarios. Added to the spring and summer images included in the previous v1.1 version, the new dataset contains 752 images covering the four seasons. In this paper, we show the improvements achieved over previously published results obtained on the v1.1 dataset, showcasing the enhanced performance of models trained on the new v2.0 dataset. We also show the progress made in comprehensive scene understanding by experimenting with more capable image segmentation models. These models include new segmentation categories aimed at the identification of essential road safety objects such as the presence of vehicles and road signs, as well as highly vulnerable groups like pedestrians and cyclists. In addition, we provide evidence of the performance and robustness of the models when applied to segmenting HSI video sequences captured in various environments and conditions. Finally, for a correct assessment of the results described in this work, the constraints imposed by the processing platforms that can sensibly be deployed in vehicles for ADS must be taken into account. Thus, and although implementation details are out of the scope of this paper, we focus our research on the development of computationally efficient, lightweight ML models that can eventually operate at high throughput rates. The dataset and some examples of segmented videos are available in this https URL.
Recent highlights from the HERA experiments, Hermes, H1 and ZEUS, are reviewed and ideas for future analyses to fully exploit this unique data set are proposed. This document is a summary of a workshop on future physics with HERA data held at DESY, Hamburg at the end of 2014. All areas of HERA physics are covered and contributions from both experimentalists and theorists are included. The document outlines areas where HERA physics can still make a significant contribution, principally in a deeper understanding of QCD, and its relevance to other facilities. Within the framework of the Data Preservation in High Energy Physics, the HERA data have been preserved for analyses to take place over a timescale of 10 years and more. Therefore, although an extensive list of possibilities is presented here, safe storage of the data ensures that it can also be used in the far future should new ideas and analyses be proposed.
Herbicide field trials require accurate identification of plant species and assessment of herbicide-induced damage across diverse environments. While general-purpose vision foundation models have shown promising results in complex visual domains, their performance can be limited in agriculture, where fine-grained distinctions between species and damage types are critical. In this work, we adapt a general-purpose vision foundation model to herbicide trial characterization. Trained using a self-supervised learning approach on a large, curated agricultural dataset, the model learns rich and transferable representations optimized for herbicide trial images. Our domain-specific model significantly outperforms the best general-purpose foundation model in both species identification (F1 score improvement from 0.91 to 0.94) and damage classification (from 0.26 to 0.33). Under unseen conditions (new locations and other times), it achieves even greater gains (species identification from 0.56 to 0.66; damage classification from 0.17 to 0.27). In domain-shift scenarios, such as drone imagery, it maintains strong performance (species classification from 0.49 to 0.60). Additionally, we show that domain-specific pretraining enhances segmentation accuracy, particularly in low-annotation regimes. An annotation-efficiency analysis reveals that, under unseen conditions, the domain-specific model achieves a 5.4% higher F1 score than the general-purpose model, while using 80% fewer labeled samples. These results demonstrate the generalization capabilities of domain-specific foundation models and their potential to significantly reduce manual annotation efforts, offering a scalable and automated solution for herbicide trial analysis.
In scheduling problems common in the industry and various real-world scenarios, responding in real-time to disruptive events is essential. Recent methods propose the use of deep reinforcement learning (DRL) to learn policies capable of generating solutions under this constraint. The objective of this paper is to introduce a new DRL method for solving the flexible job-shop scheduling problem, particularly for large instances. The approach is based on the use of heterogeneous graph neural networks to build a more informative graph representation of the problem. This novel modeling of the problem enhances the policy's ability to capture state information and improve its decision-making capacity. Additionally, we introduce two novel approaches to enhance the performance of the DRL approach: the first involves generating a diverse set of scheduling policies, while the second combines DRL with dispatching rules (DRs) constraining the action space. Experimental results on two public benchmarks show that our approach outperforms DRs and achieves superior results compared to three state-of-the-art DRL methods, particularly for large instances.