University of Campinas
With the widespread adoption of deep learning, reinforcement learning (RL) has experienced a dramatic increase in popularity, scaling to previously intractable problems, such as playing complex games from pixel observations, sustaining conversations with humans, and controlling robotic agents. However, a wide range of domains remains inaccessible to RL because interacting with the environment is costly or dangerous. Offline RL is a paradigm that learns exclusively from static datasets of previously collected interactions, making it feasible to extract policies from large and diverse training datasets. Effective offline RL algorithms have a much wider range of applications than online RL, being particularly appealing for real-world applications such as education, healthcare, and robotics. In this work, we contribute a unifying taxonomy to classify offline RL methods. Furthermore, we provide a comprehensive review of the latest algorithmic breakthroughs in the field using a unified notation, as well as a review of existing benchmarks' properties and shortcomings. Additionally, we provide a figure that summarizes the performance of each method and class of methods on different dataset properties, equipping researchers with the tools to decide which type of algorithm is best suited for the problem at hand and to identify the most promising classes of algorithms. Finally, we provide our perspective on open problems and propose future research directions for this rapidly growing field.
FakeScope, a multimodal expert model, provides transparent AI-generated image forensics by not only detecting synthetic images but also offering detailed explanations and quantifiable confidence scores. This approach outperforms existing binary classifiers and general-purpose LMMs in detection accuracy while significantly enhancing explainability.
The MS MARCO ranking dataset has been widely used for training deep learning models for IR tasks, achieving considerable effectiveness in diverse zero-shot scenarios. However, this type of resource is scarce in languages other than English. In this work, we present mMARCO, a multilingual version of the MS MARCO passage ranking dataset created using machine translation and comprising 13 languages. We evaluated mMARCO by finetuning monolingual and multilingual reranking models, as well as a multilingual dense retrieval model, on this dataset. We also evaluated models finetuned on mMARCO in a zero-shot scenario on the Mr. TyDi dataset, demonstrating that multilingual models finetuned on our translated dataset achieve superior effectiveness to models finetuned on the original English version alone. Our experiments also show that a distilled multilingual reranker is competitive with non-distilled models while having 5.4 times fewer parameters. Lastly, we show a positive correlation between translation quality and retrieval effectiveness, providing evidence that improvements in translation methods might lead to improvements in multilingual information retrieval. The translated datasets and finetuned models are available at this https URL
University of Toronto, University of New South Wales, University of Amsterdam, Imperial College London, Ghent University, University College London, University of Oxford, Nagoya University, University of Copenhagen, Yale University, Universitat Pompeu Fabra, KU Leuven, University of Campinas, Emory University, Harvard Medical School, King’s College London, Mohamed bin Zayed University of Artificial Intelligence, Aristotle University of Thessaloniki, Technical University of Munich, Korea Advanced Institute of Science and Technology, Universitat de Barcelona, German Cancer Research Center (DKFZ), Universidad Politécnica de Madrid, Technical University of Denmark, Maastricht University, University of Leeds, INSERM, University of London, The University of Western Australia, Umeå University, University of California San Francisco, The Barcelona Institute of Science and Technology, Universidad Nacional de Colombia, Fraunhofer Heinrich-Hertz-Institute, Helmholtz Center Munich, Kempelen Institute of Intelligent Technologies, Universidad Nacional del Litoral, Stanford University School of Medicine, University of Colombo, University of Tunis El Manar, University of Colorado Anschutz Medical Campus, Military Institute of Science and Technology, University of Texas MD Anderson Cancer Center, Foundation for Research and Technology-Hellas (FORTH), Hellenic Mediterranean University, Nile University, IBM Research Africa, Nepal Applied Mathematics and Informatics Institute for research (NAAMII), Jimma University, University of Ghana, Medical University of Gdansk, Agency for Science, Technology and Research (A*STAR), University of Arkansas for Medical Sciences, BBMRI-ERIC, Galileo University, Milton Margai Technical University, Muhimbili University of Health and Allied Sciences, European Heart Network, Pasqual Maragall Foundation, Namibia University of Science & Technology, Erasmus MC University Medical Center, Almaty AI Lab, University M’Hamed Bougara, Hospital Universitario y Politécnico La Fe, La Fe Health Research Institute, Hospital Clínic of Barcelona, University of Kassala, Children’s National Hospital Washington DC, University of Zambia, Eurécom, Istanbul Technical University
Despite major advances in artificial intelligence (AI) for medicine and healthcare, the deployment and adoption of AI technologies remain limited in real-world clinical practice. In recent years, concerns have been raised about the technical, clinical, ethical and legal risks associated with medical AI. To increase real-world adoption, it is essential that medical AI tools are trusted and accepted by patients, clinicians, health organisations and authorities. This work describes the FUTURE-AI guideline, the first international consensus framework for guiding the development and deployment of trustworthy AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and currently comprises 118 interdisciplinary experts from 51 countries representing all continents, including AI scientists, clinicians, ethicists, and social scientists. Over a two-year period, the consortium defined guiding principles and best practices for trustworthy AI through an iterative process comprising an in-depth literature review, a modified Delphi survey, and online consensus meetings. The FUTURE-AI framework was established based on 6 guiding principles for trustworthy AI in healthcare: Fairness, Universality, Traceability, Usability, Robustness and Explainability. Through consensus, a set of 28 best practices was defined, addressing technical, clinical, legal and socio-ethical dimensions. The recommendations cover the entire lifecycle of medical AI, from design, development and validation to regulation, deployment, and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline which provides a structured approach for constructing medical AI tools that will be trusted, deployed and adopted in real-world practice. Researchers are encouraged to take the recommendations into account in proof-of-concept stages to facilitate future translation towards clinical practice of medical AI.
Deep learning models have revolutionized the field of medical image analysis due to their outstanding performance. However, they are sensitive to spurious correlations, often exploiting dataset bias to improve results on in-domain data at the cost of their generalization capabilities. In this paper, we propose to limit the amount of information these models use to reach the final classification by using a multiple instance learning (MIL) framework. MIL forces the model to use only a (small) subset of patches in the image, identifying discriminative regions. This mimics clinical practice, where medical decisions are based on localized findings. We evaluate our framework on two medical applications: skin cancer diagnosis using dermoscopy and breast cancer diagnosis using mammography. Our results show that using only a subset of the patches does not compromise diagnostic performance on in-domain data compared to the baseline approaches. Moreover, our approach is more robust to shifts in patient demographics, while also providing more detailed explanations about which regions contributed to the decision. Code is available at: this https URL
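As a concrete (if simplified) illustration of the idea, the sketch below aggregates per-patch classifier scores into an image-level decision using only the top-k patches. The function name, the top-k pooling choice, and the toy scores are our assumptions, not the authors' released implementation (see their linked code):

```python
import numpy as np

def mil_topk_pool(patch_logits, k=4):
    """Aggregate per-patch scores into an image-level score using only the
    k most discriminative patches (a simple top-k MIL pooling). Returns the
    image-level logit plus the indices of the patches that drove it, which
    double as a localized explanation."""
    top_idx = np.argsort(patch_logits)[-k:]   # most discriminative patches
    return patch_logits[top_idx].mean(), top_idx

# Hypothetical example: 16 patch scores from one dermoscopy image.
scores = np.random.default_rng(0).normal(size=16)
image_logit, evidence = mil_topk_pool(scores, k=4)
print(image_logit, evidence)  # decision + the regions that contributed to it
```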
Prototypical self-supervised learning methods consistently suffer from partial prototype collapse, where multiple prototypes converge to nearly identical representations. This undermines their central purpose -- providing diverse and informative targets to guide encoders toward rich representations -- and has led practitioners to over-parameterize prototype sets or add ad-hoc regularizers, which mitigate symptoms rather than address the root cause. We empirically trace the collapse to the joint optimization of encoders and prototypes, which encourages a type of shortcut learning: early in training prototypes drift toward redundant representations that minimize loss without necessarily enhancing representation diversity. To break the joint optimization, we introduce a fully decoupled training strategy that learns prototypes and encoders under separate objectives. Concretely, we model prototypes as a Gaussian mixture updated with an online EM-style procedure, independent of the encoder's loss. This simple yet principled decoupling eliminates prototype collapse without explicit regularization and yields consistently diverse prototypes and stronger downstream performance.
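A minimal numpy sketch of the decoupling as we read it: prototypes are Gaussian-mixture means updated by an online EM-style rule on (detached) encoder embeddings, so no gradient from the encoder's objective touches them. The isotropic covariance, equal mixture weights, and step-size rule are our simplifying assumptions:

```python
import numpy as np

def online_em_step(z, mu, counts, sigma2=0.1):
    """One online EM-style update of Gaussian-mixture prototypes `mu` from a
    batch of encoder embeddings `z`, independent of the encoder's loss.
    z: (B, D) embeddings; mu: (K, D) means; counts: (K,) running masses."""
    # E-step: responsibilities under isotropic Gaussians with equal priors.
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)        # (B, K)
    logr = -d2 / (2 * sigma2)
    r = np.exp(logr - logr.max(1, keepdims=True))
    r /= r.sum(1, keepdims=True)
    # M-step (online): move each mean toward its responsibility-weighted batch mean.
    batch_mass = r.sum(0)                                       # (K,)
    counts += batch_mass
    step = batch_mass / np.maximum(counts, 1e-8)                # per-prototype step size
    target = (r.T @ z) / np.maximum(batch_mass[:, None], 1e-8)
    mu += step[:, None] * (target - mu)
    return mu, counts

# Toy usage with stand-ins for detached encoder embeddings.
rng = np.random.default_rng(0)
mu, counts = rng.normal(size=(8, 16)), np.zeros(8)
for _ in range(10):
    mu, counts = online_em_step(rng.normal(size=(32, 16)), mu, counts)
```

In training, the encoder would be optimized against these prototypes while the EM updates run on the side, so the two objectives never share gradients.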
This paper proposes a Bayesian model to compare multiple algorithms on multiple data sets, on any metric. The model is based on the Bradley-Terry model, which counts the number of times one algorithm performs better than another on different data sets. Because of its Bayesian foundations, the Bayesian Bradley-Terry model (BBT) has different characteristics than frequentist approaches to comparing multiple algorithms on multiple data sets, such as Demsar (2006) tests on mean rank and Benavoli et al. (2016) multiple pairwise Wilcoxon tests with p-adjustment procedures. In particular, a Bayesian approach allows for more nuanced statements about the algorithms than claiming that the difference is or is not statistically significant. Bayesian approaches also make it possible to define when two algorithms are equivalent for practical purposes, via a region of practical equivalence (ROPE). Unlike the Bayesian signed-rank comparison procedure proposed by Benavoli et al. (2017), our approach can define a ROPE for any metric, since it is based on probability statements rather than on differences of that metric. This paper also proposes a local ROPE concept, which evaluates whether a positive difference between one algorithm's mean measure across cross-validation folds and another's should really be interpreted as the first algorithm being better than the second, based on effect sizes. This local ROPE proposal is independent of the Bayesian framework and can also be used in frequentist rank-based approaches. An R package and a Python program implementing the BBT are available.
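The paper's model is Bayesian, but the underlying Bradley-Terry ingredient is easy to see in code: count how often each algorithm beats each other across data sets, then fit strengths. The sketch below uses the classical maximum-likelihood (MM) iteration rather than the Bayesian posterior, purely to make the data model concrete; the counts are hypothetical:

```python
import numpy as np

def bradley_terry_ml(wins, iters=200):
    """Maximum-likelihood Bradley-Terry strengths from a win-count matrix.
    wins[i, j] = number of data sets on which algorithm i beat algorithm j
    (diagonal zero). The paper places a Bayesian model over such counts;
    this only illustrates the Bradley-Terry likelihood underneath."""
    n = wins.shape[0]
    p = np.ones(n) / n
    games = wins + wins.T            # comparisons between each pair (diagonal 0)
    for _ in range(iters):
        for i in range(n):
            denom = (games[i] / (p[i] + p)).sum()   # j == i term is 0
            p[i] = wins[i].sum() / max(denom, 1e-12)
        p /= p.sum()                 # fix the arbitrary scale
    return p

# Hypothetical win counts for three algorithms over a pool of data sets.
wins = np.array([[0, 7, 9],
                 [3, 0, 6],
                 [1, 4, 0]])
p = bradley_terry_ml(wins)
```

Under the fitted model, P(i beats j) = p[i] / (p[i] + p[j]); a ROPE can then be phrased directly on this probability (e.g., treating values near 0.5 as practical equivalence; the paper's exact construction may differ), which is why the approach works for any metric.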
Researchers from the University of Campinas adapted ESM2 protein language models to process sequences up to 2,048 amino acids, doubling the original limit, and developed quantized versions that reduced memory use by up to four times for larger models. These modified models generally showed improved performance in protein function prediction, especially for sequences longer than 1,024 amino acids.
Reinforcement Learning (RL) agents have been widely used to improve networking tasks. However, understanding the decisions made by these agents is essential for their broader adoption in networking and network management. To address this, we introduce eXplaNet - a pipeline grounded in explainable artificial intelligence - designed to help networking researchers and practitioners gain deeper insights into the decision-making processes of RL-based solutions. We demonstrate how eXplaNet can be applied to refine a routing solution powered by a Q-learning agent, specifically by improving its reward function. In addition, we discuss the opportunities and challenges of incorporating explainability into RL to better optimize network performance.
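eXplaNet itself builds on established explainability techniques; the toy sketch below only illustrates the kind of introspection this enables for a tabular Q-learning agent: reporting, per state, the greedy action and its Q-gap over the runner-up, where near-zero gaps flag states the reward function barely discriminates (candidates for the reward refinement discussed above). All names and numbers are hypothetical, not eXplaNet's API:

```python
import numpy as np

def explain_q_table(Q, state_names, action_names, tol=0.05):
    """Rudimentary decision introspection for a tabular Q-learning agent:
    for each state, report the greedy action and its advantage (Q-gap)
    over the runner-up. Near-zero gaps mark states where the learned
    values barely separate the routes."""
    for s, qs in enumerate(Q):
        order = np.argsort(qs)[::-1]
        gap = qs[order[0]] - qs[order[1]]
        flag = "  <-- indifferent, inspect reward" if gap < tol else ""
        print(f"{state_names[s]}: choose {action_names[order[0]]} "
              f"(Q-gap {gap:.3f}){flag}")

# Hypothetical 3-state, 2-action routing Q-table.
Q = np.array([[0.90, 0.10], [0.55, 0.53], [0.20, 0.80]])
explain_q_table(Q, ["nodeA", "nodeB", "nodeC"], ["link1", "link2"])
```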
We present Consistent Assignment of Views over Random Partitions (CARP), a self-supervised clustering method for representation learning of visual features. CARP learns prototypes in an end-to-end online fashion using gradient descent, without additional non-differentiable modules to solve the cluster assignment problem. CARP optimizes a new pretext task based on random partitions of prototypes that regularizes the model and enforces consistency between views' assignments. Additionally, our method improves training stability and prevents collapsed solutions in joint-embedding training. Through an extensive evaluation, we demonstrate that CARP's representations are suitable for learning downstream tasks. We evaluate the capabilities of CARP's representations on 17 datasets across many standard protocols, including linear evaluation, few-shot classification, k-NN, k-means, image retrieval, and copy detection. We compare CARP's performance against 11 existing self-supervised methods. We extensively ablate our method and demonstrate that our proposed random partition pretext task improves the quality of the learned representations by devising multiple random classification tasks. In transfer learning tasks, CARP achieves the best performance on average against many SSL methods trained for a longer time.
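A minimal numpy sketch of the random-partition pretext task as described: the K prototypes are randomly split into blocks, and each block defines a smaller classification problem over which the two views' assignments must agree. The target handling (plain cross-entropy from view 2 to view 1, without stop-gradient or sharpening) is a simplification of ours:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis, keepdims=True))
    return e / e.sum(axis, keepdims=True)

def random_partition_loss(logits_v1, logits_v2, block_size, rng):
    """CARP-style pretext sketch: randomly partition the K prototypes into
    blocks and enforce consistent view assignments *within each block*,
    turning one K-way problem into several smaller classification tasks.
    Assumes block_size divides K."""
    _, K = logits_v1.shape
    perm = rng.permutation(K)
    loss = 0.0
    for start in range(0, K, block_size):
        idx = perm[start:start + block_size]
        p1 = softmax(logits_v1[:, idx])   # view-1 assignment within the block
        p2 = softmax(logits_v2[:, idx])   # view-2 assignment, used as target
        loss += -(p2 * np.log(p1 + 1e-8)).sum(1).mean()
    return loss / (K // block_size)

# Toy usage: two views' assignment logits over K = 16 prototypes.
rng = np.random.default_rng(0)
l1, l2 = rng.normal(size=(2, 32, 16))
print(random_partition_loss(l1, l2, block_size=4, rng=rng))
```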
Learning-augmented algorithms have been attracting increasing interest, but have only recently been considered in the setting of explorable uncertainty where precise values of uncertain input elements can be obtained by a query and the goal is to minimize the number of queries needed to solve a problem. We study learning-augmented algorithms for sorting and hypergraph orientation under uncertainty, assuming access to untrusted predictions for the uncertain values. Our algorithms provide improved performance guarantees for accurate predictions while maintaining worst-case guarantees that are best possible without predictions. For hypergraph orientation, for any γ ≥ 2, we give an algorithm that achieves a competitive ratio of 1 + 1/γ for correct predictions and γ for arbitrarily wrong predictions. For sorting, we achieve an optimal solution for accurate predictions while still being 2-competitive for arbitrarily wrong predictions. These tradeoffs are the best possible. We also consider different error metrics and show that the performance of our algorithms degrades smoothly with the prediction error in all the cases where this is possible.
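To make the stated tradeoff concrete, here is the hypergraph-orientation guarantee instantiated at two parameter values (a direct reading of the bound above, not additional results from the paper):

```latex
% ratio <= 1 + 1/\gamma with correct predictions, <= \gamma otherwise (\gamma >= 2)
\gamma = 2:\quad 1 + \tfrac{1}{2} = 1.5 \ \text{(accurate predictions)}, \qquad 2 \ \text{(arbitrarily wrong)} \\
\gamma = 4:\quad 1 + \tfrac{1}{4} = 1.25 \ \text{(accurate predictions)}, \qquad 4 \ \text{(arbitrarily wrong)}
```

Larger γ buys more consistency under accurate predictions at the price of weaker worst-case robustness.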
The scientific image integrity area faces a challenging research bottleneck: the lack of available datasets to design and evaluate forensic techniques. The sensitivity of the data creates a legal hurdle that prevents relying on real tampered cases to build any sort of accessible forensic benchmark. To mitigate this bottleneck, we present an extendable open-source library that reproduces the most common image forgery operations reported by the research integrity community: duplication, retouching, and cleaning. Using this library and realistic scientific images, we create a large scientific forgery image benchmark (39,423 images) with enriched ground truth. In addition, motivated by the high number of retracted papers due to image duplication, this work evaluates state-of-the-art copy-move detection methods on the proposed dataset, using a new metric that asserts consistent match detection between the source and the copied region. The dataset and source code will be freely available upon acceptance of the paper.
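As an illustration of the simplest of the three operations, below is a toy copy-move (duplication) forgery with its ground-truth mask. The function name and signature are ours, not the library's API; the separate source/copy labels in the mask are exactly the information a source-vs-copy-consistent metric needs:

```python
import numpy as np

def copy_move(image, src_yx, dst_yx, size):
    """Illustrative copy-move (duplication) forgery: copy a size x size
    region from src_yx and paste it at dst_yx. Returns the tampered image
    and a ground-truth mask distinguishing source (1) from copy (2),
    assuming both regions lie fully inside the image."""
    forged = image.copy()
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    (sy, sx), (dy, dx) = src_yx, dst_yx
    forged[dy:dy + size, dx:dx + size] = image[sy:sy + size, sx:sx + size]
    mask[sy:sy + size, sx:sx + size] = 1   # source region
    mask[dy:dy + size, dx:dx + size] = 2   # copied region
    return forged, mask

# Hypothetical usage on a random stand-in for a scientific image.
img = np.random.default_rng(1).integers(0, 256, (128, 128, 3), dtype=np.uint8)
forged, gt = copy_move(img, src_yx=(10, 10), dst_yx=(80, 80), size=24)
```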
MIA-3DCNN is an automated framework for detecting COVID-19 and classifying its severity from 3D CT lung images. The 3D CNN architecture achieved macro F1 scores of 0.8876 for detection and 0.7277 for severity classification, substantially outperforming established competition baselines.
A comprehensive survey categorizes and reviews 231 publicly available RGB-D datasets, addressing the challenge of dataset selection in computer vision. The work provides an up-to-date resource, with nearly half of the listed datasets published since 2017, and highlights trends such as the rise of synthetic data and the importance of domain adaptation for generalizable models.
A self-supervised learning framework, W2V-SELD, adapts wav2vec 2.0 for Sound Event Localization and Detection (SELD) using multichannel spatial audio pre-training. This approach achieves competitive performance on benchmark datasets, significantly reducing the reliance on extensive labeled data by learning robust representations directly from unlabeled spatial audio.
Pruning is a standard technique for reducing the computational cost of deep networks. Many advances in pruning leverage concepts from the Lottery Ticket Hypothesis (LTH). LTH reveals that inside a trained dense network exist sparse subnetworks (tickets) able to achieve similar accuracy (i.e., win the lottery - winning tickets). Pruning at initialization focuses on finding winning tickets without training a dense network. Studies of these concepts share the trend that subnetworks come from weight or filter pruning. In this work, we investigate LTH and pruning at initialization through the lens of layer pruning. First, we confirm the existence of winning tickets when the pruning process removes layers. Building on this observation, we propose to discover these winning tickets at initialization, eliminating the heavy computational cost of training the initial (over-parameterized) dense network. Extensive experiments show that our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%, an important step towards democratization and green Artificial Intelligence. Beyond computational benefits, our winning tickets exhibit robustness against adversarial and out-of-distribution examples. Finally, we show that our subnetworks easily win the lottery at initialization, while tickets from filter removal (the standard structured LTH) hardly become winning tickets.
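A sketch of what a layer-pruned ticket drawn at initialization means structurally, assuming residual blocks so that removing layers keeps the network valid. How layers are chosen at initialization is the paper's contribution; the fixed indices below are placeholders:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """A residual MLP block; removing a block leaves a valid network
    because the skip connection preserves the feature dimension."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
    def forward(self, x):
        return x + self.f(x)

def layer_pruned_ticket(dim=64, depth=12, keep=(0, 3, 7, 11)):
    """Instantiate the dense network, then keep only a subset of its
    layers (hypothetical indices `keep`) before any training happens."""
    dense = [Block(dim) for _ in range(depth)]          # randomly initialized
    return nn.Sequential(*(dense[i] for i in keep))     # the pruned subnetwork

net = layer_pruned_ticket()
print(sum(p.numel() for p in net.parameters()), "parameters in the ticket")
```

Only the ticket is ever trained, which is where the training speedup and emission savings come from.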
We introduce an open-ended test grounded in algorithmic probability that can avoid benchmark contamination in the quantitative evaluation of frontier models in the context of their Artificial General Intelligence (AGI) and Superintelligence (ASI) claims. Unlike other tests, this test does not rely on statistical compression methods (such as GZIP or LZW), which are more closely related to Shannon entropy than to Kolmogorov complexity and cannot test beyond simple pattern matching. The test challenges aspects of AI, in particular LLMs, related to features of intelligence of a fundamental nature, such as synthesis and model creation in the context of inverse problems (generating new knowledge from observation). We argue that metrics based on model abstraction and abduction (optimal Bayesian `inference') for predictive `planning' can provide a robust framework for testing intelligence, including natural intelligence (human and animal), narrow AI, AGI, and ASI. We found that LLM versions tend to be fragile and incremental, consistent with memorisation alone, with progress likely driven by the size of the training data. The results were compared with a hybrid neurosymbolic approach that theoretically guarantees universal intelligence based on the principles of algorithmic probability and Kolmogorov complexity; this method outperforms LLMs in a proof-of-concept on short binary sequences. We prove that compression is equivalent and directly proportional to a system's predictive power and vice versa: if a system can better predict it can better compress, and if it can better compress, then it can better predict. Our findings strengthen the suspicion regarding the fundamental limitations of LLMs, exposing them as systems optimised for the perception of mastery over human language.
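The claimed prediction-compression equivalence has a textbook form: a predictor's probabilities define an ideal code whose length is the predictor's log-loss, so better prediction is better compression bit for bit. A tiny self-contained illustration (our example sequence and predictors, not the paper's test items):

```python
import math

def code_length_bits(probs_of_observed):
    """Ideal code length a probabilistic predictor assigns to a sequence:
    the sum of -log2 p(symbol | history). Arithmetic coding achieves this
    length, so lower total bits means both better compression and better
    prediction -- they are the same quantity."""
    return sum(-math.log2(p) for p in probs_of_observed)

seq = "01010101"
uniform = [0.5] * len(seq)                      # memoryless coin predictor
alternator = [0.5] + [0.95] * (len(seq) - 1)    # predictor that learns the alternation
print(code_length_bits(uniform))     # 8.0 bits
print(code_length_bits(alternator))  # about 1.52 bits
```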
The ability to communicate with robots using natural language is a significant step forward in human-robot interaction. However, accurately translating verbal commands into physical actions remains challenging. Current approaches require large datasets to train the models and are limited to robots with at most 6 degrees of freedom. To address these issues, we propose a framework called InstructRobot that maps natural language instructions into robot motion without requiring the construction of large datasets or prior knowledge of the robot's kinematic model. InstructRobot employs a reinforcement learning algorithm that enables joint learning of language representations and the inverse kinematics model, simplifying the entire learning process. The proposed framework is validated using a complex robot with 26 revolute joints in object manipulation tasks, demonstrating its robustness and adaptability in realistic environments. The framework can be applied to any task or domain where datasets are scarce and difficult to create, making it an intuitive and accessible solution to the challenges of training robots through linguistic communication. Open source code for the InstructRobot framework and experiments can be accessed at this https URL
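A structural sketch of the joint-learning idea; the sizes, the bag-of-words instruction encoder, and the interface are our assumptions (the released code at the link above is the reference): one network consumes the instruction and the joint angles together, so language grounding and the robot's kinematics can be shaped by the same RL signal.

```python
import torch
import torch.nn as nn

class InstructionPolicy(nn.Module):
    """One network maps (instruction, joint angles) to motor commands,
    so language representations and kinematics are learned jointly.
    Hypothetical sizes; the 26 joints match the robot in the abstract."""
    def __init__(self, vocab=1000, emb=32, joints=26):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, emb)       # toy instruction encoder
        self.policy = nn.Sequential(
            nn.Linear(emb + joints, 128), nn.Tanh(),
            nn.Linear(128, joints))                    # joint-velocity command
    def forward(self, token_ids, joint_angles):
        z = self.embed(token_ids)
        return self.policy(torch.cat([z, joint_angles], dim=-1))

pi = InstructionPolicy()
tokens = torch.randint(0, 1000, (1, 5))   # hypothetical tokenized instruction
q = torch.zeros(1, 26)                    # current joint angles
action = pi(tokens, q)                    # to be trained by an RL reward
```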
We introduce Quantum Register Algebra (QRA) as an efficient tool for quantum computing. We show the direct link between QRA and Dirac formalism. We present GAALOP (Geometric Algebra Algorithms Optimizer) implementation of our approach. Using the QRA basis vectors definitions given in Section 4 and the framework based on the de Witt basis presented in Section 5, we are able to fully describe and compute with QRA in GAALOP using the geometric product. We illustrate the intuitiveness of this computation by presenting the QRA form for the well known SWAP operation on a two qubit register.
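For readers more used to Dirac matrices than QRA, the operation being computed is the standard two-qubit SWAP; the numpy check below is the conventional matrix view of it, not the QRA/GAALOP form used in the paper:

```python
import numpy as np

# Two-qubit SWAP in the computational basis |00>, |01>, |10>, |11>.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])

ket01 = np.array([0, 1, 0, 0])   # |01>
ket10 = np.array([0, 0, 1, 0])   # |10>
assert np.array_equal(SWAP @ ket01, ket10)                 # SWAP|01> = |10>
assert np.array_equal(SWAP @ SWAP, np.eye(4, dtype=int))   # SWAP is an involution
```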
A unified dense retrieval framework, UIA, is introduced to handle diverse information access functionalities like keyword search, query by example, and complementary item recommendation within a single model. The framework incorporates an Attentive Personalization Network (APN) to adapt results based on user history, consistently outperforming specialized baselines across tasks on large-scale e-commerce datasets.