University of Waikato
This research introduces "Usable-by-Construction," a formal framework that applies compositional design rules, rooted in logic and set theory, to embed usability properties into a system from its inception. The work develops formal tactics and integrates usability-enhancing components to systematically build interactive systems with provable usability attributes.
Every 20 seconds, a limb is amputated somewhere in the world due to diabetes. This is a global health problem that requires a global solution. The MICCAI challenge discussed in this paper, which concerns the automated detection of diabetic foot ulcers using machine learning techniques, will accelerate the development of innovative healthcare technology to address this unmet medical need. In an effort to improve patient care and reduce the strain on healthcare systems, recent research has focused on the creation of cloud-based detection algorithms. These can be consumed as a service by a mobile app that patients (or a carer, partner or family member) could use themselves at home to monitor their condition and to detect the appearance of a diabetic foot ulcer (DFU). Collaborative work between Manchester Metropolitan University, Lancashire Teaching Hospital and the Manchester University NHS Foundation Trust has created a repository of 4,000 DFU images for the purpose of supporting research toward more advanced methods of DFU detection. Based on a joint effort involving leading scientists from the UK, US, India and New Zealand, this challenge will solicit original work and promote interactions between researchers and interdisciplinary collaborations. This paper presents a dataset description and analysis, assessment methods, benchmark algorithms and initial evaluation results. It facilitates the challenge by providing useful insights into the state of the art and ongoing research. This grand challenge takes on even greater urgency in a peri- and post-pandemic period, where stresses on resource utilization will increase the need for technology that allows people to remain active, healthy and intact in their homes.
River is a unified Python library for online machine learning, formed by merging Creme and scikit-multiflow, designed to handle continuous data streams. It achieves competitive accuracy while demonstrating significant speed improvements and constant memory usage compared to prior methods, addressing challenges like concept drift and real-time processing.
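To make the single-instance streaming interface described above concrete, here is a minimal prequential-evaluation loop written against River's documented API (predict, evaluate, then learn, one example at a time); exact module and method names may vary slightly between River versions.

```python
# Minimal sketch of River's incremental learn/predict API (typical usage; details
# may differ across River versions).
from river import datasets, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
metric = metrics.Accuracy()

# Each example is a (dict, label) pair; the model is updated one instance at a time,
# so memory usage stays constant regardless of stream length.
for x, y in datasets.Phishing():
    y_pred = model.predict_one(x)   # predict before learning (prequential evaluation)
    metric.update(y, y_pred)        # update the running accuracy
    model.learn_one(x, y)           # update the model with the new example

print(metric)
```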
We study the statistics of dynamical quantities associated with magnetic reconnection events embedded in a sea of strong background magnetohydrodynamic (MHD) turbulence using direct numerical simulations. We focus on the relationship of the reconnection properties to the statistics of global turbulent fields. For the first time, we show that the distribution in turbulence of reconnection rates (determined by upstream fields) is strongly correlated with the magnitude of the global turbulent magnetic field at the correlation scale. The average reconnection rates, and associated dissipation rates, during turbulence are thus much larger than predicted by using turbulent magnetic field fluctuation amplitudes at the dissipation or kinetic scales. Magnetic reconnection may therefore be playing a major role in energy dissipation in astrophysical and heliospheric turbulence.
The anomaly detection literature is abundant with offline methods, which require repeated access to data in memory, and impose impractical assumptions when applied to a streaming context. Existing online anomaly detection methods also generally fail to address these constraints, resorting to periodic retraining to adapt to the online context. We propose Online-iForest, a novel method explicitly designed for streaming conditions that seamlessly tracks the data generating process as it evolves over time. Experimental validation on real-world datasets demonstrated that Online-iForest is on par with online alternatives and closely rivals state-of-the-art offline anomaly detection techniques that undergo periodic retraining. Notably, Online-iForest consistently outperforms all competitors in terms of efficiency, making it a promising solution in applications where fast identification of anomalies is of primary importance such as cybersecurity, fraud and fault detection.
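Online-iForest itself is not sketched here; the following illustrates only the general streaming anomaly-detection protocol the abstract relies on (score each point with the current model, then learn from it), using River's HalfSpaceTrees, a different online detector, purely as a stand-in. The data, threshold and parameters are illustrative assumptions.

```python
# Streaming anomaly detection protocol: score first, then learn, one point at a time.
import random
from river import anomaly

random.seed(42)
detector = anomaly.HalfSpaceTrees(n_trees=25, height=8, window_size=100, seed=42)

for t in range(1000):
    # Values mostly near 0.2; every 100th point is an injected outlier near 0.95.
    x = {"value": random.uniform(0.1, 0.3) if t % 100 else random.uniform(0.9, 1.0)}
    score = detector.score_one(x)   # anomaly score from the model seen so far...
    detector.learn_one(x)           # ...then update the model with the new point
    if t > 100 and score > 0.8:     # threshold chosen arbitrarily for this demo
        print(f"t={t}: possible anomaly, score={score:.2f}")
```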
Infrastructure as Code (IaC) has become integral to modern software development, enabling automated and consistent configuration of computing environments. The rapid proliferation of IaC scripts has highlighted the need for better code quality assessment methods. This paper proposes a new IaC code quality framework specifically showcased for Ansible repositories as a foundation. By analyzing a comprehensive dataset of repositories from Ansible Galaxy, we applied our framework to evaluate code quality across multiple attributes. The analysis of our code quality metrics applied to Ansible Galaxy repositories reveals trends over time, indicating improvements in areas such as metadata and error handling while highlighting declines in others, such as sophistication and automation.The framework offers practitioners a systematic tool for assessing and enhancing IaC scripts, fostering standardization and facilitating continuous improvement. It also provides a standardized foundation for further work into IaC code quality.
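For a flavour of the kind of signal such a framework might compute, the sketch below measures two hypothetical quality indicators on an Ansible playbook: the share of named tasks and the presence of block/rescue error handling. The metric definitions are assumptions for illustration, not the framework proposed in the paper.

```python
# Hypothetical IaC quality signals for a single Ansible playbook (illustration only).
import yaml

def task_quality_signals(playbook_path):
    with open(playbook_path) as f:
        plays = yaml.safe_load(f) or []          # a playbook is a list of plays
    tasks = [t for play in plays for t in play.get("tasks", [])]
    named = sum(1 for t in tasks if "name" in t)             # named tasks aid readability
    has_rescue = any("rescue" in t for t in tasks if "block" in t)  # basic error handling
    return {
        "n_tasks": len(tasks),
        "named_task_ratio": named / len(tasks) if tasks else None,
        "uses_block_rescue": has_rescue,
    }

# Example (hypothetical path): task_quality_signals("site.yml")
```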
The family of methods collectively known as classifier chains has become a popular approach to multi-label learning problems. This approach involves linking together off-the-shelf binary classifiers in a chain structure, such that class label predictions become features for other classifiers. Such methods have proved flexible and effective and have obtained state-of-the-art empirical performance across many datasets and multi-label evaluation metrics. This performance has prompted further study of how exactly the approach works and how it can be improved: over the past decade, numerous studies have explored the mechanisms of classifier chains at a theoretical level, and many improvements have been made to the training and inference procedures, such that the method remains among the state-of-the-art options for multi-label learning. Given this past and ongoing interest, which covers a broad range of applications and research themes, the goal of this work is to provide a review of classifier chains, a survey of the techniques and extensions provided in the literature, as well as perspectives for this approach in the domain of multi-label classification in the future. We conclude positively, with a number of recommendations for researchers and practitioners, as well as an outline of several areas for future research.
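The basic chaining idea takes only a few lines with an off-the-shelf implementation; the sketch below uses scikit-learn's ClassifierChain on synthetic data and illustrates only the baseline method, not the extensions surveyed in the paper.

```python
# Classifier chains in brief: each binary classifier sees the original features plus
# the predictions of all earlier classifiers in the chosen label order.
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=500, n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

chain = ClassifierChain(LogisticRegression(max_iter=1000), order="random", random_state=0)
chain.fit(X_train, Y_train)
print("Jaccard score:", jaccard_score(Y_test, chain.predict(X_test), average="samples"))
```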
The computational cost of dynamical downscaling limits ensemble sizes in regional downscaling efforts. We present a newly developed generative-AI approach to greatly expand the scope of such downscaling, enabling fine-scale future changes to be characterised, including rare extremes that cannot be addressed by traditional approaches. We test this approach for New Zealand, where strong regional effects are anticipated. At fine scales, the forced (predictable) component of precipitation and temperature extremes for future periods (2080--2099) is spatially smoother than changes in individual simulations, and locally smaller. Future changes in rarer (10-year and 20-year) precipitation extremes are more severe and have larger internal variability spread than annual extremes. Internal variability spread is larger at fine scales than at the coarser scales simulated in climate models. Unpredictability from internal variability dominates model uncertainty and, for precipitation, its variance increases with warming, exceeding the variance across emission scenarios by fourfold for annual and tenfold for decadal extremes. These results indicate that fine-scale changes in future precipitation are less predictable than widely assumed and require much larger ensembles to assess reliably than changes at coarser scales.
We demonstrate the occurrence of permanent spikes using the Lemaitre-Tolman-Bondi models, chosen because the solutions are exact and can be analyzed by qualitative dynamical systems methods. Three examples are given and illustrated numerically. The third example demonstrates that spikes can form directly in the matter density, as opposed to indirectly in previous studies of spikes in the Kasner regime. Spikes provide an alternative general relativistic mechanism for generating exceptionally large structures observed in the Universe.
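For readers unfamiliar with the model class, the standard textbook form of the Lemaitre-Tolman-Bondi solution (comoving synchronous coordinates, dust, G = c = 1) is reproduced below for context; it is not quoted from the paper.

```latex
% Lemaitre--Tolman--Bondi line element, Friedmann-like evolution equation and density
% (standard form, G = c = 1):
ds^2 = -dt^2 + \frac{R'(t,r)^2}{1 + 2E(r)}\,dr^2 + R(t,r)^2\left(d\theta^2 + \sin^2\theta\, d\varphi^2\right),
\qquad
\dot{R}^2 = \frac{2M(r)}{R} + 2E(r),
\qquad
4\pi\rho = \frac{M'(r)}{R^2 R'} .
```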
Understanding how droughts may change in the future is essential for anticipating and mitigating their adverse impacts. However, robust climate projections require large amounts of high-resolution climate simulations, particularly for assessing extreme events. Here, we use a novel dataset, multiple large ensembles of Global Climate Models (GCMs) downscaled to 12 km using generative AI, to quantify the future risk of meteorological drought across New Zealand. The ensemble consists of 20 GCMs, including two single-model initial-condition large ensembles. The AI is trained to emulate a physics-based regional climate model (RCM) used for dynamical downscaling, and adds a similar amount of value as the RCM across precipitation and drought metrics. Marked increases in precipitation variability are found across all ensembles, alongside highly uncertain changes in mean precipitation. Future projections show droughts will become more intense across the majority of the country; however, internal variability and model uncertainty obscure future changes in drought duration and frequency across large portions of the country. This uncertainty is understated when using a smaller number of dynamically downscaled simulations. We find evidence that extreme droughts, up to twice as long as those found in smaller ensembles, could occur across the entirety of the country in the current climate, highlighting the value of long-duration downscaled simulations for sampling rare events. These extremely long droughts increase in length in many locations under a high-emissions SSP3-7.0 scenario, giving rise to events around 30 months long in some locations.
The spectral response of a digital camera defines the mapping between scene radiance and pixel intensity. Despite its critical importance, there is currently no comprehensive model that considers the end-to-end interaction between light input and pixel intensity output. This paper introduces a novel technique to model the spectral response of an RGB digital camera, addressing this gap. Such models are indispensable for applications requiring accurate color and spectral data interpretation. The proposed model is tested across diverse imaging scenarios by varying illumination conditions and is validated against experimental data. Results demonstrate its effectiveness in improving color fidelity and spectral accuracy, with significant implications for applications in machine vision, remote sensing, and spectral imaging. This approach offers a powerful tool for optimizing camera systems in scientific, industrial, and creative domains where spectral precision is paramount.
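As background for the mapping the abstract refers to, a common linear baseline formulation of camera spectral response is given below; the paper's end-to-end model is more detailed, so this is context rather than the proposed model.

```latex
% Common linear baseline for the response of colour channel c in {R, G, B}:
I_c = f\!\left( \int_{\lambda_{\min}}^{\lambda_{\max}} L(\lambda)\, S_c(\lambda)\, d\lambda + n_c \right)
```

Here L(λ) is the incident scene radiance, S_c(λ) the channel's spectral sensitivity, n_c sensor noise, and f the camera's (often nonlinear) tone/response curve.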
Machine Learning (ML) has been widely applied to cybersecurity and is considered state-of-the-art for solving many of the open issues in that field. However, it is very difficult to evaluate how good the produced solutions are, since the challenges faced in security may not appear in other areas. One of these challenges is concept drift, which intensifies the existing arms race between attackers and defenders: malicious actors can always craft novel threats to bypass defense solutions that were not designed to account for them. Due to this, it is essential to know how to properly build and evaluate an ML-based security solution. In this paper, we identify, detail, and discuss the main challenges in the correct application of ML techniques to cybersecurity data. We evaluate how concept drift, evolution, delayed labels, and adversarial ML impact the existing solutions. Moreover, we address how issues related to data collection affect the quality of the results presented in the security literature, showing that new strategies are needed to improve current solutions. Finally, we present how existing solutions may fail under certain circumstances, and propose mitigations to them, presenting a novel checklist to help the development of future ML solutions for cybersecurity.
Model trees provide an appealing way to perform interpretable machine learning for both classification and regression problems. In contrast to "classic" decision trees with constant values in their leaves, model trees can use linear combinations of predictor variables in their leaf nodes to form predictions, which can help achieve higher accuracy and smaller trees. Typical algorithms for learning model trees from training data work in a greedy fashion, growing the tree in a top-down manner by recursively splitting the data into smaller and smaller subsets. Crucially, the selected splits are only locally optimal, potentially rendering the tree overly complex and less accurate than a tree whose structure is globally optimal for the training data. In this paper, we empirically investigate the effect of constructing globally optimal model trees for classification and regression with linear support vector machines at the leaf nodes. To this end, we present mixed-integer linear programming formulations to learn optimal trees, compute such trees for a large collection of benchmark data sets, and compare their performance against greedily grown model trees in terms of interpretability and accuracy. We also compare to classic optimal and greedily grown decision trees, random forests, and support vector machines. Our results show that optimal model trees can achieve competitive accuracy with very small trees. We also investigate the effect on the accuracy of replacing axis-parallel splits with multivariate ones, foregoing interpretability while potentially obtaining greater accuracy.
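To illustrate the model class (a tree with linear SVMs in its leaves), the sketch below builds one greedily: a shallow decision tree partitions the data and a LinearSVC is fitted per leaf. This is not the paper's mixed-integer programming formulation, which searches for a globally optimal structure; it only shows what the resulting predictor looks like.

```python
# Greedy, illustrative model tree: shallow tree for structure, linear SVM per leaf.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

structure = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
leaf_ids = structure.apply(X)

# Fit one linear SVM per leaf; fall back to the single class if a leaf is pure.
leaf_models = {}
for leaf in np.unique(leaf_ids):
    idx = leaf_ids == leaf
    if len(np.unique(y[idx])) > 1:
        leaf_models[leaf] = LinearSVC().fit(X[idx], y[idx])
    else:
        leaf_models[leaf] = int(y[idx][0])

def predict(X_new):
    leaves = structure.apply(X_new)
    preds = np.empty(len(X_new), dtype=int)
    for leaf in np.unique(leaves):
        rows = leaves == leaf
        m = leaf_models[leaf]
        preds[rows] = m if isinstance(m, int) else m.predict(X_new[rows])
    return preds

print("train accuracy:", (predict(X) == y).mean())
```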
Online continual learning (OCL) aims to train neural networks incrementally from a non-stationary data stream with a single pass through data. Rehearsal-based methods attempt to approximate the observed input distributions over time with a small memory and revisit them later to avoid forgetting. Despite their strong empirical performance, rehearsal methods still suffer from a poor approximation of the loss landscape of past data with memory samples. This paper revisits the rehearsal dynamics in online settings. We provide theoretical insights on the inherent memory overfitting risk from the viewpoint of biased and dynamic empirical risk minimization, and examine the merits and limits of repeated rehearsal. Inspired by our analysis, a simple and intuitive baseline, Repeated Augmented Rehearsal (RAR), is designed to address the underfitting-overfitting dilemma of online rehearsal. Surprisingly, across four rather different OCL benchmarks, this simple baseline outperforms vanilla rehearsal by 9%-17% and also significantly improves state-of-the-art rehearsal-based methods MIR, ASER, and SCR. We also demonstrate that RAR successfully achieves an accurate approximation of the loss landscape of past data and high-loss ridge aversion in its learning trajectory. Extensive ablation studies are conducted to study the interplay between repeated and augmented rehearsal and reinforcement learning (RL) is applied to dynamically adjust the hyperparameters of RAR to balance the stability-plasticity trade-off online. Code is available at this https URL
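The core loop of repeated augmented rehearsal is simple; the sketch below is a heavily simplified analogue on a tabular stream, with scikit-learn's SGDClassifier standing in for a neural network and Gaussian noise standing in for image augmentation. The buffer, repeat count and noise scale are illustrative assumptions, not the paper's exact procedure.

```python
# Simplified repeated-augmented-rehearsal loop: each incoming batch is revisited several
# times, each time paired with a freshly sampled, freshly augmented memory mini-batch.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()
memory_X, memory_y = [], []          # small capped replay memory
MEM_SIZE, REPEATS, NOISE = 200, 3, 0.05

def stream(n_batches=50, batch=32, d=10):
    for _ in range(n_batches):
        X = rng.normal(size=(batch, d))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        yield X, y

for X_new, y_new in stream():
    for _ in range(REPEATS):                      # repeated rehearsal
        if memory_X:
            idx = rng.integers(len(memory_X), size=len(y_new))
            X_mem = np.array(memory_X)[idx]
            X_mem = X_mem + rng.normal(scale=NOISE, size=X_mem.shape)   # "augmentation"
            X_batch = np.vstack([X_new, X_mem])
            y_batch = np.concatenate([y_new, np.array(memory_y)[idx]])
        else:
            X_batch, y_batch = X_new, y_new
        model.partial_fit(X_batch, y_batch, classes=[0, 1])
    for x, y in zip(X_new, y_new):                # fill the memory buffer up to its cap
        if len(memory_X) < MEM_SIZE:
            memory_X.append(x); memory_y.append(y)
```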
Structural magnetic resonance imaging (MRI) studies have shown that Alzheimer's Disease (AD) induces both localised and widespread neural degenerative changes throughout the brain. However, the absence of segmentation that highlights brain degenerative changes presents unique challenges for training CNN-based classifiers in a supervised fashion. In this work, we evaluated several unsupervised methods to train a feature extractor for downstream AD vs. CN classification. Using the 3D T1-weighted MRI data of cognitively normal (CN) subjects from the synthetic neuroimaging LDM100K dataset, lightweight 3D CNN-based models are trained for brain age prediction, brain image rotation classification, brain image reconstruction and a multi-head task combining all three tasks into one. Feature extractors trained on the LDM100K synthetic dataset achieved similar performance compared to the same model using real-world data. This supports the feasibility of utilising large-scale synthetic data for pretext task training. All the training and testing splits are performed on the subject level to prevent data leakage issues. Alongside the simple preprocessing steps, the random cropping data augmentation technique shows consistent improvement across all experiments.
Malware is a major threat to computer systems and imposes many challenges to cyber security. Targeted threats, such as ransomware, cause millions of dollars in losses every year. The constant increase of malware infections has been motivating popular antiviruses (AVs) to develop dedicated detection strategies, which include meticulously crafted machine learning (ML) pipelines. However, malware developers unceasingly change their samples' features to bypass detection. This constant evolution of malware samples causes changes to the data distribution (i.e., concept drifts) that directly affect ML model detection rates, something not considered in the majority of the literature. In this work, we evaluate the impact of concept drift on malware classifiers for two Android datasets: DREBIN (about 130K apps) and a subset of AndroZoo (about 285K apps). We used these datasets to train an Adaptive Random Forest (ARF) classifier, as well as a Stochastic Gradient Descent (SGD) classifier. We also ordered all dataset samples using their VirusTotal submission timestamp and then extracted features from their textual attributes using two algorithms (Word2Vec and TF-IDF). Then, we conducted experiments comparing both feature extractors, classifiers, as well as four drift detectors (DDM, EDDM, ADWIN, and KSWIN) to determine the best approach for real environments. Finally, we compare some possible approaches to mitigate concept drift and propose a novel data stream pipeline that updates both the classifier and the feature extractor. To do so, we conducted a longitudinal evaluation by (i) classifying malware samples collected over nine years (2009-2018), (ii) reviewing concept drift detection algorithms to attest to its pervasiveness, (iii) comparing distinct ML approaches to mitigate the issue, and (iv) proposing an ML data stream pipeline that outperformed literature approaches.
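As a minimal illustration of one of the drift detectors named above, the sketch below feeds ADWIN a synthetic error-rate stream whose mean shifts partway through, mimicking a drop in detection accuracy after the malware population changes. It uses the River implementation; exact attribute names differ between River/scikit-multiflow versions.

```python
# Detecting a change in a stream of per-sample error indicators with ADWIN.
import random
from river import drift

random.seed(1)
adwin = drift.ADWIN()

for t in range(1000):
    # Error rate around 0.1 before t=500, then around 0.4 (synthetic concept drift).
    err = random.gauss(0.1, 0.02) if t < 500 else random.gauss(0.4, 0.02)
    adwin.update(err)
    if adwin.drift_detected:        # named `change_detected` in some older versions
        print(f"Drift detected at t={t}")
```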
Federated learning (FL), as an emerging artificial intelligence (AI) approach, enables decentralized model training across multiple devices without exposing their local training data. FL has been increasingly gaining popularity in both academia and industry. While research works have been proposed to improve the fault tolerance of FL, the real impact of unreliable devices (e.g., dropping out, misconfiguration, poor data quality) in real-world applications is not fully investigated. We carefully chose two representative, real-world classification problems with a limited number of clients to better analyze FL fault tolerance. Contrary to intuition, simple FL algorithms can perform surprisingly well in the presence of unreliable clients.
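To make the setting concrete, the sketch below runs a toy FedAvg round with simulated unreliable clients that randomly drop out. The model, data and dropout mechanism are illustrative assumptions, not the experimental setup of the paper.

```python
# Toy FedAvg with client dropout: average the local updates of whoever shows up.
import numpy as np

rng = np.random.default_rng(0)
n_clients, d = 10, 5
global_w = np.zeros(d)

# Each client holds its own data for a shared linear regression problem.
true_w = rng.normal(size=d)
client_data = []
for _ in range(n_clients):
    X = rng.normal(size=(50, d))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    client_data.append((X, y))

def local_update(w, X, y, lr=0.05, epochs=5):
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w = w - lr * grad
    return w

for rnd in range(20):
    # Unreliable clients: each one drops out of the round with probability 0.3.
    updates = [local_update(global_w.copy(), X, y)
               for X, y in client_data if rng.random() > 0.3]
    if updates:                              # skip the round if everyone dropped out
        global_w = np.mean(updates, axis=0)  # FedAvg: average the surviving updates

print("error vs. true weights:", np.linalg.norm(global_w - true_w))
```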
Decision Trees are prominent prediction models for interpretable Machine Learning. They have been thoroughly researched, mostly in the batch setting with a fixed labelled dataset, leading to popular algorithms such as C4.5, ID3 and CART. Unfortunately, these methods are heuristic in nature: they rely on greedy splits, offering no guarantees of global optimality and often leading to unnecessarily complex and hard-to-interpret Decision Trees. Recent breakthroughs addressed this suboptimality issue in the batch setting, but no such work has considered the online setting with data arriving in a stream. To this end, we devise a new Monte Carlo Tree Search algorithm, Thompson Sampling Decision Trees (TSDT), able to produce optimal Decision Trees in an online setting. We analyse our algorithm and prove its almost sure convergence to the optimal tree. Furthermore, we conduct extensive experiments to validate our findings empirically. The proposed TSDT outperforms existing algorithms on several benchmarks, all while presenting the practical advantage of being tailored to the online setting.
Active learning allows machine learning models to be trained using fewer labels while retaining similar performance to traditional supervised learning. An active learner selects the most informative data points, requests their labels, and retrains itself. While this approach is promising, it raises the question of how to determine when the model is `good enough' without the additional labels required for traditional evaluation. Previously, different stopping criteria have been proposed aiming to identify the optimal stopping point. Yet, optimality can only be expressed as a domain-dependent trade-off between accuracy and the number of labels, and no criterion is superior in all applications. As a further complication, a comparison of criteria for a particular real-world application would require practitioners to collect additional labelled data they are aiming to avoid by using active learning in the first place. This work enables practitioners to employ active learning by providing actionable recommendations for which stopping criteria are best for a given real-world scenario. We contribute the first large-scale comparison of stopping criteria for pool-based active learning, using a cost measure to quantify the accuracy/label trade-off, public implementations of all stopping criteria we evaluate, and an open-source framework for evaluating stopping criteria. Our research enables practitioners to substantially reduce labelling costs by utilizing the stopping criterion which best suits their domain.
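The loop and the stopping problem described above fit in a few lines; the sketch below runs pool-based active learning with uncertainty sampling and one simple, illustrative stopping rule (stop when even the most uncertain unlabelled point is classified confidently). The rule and thresholds are assumptions for illustration, not the specific criteria compared in the paper.

```python
# Pool-based active learning with least-confidence sampling and a toy stopping criterion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Seed the labelled set with a few examples of each class.
labelled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(X)) if i not in labelled]

model = LogisticRegression(max_iter=1000)
for step in range(200):
    model.fit(X[labelled], y[labelled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1 - proba.max(axis=1)             # least-confidence uncertainty
    if uncertainty.max() < 0.1:                     # stopping rule: pool looks "easy"
        print(f"Stopped after {len(labelled)} labels")
        break
    chosen = pool.pop(int(np.argmax(uncertainty)))  # query the most uncertain point
    labelled.append(chosen)                         # oracle provides y[chosen]
```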
SHAP (SHapley Additive exPlanation) values provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values. While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap is available for decision tree models. However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles. Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs. In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units. Our approach first preprocesses each decision tree to isolate variable sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions. With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19x for SHAP values, and speedups of up to 340x for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.2 GHz CPUs. We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.2M rows per second -- equivalent CPU-based performance is estimated to require 6850 CPU cores.
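In practice, one common way to reach GPU-accelerated TreeShap is through XGBoost's native SHAP interface; the sketch below assumes a CUDA device is available and an XGBoost version (2.0 or later) that accepts the device parameter. The same pred_contribs call also runs on CPU.

```python
# Computing SHAP values and SHAP interaction values with XGBoost's built-in interface.
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000, n_features=20, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

params = {"max_depth": 6, "device": "cuda"}   # "device" requires XGBoost >= 2.0
booster = xgb.train(params, dtrain, num_boost_round=100)

shap_values = booster.predict(dtrain, pred_contribs=True)      # per-feature SHAP values (+ bias column)
shap_inter = booster.predict(dtrain, pred_interactions=True)   # SHAP interaction values
print(shap_values.shape, shap_inter.shape)
```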