University of Wolverhampton
Complex wounds usually involve partial or total loss of skin thickness and heal by secondary intention. They can be acute or chronic, featuring infections, ischemia and tissue necrosis, and association with systemic diseases. Research institutes around the globe report countless cases, which add up to a severe public health problem, for they demand considerable human resources (e.g., physicians and health care professionals) and negatively impact quality of life. This paper presents a new database for automatically categorizing complex wounds into five categories, i.e., non-wound area, granulation, fibrinoid tissue, dry necrosis, and hematoma. The images comprise different scenarios with complex wounds caused by pressure, vascular ulcers, diabetes, burns, and complications after surgical interventions. The dataset, called ComplexWoundDB, is unique because it provides pixel-level classifications for 27 images obtained in the wild, i.e., images collected at the patients' homes and labeled by four health professionals. Further experiments with distinct machine learning techniques evidence the challenges in addressing the problem of computer-aided complex wound tissue categorization. The manuscript sheds light on future directions in the area, with a detailed comparison against other databases widely used in the literature.
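As a rough illustration of the pixel-level categorization task the experiments address, the sketch below trains a per-pixel classifier on one annotated image; the file names, the raw-RGB features, the numeric label encoding and the random forest are illustrative assumptions, not the paper's pipeline.

```python
# Minimal sketch of a pixel-level tissue classifier (not the authors' pipeline).
# Assumes an RGB wound photo and a per-pixel label mask encoding the five classes,
# e.g. 0 = non-wound, 1 = granulation, 2 = fibrinoid, 3 = dry necrosis, 4 = hematoma.
import numpy as np
from PIL import Image
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

image = np.asarray(Image.open("wound_photo.png").convert("RGB"))  # hypothetical file
mask = np.asarray(Image.open("wound_labels.png"))                  # hypothetical file

X = image.reshape(-1, 3).astype(np.float32) / 255.0  # one RGB feature vector per pixel
y = mask.reshape(-1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```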
The growing demand for computational power is driven by advancements in deep learning, the increasing need for big data processing, and the requirements of scientific simulations for academic and research purposes. Developing countries like Nepal often struggle with the resources needed to invest in new and better hardware for these purposes. However, optimizing and building on existing technology can still meet these computing demands effectively. To address these needs, we built a cluster using four NVIDIA GeForce GTX 1650 GPUs. The cluster consists of four nodes: one master node that controls and manages the entire cluster, and three compute nodes dedicated to processing tasks. The master node is equipped with all necessary software for package management, resource scheduling, and deployment, such as Anaconda and Slurm. In addition, a Network File System (NFS) was integrated to provide the additional storage required by the cluster. Given that the cluster is accessible via SSH through a public domain address, which poses significant cybersecurity risks, we implemented fail2ban to mitigate brute-force attacks and enhance security. Despite the continuous challenges encountered during the design and implementation process, this project demonstrates how powerful computational clusters can be built to handle resource-intensive tasks in various demanding fields.
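As a hedged sketch of how such a Slurm-managed cluster is typically used, the snippet below writes a minimal batch script and submits it from the master node; the job name, resource requests, output path and training command are placeholders rather than the cluster's actual configuration.

```python
# Minimal sketch of submitting a single-GPU job to a Slurm-managed cluster.
# All job parameters and paths are illustrative, not the cluster's real config.
import subprocess
from pathlib import Path

batch_script = """#!/bin/bash
#SBATCH --job-name=demo-train
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --output=demo-train-%j.out

python train.py
"""

script_path = Path("demo_train.sbatch")
script_path.write_text(batch_script)

# sbatch prints "Submitted batch job <id>" on success
result = subprocess.run(["sbatch", str(script_path)], capture_output=True, text=True, check=True)
print(result.stdout.strip())
```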
We adapt the Quantum Monte Carlo method to the cascaded formalism of quantum optics, allowing us to simulate the emission of photons of known energy. Statistical processing of the photon clicks thus collected agrees with the theory of frequency-resolved photon correlations, extending the range of applications based on correlations of photons of prescribed energy, in particular those of a photon-counting character. We apply the technique to autocorrelations of photon streams from a two-level system under coherent and incoherent pumping, including the Mollow triplet regime where we demonstrate the direct manifestation of leapfrog processes in producing an increased rate of two-photon emission events.
Social media companies as well as authorities make extensive use of artificial intelligence (AI) tools to monitor postings of hate speech, celebrations of violence or profanity. Since AI software requires massive volumes of data to train computers, Machine Translation (MT) of the online content is commonly used to process posts written in several languages and hence augment the data needed for training. However, MT mistakes are a regular occurrence when translating sentiment-oriented user-generated content (UGC), especially when a low-resource language is involved. The adequacy of the whole process relies on the assumption that the evaluation metrics used give a reliable indication of the quality of the translation. In this paper, we assess the ability of automatic quality metrics to detect critical machine translation errors which can cause serious misunderstanding of the affect message. We compare the performance of three canonical metrics on two kinds of output: meaningless translations, where the semantic content is seriously impaired, and meaningful translations containing a critical error which exclusively distorts the sentiment of the source text. We conclude that there is a need for fine-tuning of automatic metrics to make them more robust in detecting sentiment-critical errors.
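To make the evaluation setup concrete, here is a small sketch (with invented sentences) of scoring a translation whose only error flips the sentiment; BLEU, chrF and TER are used as stand-ins for whichever canonical metrics are compared in the study.

```python
# Sketch: sentence-level scores for a translation whose single error flips the sentiment.
# The sentences are invented; BLEU, chrF and TER stand in for the metrics under study.
from sacrebleu.metrics import BLEU, CHRF, TER

reference = "I am really happy with this hotel"
hypothesis = "I am really unhappy with this hotel"   # critical sentiment error

metrics = [("BLEU", BLEU(effective_order=True)), ("chrF", CHRF()), ("TER", TER())]
for name, metric in metrics:
    score = metric.sentence_score(hypothesis, [reference]).score
    print(f"{name}: {score:.1f}")
# High surface-overlap scores here would illustrate how such metrics can miss
# sentiment-critical errors despite the translation reversing the affect message.
```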
As offensive language has become a rising issue for online communities and social media platforms, researchers have been investigating ways of coping with abusive content and developing systems to detect its different types: cyberbullying, hate speech, aggression, etc. With a few notable exceptions, most research on this topic so far has dealt with English, mostly due to the wide availability of language resources for English. To address this shortcoming, this paper presents the first Greek annotated dataset for offensive language identification: the Offensive Greek Tweet Dataset (OGTD). OGTD is a manually annotated dataset containing 4,779 posts from Twitter, each annotated as offensive or not offensive. Along with a detailed description of the dataset, we evaluate several computational models trained and tested on this data.
This paper presents an ensemble system combining the output of multiple SVM classifiers for native language identification (NLI). The system was submitted to the NLI Shared Task 2017 fusion track, which featured student essays and spoken responses, in the form of audio transcriptions and i-vectors, by non-native English speakers of eleven native languages. Our system competed in the challenge under the team name ZCD and was based on an ensemble of SVM classifiers trained on character n-grams, achieving 83.58% accuracy and ranking 3rd in the shared task.
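A minimal sketch of one such base classifier, a linear SVM over character n-grams, is shown below; the essays, labels and n-gram range are placeholders rather than the shared-task data or tuned settings.

```python
# Sketch of one base classifier of the kind described: a linear SVM over character n-grams.
# The texts and labels are placeholders, not the NLI Shared Task 2017 data.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

essays = ["...essay text...", "...another essay..."]   # placeholder documents
native_language = ["ITA", "KOR"]                        # placeholder L1 labels

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 4), sublinear_tf=True),
    LinearSVC(C=1.0),
)
model.fit(essays, native_language)
print(model.predict(["...unseen essay..."]))
```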
The widespread presence of offensive content online, such as hate speech and cyber-bullying, is a global phenomenon. This has sparked interest in the artificial intelligence (AI) and natural language processing (NLP) communities, motivating the development of various systems trained to detect potentially harmful content automatically. These systems require annotated datasets to train the machine learning (ML) models. However, with a few notable exceptions, most datasets on this topic have dealt with English and a few other high-resource languages. As a result, research in offensive language identification has been limited to these languages. This paper addresses this gap by tackling offensive language identification in Sinhala, a low-resource Indo-Aryan language spoken by over 17 million people in Sri Lanka. We introduce the Sinhala Offensive Language Dataset (SOLD) and present multiple experiments on this dataset. SOLD is a manually annotated dataset containing 10,000 posts from Twitter annotated as offensive or not offensive at both the sentence level and the token level, improving the explainability of the ML models. SOLD is the first large publicly available offensive language dataset compiled for Sinhala. We also introduce SemiSOLD, a larger dataset containing more than 145,000 Sinhala tweets, annotated following a semi-supervised approach.
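The semi-supervised annotation behind SemiSOLD can be pictured as a self-training loop of the general kind sketched below; the model choice, placeholder tweets and 0.9 confidence threshold are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of a self-training / pseudo-labelling loop of the kind implied by the
# semi-supervised annotation of SemiSOLD. Model, data and threshold are illustrative.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labelled_texts = ["offensive example ...", "harmless example ..."]   # placeholders
labels = ["OFF", "NOT"]
unlabelled_texts = ["unlabelled tweet 1", "unlabelled tweet 2"]       # placeholders

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(labelled_texts, labels)

proba = model.predict_proba(unlabelled_texts)
confident = proba.max(axis=1) >= 0.9            # keep only high-confidence predictions
pseudo_labels = model.classes_[proba.argmax(axis=1)]
for text, label in zip(np.array(unlabelled_texts)[confident], pseudo_labels[confident]):
    print(label, text)                           # candidates to add to the training pool
```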
Passive acoustic monitoring is a sustainable method of monitoring wildlife and environments that leads to the generation of large datasets and, currently, a processing backlog. Academic research into automating this process is focused on the application of resource intensive convolutional neural networks which require large pre-labelled datasets for training and lack flexibility in application. We present a viable alternative relevant in both wild and captive settings; a transparent, lightweight and fast-to-train associative memory AI model with Hopfield neural network (HNN) architecture. Adapted from a model developed to detect bat echolocation calls, this model monitors captive endangered black-and-white ruffed lemur Varecia variegata vocalisations. Lemur social calls of interest when monitoring welfare are stored in the HNN in order to detect other call instances across the larger acoustic dataset. We make significant model improvements by storing an additional signal caused by movement and achieve an overall accuracy of 0.94. The model can perform 340 classifications per second, processing over 5.5 hours of audio data per minute, on a standard laptop running other applications. It has broad applicability and trains in milliseconds. Our lightweight solution reduces data-to-insight turnaround times and can accelerate decision making in both captive and wild settings.
Light-field cameras play a vital role in rich 3-D information retrieval for narrow-range depth sensing applications. The key obstacle in composing light-fields from exposures taken by a plenoptic camera is to computationally calibrate, align and rearrange four-dimensional image data. Several attempts have been proposed to enhance the overall image quality by tailoring pipelines dedicated to particular plenoptic cameras and improving the consistency across viewpoints at the expense of high computational loads. The framework presented herein advances prior outcomes thanks to its novel micro image scale-space analysis for generic camera calibration independent of the lens specifications and its parallax-invariant, cost-effective viewpoint color equalization from optimal transport theory. Artifacts from the sensor and micro lens grid are compensated in an innovative way to enable superior quality in sub-aperture image extraction, computational refocusing and Scheimpflug rendering with sub-sampling capabilities. Benchmark comparisons using established image metrics suggest that our proposed pipeline outperforms state-of-the-art tool chains in the majority of cases. Results from a Wasserstein distance further show that our color transfer outdoes the existing transport methods. Our algorithms are released under an open-source license, offer cross-platform compatibility with few dependencies and different user interfaces. This makes the reproduction of results and experimentation with plenoptic camera technology convenient for peer researchers, developers, photographers, data scientists and others working in this field.
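Since the colour-transfer evaluation reported here rests on a Wasserstein distance, the following sketch shows one way such a measurement could look for a single colour channel of two viewpoint images; the file names are placeholders and this is not the paper's evaluation code.

```python
# Sketch: 1-D Wasserstein distance between the red-channel distributions of two
# viewpoint images, as one ingredient of a colour-consistency check. File names are placeholders.
import numpy as np
from PIL import Image
from scipy.stats import wasserstein_distance

view_a = np.asarray(Image.open("viewpoint_a.png").convert("RGB"), dtype=np.float64) / 255.0
view_b = np.asarray(Image.open("viewpoint_b.png").convert("RGB"), dtype=np.float64) / 255.0

red_a = view_a[..., 0].ravel()
red_b = view_b[..., 0].ravel()
print("W1 distance (red channel):", wasserstein_distance(red_a, red_b))
```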
National research evaluation initiatives and incentive schemes have previously chosen between simplistic quantitative indicators and time-consuming peer review, sometimes supported by bibliometrics. Here we assess whether artificial intelligence (AI) could provide a third alternative, estimating article quality from multiple bibliometric and metadata inputs. We investigated this using provisional three-level REF2021 peer review scores for 84,966 articles submitted to the UK Research Excellence Framework 2021, each with a matching Scopus record from 2014-18 and a substantial abstract. We found that accuracy is highest in the medical and physical sciences Units of Assessment (UoAs) and economics, reaching 42% above the baseline (72% overall) in the best case. This is based on 1,000 bibliometric inputs and half of the articles used for training in each UoA. Prediction accuracies above the baseline for the social science, mathematics, engineering, arts, and humanities UoAs were much lower or close to zero. The Random Forest Classifier (standard or ordinal) and Extreme Gradient Boosting Classifier algorithms performed best of the 32 tested. Accuracy was lower if UoAs were merged or replaced by Scopus broad categories. We increased accuracy with an active learning strategy and by selecting articles with higher prediction probabilities, as estimated by the algorithms, but this substantially reduced the number of scores predicted.
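A hedged sketch of the confidence-thresholding step mentioned at the end, using a Random Forest's class probabilities to trade coverage for accuracy, is given below; the data are synthetic and the 0.5 threshold is illustrative.

```python
# Sketch: predict three-level quality scores from bibliometric inputs and keep only the
# articles whose highest predicted class probability exceeds a threshold. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 20))                 # stand-in for bibliometric/metadata inputs
y = rng.integers(1, 4, size=1000)          # stand-in for three-level peer review scores

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)
confident = proba.max(axis=1) >= 0.5       # trade coverage for accuracy, as in the study
preds = clf.classes_[proba.argmax(axis=1)]
coverage = confident.mean()
accuracy = (preds[confident] == y_test[confident]).mean()
print(f"coverage={coverage:.2f}, accuracy on confident subset={accuracy:.2f}")
```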
This paper presents the results and main findings of SemEval-2021 Task 1 - Lexical Complexity Prediction. We provided participants with an augmented version of the CompLex Corpus (Shardlow et al., 2020). CompLex is an English multi-domain corpus in which words and multi-word expressions (MWEs) were annotated with respect to their complexity using a five-point Likert scale. SemEval-2021 Task 1 featured two sub-tasks: Sub-task 1 focused on single words and Sub-task 2 focused on MWEs. The competition attracted 198 teams in total, of which 54 submitted official runs on the test data to Sub-task 1 and 37 to Sub-task 2.
Although bibliometrics are normally applied to journal articles when used to support research evaluations, conference papers are at least as important in fast-moving computing-related fields. It is therefore important to assess the relative advantages of citations and altmetrics for computing conference papers to make an informed decision about which, if any, to use. This paper compares Scopus citations with Mendeley reader counts for conference papers and journal articles that were published between 1996 and 2018 in 11 computing fields and had at least one US author. The data showed high correlations between Scopus citation counts and Mendeley reader counts in all fields and most years, but with few Mendeley readers for older conference papers and few Scopus citations for new conference papers and journal articles. The results therefore suggest that Mendeley reader counts have a substantial advantage over citation counts for recently-published conference papers due to their greater speed, but are unsuitable for older conference papers.
In this paper, we investigate the application of text classification methods to support law professionals. We present several experiments applying machine learning techniques to predict with high accuracy the ruling of the French Supreme Court and the law area to which a case belongs. We also investigate the influence of the time period in which a ruling was made on the form of the case description, and the extent to which we need to mask information in a full case ruling to automatically obtain training and test data that resemble case descriptions. We developed a mean probability ensemble system combining the output of multiple SVM classifiers. We report results of 98% average F1 score in predicting a case ruling, 96% F1 score for predicting the law area of a case, and 87.07% F1 score on estimating the date of a ruling.
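A minimal sketch of a mean-probability SVM ensemble of the general kind described is given below; the member classifiers here differ only in their character n-gram ranges, and the case texts and labels are placeholders rather than the French Supreme Court data.

```python
# Sketch of a mean-probability ensemble: several calibrated linear SVMs, each on a
# different character n-gram range, with their class probabilities averaged.
# Texts and labels are placeholders, not the French Supreme Court data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

texts = ["case description one ...", "case two ...", "case three ...",
         "case four ...", "case five ...", "case six ..."]
areas = ["civil", "criminal", "civil", "criminal", "civil", "criminal"]  # placeholder labels

members = []
for ngram_range in [(1, 2), (2, 4)]:
    clf = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=ngram_range),
        CalibratedClassifierCV(LinearSVC(), cv=2),   # wrap the SVM to obtain probabilities
    )
    members.append(clf.fit(texts, areas))

# Mean of the members' class probabilities for an unseen case description
probas = np.mean([m.predict_proba(["an unseen case ..."]) for m in members], axis=0)
print(members[0].classes_, probas)   # classes_ order is identical across members
```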
The search for and management of external funding now occupies much valuable researcher time. Whilst funding is essential for some types of research and beneficial for others, it may also constrain academic choice and creativity. Thus, it is important to assess whether it is ever detrimental or unnecessary. Here we investigate whether funded research tends to be higher quality in all fields and for all major research funders. Based on peer review quality scores for 113,877 articles from all fields in the UK's Research Excellence Framework (REF) 2021, we estimate that there are substantial disciplinary differences in the proportion of funded journal articles, from Theology and Religious Studies (16%+) to Biological Sciences (91%+). The results suggest that funded research is likely to be higher quality overall, for all the largest research funders, and for all fields, even after factoring out research team size. There are differences between funders in the average quality of the research they support, however. Funding seems particularly beneficial in health-related fields. The results do not show cause and effect and do not take into account the amount of funding received, but are consistent with funding either improving research quality or being won by high quality researchers or projects. In summary, there are no broad fields of research in which funding is irrelevant, so no fields can afford to ignore it. The results also show that, for the purpose of evaluating research funding, citations are not effective proxies for research quality in the arts and humanities and most social sciences.
A growing issue within conservation bioacoustics is the task of analysing the vast amount of data generated from the use of passive acoustic monitoring devices. In this paper, we present an alternative AI model which has the potential to help alleviate this problem. Our model formulation addresses the key issues encountered when using current AI models for bioacoustic analysis, namely: the limited training data available; the environmental impact, particularly the energy consumption and carbon footprint of training and implementing these models; and the associated hardware requirements. The model developed in this work uses associative memory via a transparent, explainable Hopfield neural network to store signals and detect similar signals, which can then be used to classify species. Training is rapid (3 ms), as only one representative signal is required for each target sound within a dataset. The model is fast, taking only 5.4 s to pre-process and classify all 10,384 publicly available bat recordings on a standard Apple MacBook Air. The model is also lightweight, with a small memory footprint of 144.09 MB of RAM usage. Hence, the low computational demands make the model ideal for use on a variety of standard personal devices, with potential for deployment in the field via edge-processing devices. It is also competitively accurate, with up to 86% precision on the dataset used to evaluate the model. In fact, we could not find a single case of disagreement between model and manual identification via expert field guides. Although a dataset of bat echolocation calls was chosen to demonstrate this first-of-its-kind AI model, trained on only two representative calls, the model is not species specific. In conclusion, we propose an equitable AI model that has the potential to be a game changer for fast, lightweight, sustainable, transparent, explainable and accurate bioacoustic analysis.
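To make the associative-memory mechanism concrete, a minimal classical Hopfield sketch is given below: one representative (binarised) call signature is stored with a single Hebbian outer product, and incoming windows are compared against that memory. The random signatures, window length and overlap scores are illustrative stand-ins for the model's actual spectrogram-derived signals.

```python
# Sketch of the associative-memory idea behind the model: a classical Hopfield network
# stores one representative (binarised) call signature via a single Hebbian outer
# product, so "training" takes milliseconds; a corrupted window is restored by recall,
# and candidate windows can be scored by their overlap with the stored memory.
import numpy as np

rng = np.random.default_rng(0)
n = 256                                               # length of a binarised signature
stored_call = rng.choice([-1, 1], size=n)             # one representative target call

W = np.outer(stored_call, stored_call).astype(float)  # Hebbian storage of the memory
np.fill_diagonal(W, 0.0)

def recall(window, steps=3):
    state = window.copy()
    for _ in range(steps):                            # synchronous updates
        state = np.where(W @ state >= 0, 1, -1)
    return state

noisy_call = stored_call * np.where(rng.random(n) < 0.2, -1, 1)  # 20% of bits flipped
unrelated = rng.choice([-1, 1], size=n)

print("recalled == stored:", np.array_equal(recall(noisy_call), stored_call))   # True
print("overlap with memory, noisy call:", float(noisy_call @ stored_call) / n)  # ~0.6
print("overlap with memory, unrelated :", float(unrelated @ stored_call) / n)   # ~0.0
```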
Researchers, editors, educators and publishers need to understand the mix of research methods used in their field to guide decision making, with a current concern being that qualitative research is threatened by big data. Although there have been many studies of the prevalence of different methods within individual narrow fields, there have been no systematic studies across academia. In response, this article assesses the prevalence and citation impact of academic research 1996-2019 that reports one of four common methods to gather qualitative data: interviews; focus groups; case studies; ethnography. The results show that, with minor exceptions, the prevalence of qualitative data has increased, often substantially, since 1996. In addition, all 27 broad fields (as classified by Scopus) now publish some qualitative research, with interviewing being by far the most common approach. This suggests that the teaching of qualitative methods should increase, and that researchers, editors and publishers should be increasingly open to the value that qualitative data can bring.
Continuous Descent Operations (CDO) involve smooth, idle-thrust descents that avoid level-offs, reducing fuel burn, emissions, and noise while improving efficiency and passenger comfort. Despite its operational and environmental benefits, limited research has systematically examined the factors influencing CDO performance. Moreover, many existing methods in related areas, such as trajectory optimization, lack the transparency required in aviation, where explainability is critical for safety and stakeholder trust. This study addresses these gaps by proposing a Fuzzy-Enhanced Explainable AI (FEXAI) framework that integrates fuzzy logic with machine learning and SHapley Additive exPlanations (SHAP) analysis. For this purpose, a comprehensive dataset of 29 features, including 11 operational and 18 weather-related features, was collected from 1,094 flights using Automatic Dependent Surveillance-Broadcast (ADS-B) data. Machine learning models and SHAP were then applied to classify flights' CDO adherence levels and rank features by importance. The three most influential features, as identified by SHAP scores, were then used to construct a fuzzy rule-based classifier, enabling the extraction of interpretable fuzzy rules. All models achieved classification accuracies above 90%, with FEXAI providing meaningful, human-readable rules for operational users. Results indicated that the average descent rate within the arrival route, the number of descent segments, and the average change in directional heading during descent were the strongest predictors of CDO performance. The FEXAI method proposed in this study presents a novel pathway for operational decision support and could be integrated into aviation tools to enable real-time advisories that maintain CDO adherence under varying operational conditions.
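A hedged sketch of the SHAP-based feature-ranking step is shown below: a tree model is fitted to flight features, mean absolute SHAP values rank the features, and the top three would feed the fuzzy rule-based classifier. The data are synthetic, and a regressor predicting a numeric adherence score stands in for the paper's classification setup.

```python
# Sketch of the SHAP-based feature ranking step: fit a tree model on flight features,
# compute mean absolute SHAP values, and keep the three most influential features for
# the fuzzy rule-based classifier. Data are synthetic, not the ADS-B dataset.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(29)]   # 11 operational + 18 weather features
X = pd.DataFrame(rng.random((500, 29)), columns=feature_names)
y = X["feature_0"] * 2 + X["feature_1"] - X["feature_2"] + rng.normal(0, 0.1, 500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)   # shape: (n_samples, n_features)

importance = np.abs(shap_values).mean(axis=0)
top3 = [feature_names[i] for i in np.argsort(importance)[::-1][:3]]
print("features passed to the fuzzy rule-based classifier:", top3)
```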
Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures. However, the majority of these methods work only on the language pair they are trained on and need retraining for new language pairs. This process can prove difficult from a technical point of view and is usually computationally expensive. In this paper, we propose a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. Our evaluation shows that the proposed methods achieve state-of-the-art results, outperforming current open-source quality estimation frameworks when trained on datasets from WMT. In addition, the framework proves very useful in transfer learning settings, especially when dealing with low-resource languages, allowing us to obtain very competitive results.
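The core idea, encoding the source sentence and its translation jointly with a multilingual transformer and regressing a single quality score, can be sketched roughly as follows; this untrained illustration uses the generic Hugging Face API rather than the released framework, and the example sentence pair is invented.

```python
# Sketch of the cross-lingual QE idea: encode the source sentence and its translation
# together with a multilingual transformer and regress a single quality score.
# This is an untrained illustration of the architecture, not the released framework.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)  # regression head

source = "Das Haus ist klein."
translation = "The house is small."
inputs = tokenizer(source, translation, return_tensors="pt", truncation=True)

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # meaningless until fine-tuned on (src, mt, score) triples
print("predicted quality score:", score)
```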
Purpose: Public attitudes towards COVID-19 and social distancing are critical in reducing its spread. It is therefore important to understand public reactions and information dissemination in all major forms, including on social media. This article investigates important issues reflected on Twitter in the early stages of the public reaction to COVID-19. Design/methodology/approach: A thematic analysis of the most retweeted English-language tweets mentioning COVID-19 during March 10-29, 2020. Findings: The main themes identified for the 87 qualifying tweets accounting for 14 million retweets were: lockdown life; attitude towards social restrictions; politics; safety messages; people with COVID-19; support for key workers; work; and COVID-19 facts/news. Research limitations/implications: Twitter played many positive roles, mainly through unofficial tweets. Users shared social distancing information, helped build support for social distancing, criticised government responses, expressed support for key workers, and helped each other cope with social isolation. A few popular tweets not supporting social distancing show that government messages sometimes failed. Practical implications: Public health campaigns in future may consider encouraging grass roots social web activity to support campaign goals. At a methodological level, analysing retweet counts emphasised politics and ignored practical implementation issues. Originality/value: This is the first qualitative analysis of general COVID-19-related retweeting.
Differences between the impacts of Open Access (OA) and non-OA research have been observed over a wide range of citation and altmetric indicators, usually finding an Open Access Advantage (OAA) within specific fields. However, science-wide analyses covering multiple years, indicators and disciplines are lacking. Using citation counts and six altmetrics for 38.7M articles published 2011-21, we compare OA and non-OA papers. The results show that there is no universal OAA across all disciplines or impact indicators: the OAA for citations tends to be lower for more recent papers; the OAAs for news, blogs and Twitter are consistent across years and unrelated to the volume of OA publications; and the OAAs for Wikipedia, patents and policy citations are more complex. These results support different hypotheses for different subjects and indicators. The evidence is consistent with OA accelerating research impact in the Medical & Health Sciences, Life Sciences and the Humanities; with increased visibility or discoverability being a factor in promoting the translation of research into socio-economic impact; and with OA being a factor in growing online engagement with research in some disciplines. OAAs are therefore complex, dynamic, multi-factorial and require considerable analysis to understand.