Edinburgh Napier University
Span annotation is the task of localizing and classifying text spans according to custom guidelines. Annotated spans can be used to analyze and evaluate high-quality texts for which single-score metrics fail to provide actionable feedback. Until recently, span annotation was limited to human annotators or fine-tuned models. In this study, we show that large language models (LLMs) can serve as flexible and cost-effective span annotation backbones. To demonstrate their utility, we compare LLMs to skilled human annotators on three diverse span annotation tasks: evaluating data-to-text generation, identifying translation errors, and detecting propaganda techniques. We demonstrate that LLMs achieve inter-annotator agreement (IAA) comparable to human annotators at a fraction of a cost per output annotation. We also manually analyze model outputs, finding that LLMs make errors at a similar rate to human annotators. We release the dataset of more than 40k model and human annotations for further research.
The ECDSA (Elliptic Curve Digital Signature Algorithm) is used in many blockchain networks for digital signatures. This includes the Bitcoin and the Ethereum blockchains. While it has good performance levels and as strong current security, it should be handled with care. This care typically relates to the usage of the nonce value which is used to create the signature. This paper outlines the methods that can be used to break ECDSA signatures, including revealed nonces, weak nonce choice, nonce reuse, two keys and shared nonces, and fault attack.
Query Performance Prediction (QPP) estimates retrieval systems effectiveness for a given query, offering valuable insights for search effectiveness and query processing. Despite extensive research, QPPs face critical challenges in generalizing across diverse retrieval paradigms and collections. This paper provides a comprehensive evaluation of state-of-the-art QPPs (e.g. NQC, UQC), LETOR-based features, and newly explored dense-based predictors. Using diverse sparse rankers (BM25, DFree without and with query expansion) and hybrid or dense (SPLADE and ColBert) rankers and diverse test collections ROBUST, GOV2, WT10G, and MS MARCO; we investigate the relationships between predicted and actual performance, with a focus on generalization and robustness. Results show significant variability in predictors accuracy, with collections as the main factor and rankers next. Some sparse predictors perform somehow on some collections (TREC ROBUST and GOV2) but do not generalise to other collections (WT10G and MS-MARCO). While some predictors show promise in specific scenarios, their overall limitations constrain their utility for applications. We show that QPP-driven selective query processing offers only marginal gains, emphasizing the need for improved predictors that generalize across collections, align with dense retrieval architectures and are useful for downstream applications.
Critical National Infrastructure (CNI) encompasses a nation's essential assets that are fundamental to the operation of society and the economy, ensuring the provision of vital utilities such as energy, water, transportation, and communication. Nevertheless, growing cybersecurity threats targeting these infrastructures can potentially interfere with operations and seriously risk national security and public safety. In this paper, we examine the intricate issues raised by cybersecurity risks to vital infrastructure, highlighting these systems' vulnerability to different types of cyberattacks. We analyse the significance of trust, privacy, and resilience for Critical Infrastructure Protection (CIP), examining the diverse standards and regulations to manage these domains. We also scrutinise the co-analysis of safety and security, offering innovative approaches for their integration and emphasising the interdependence between these fields. Furthermore, we introduce a comprehensive method for CIP leveraging Generative AI and Large Language Models (LLMs), giving a tailored lifecycle and discussing specific applications across different critical infrastructure sectors. Lastly, we discuss potential future directions that promise to enhance the security and resilience of critical infrastructures. This paper proposes innovative strategies for CIP from evolving attacks and enhances comprehension of cybersecurity concerns related to critical infrastructure.
Recent studies have increasingly acknowledged the advantages of incorporating visual data into speech enhancement (SE) systems. In this paper, we introduce a novel audio-visual SE approach, termed DCUC-Net (deep complex U-Net with conformer network). The proposed DCUC-Net leverages complex domain features and a stack of conformer blocks. The encoder and decoder of DCUC-Net are designed using a complex U-Net-based framework. The audio and visual signals are processed using a complex encoder and a ResNet-18 model, respectively. These processed signals are then fused using the conformer blocks and transformed into enhanced speech waveforms via a complex decoder. The conformer blocks consist of a combination of self-attention mechanisms and convolutional operations, enabling DCUC-Net to effectively capture both global and local audio-visual dependencies. Our experimental results demonstrate the effectiveness of DCUC-Net, as it outperforms the baseline model from the COG-MHEAR AVSE Challenge 2023 by a notable margin of 0.14 in terms of PESQ. Additionally, the proposed DCUC-Net performs comparably to a state-of-the-art model and outperforms all other compared models on the Taiwan Mandarin speech with video (TMSV) dataset.
The security of the Elliptic Curve Digital Signature Algorithm (ECDSA) depends on the uniqueness and secrecy of the nonce, which is used in each signature. While it is well understood that nonce kk reuse across two distinct messages can leak the private key, we show that even if a distinct value is used for k2k_2, where an affine relationship exists in the form of: \(k_m = a \cdot k_n + b\), we can also recover the private key. Our method requires only two signatures (even over the same message) and relies purely on algebra, with no need for lattice reduction or brute-force search(if the relationship, or offset, is known). To our knowledge, this is the first closed-form derivation of the ECDSA private key from only two signatures over the same message, under a known affine relationship between nonces.
Centuries-old Islamic medical texts like Avicenna's Canon of Medicine and the Prophetic Tibb-e-Nabawi encode a wealth of preventive care, nutrition, and holistic therapies, yet remain inaccessible to many and underutilized in modern AI systems. Existing language-model benchmarks focus narrowly on factual recall or user preference, leaving a gap in validating culturally grounded medical guidance at scale. We propose a unified evaluation pipeline, Tibbe-AG, that aligns 30 carefully curated Prophetic-medicine questions with human-verified remedies and compares three LLMs (LLaMA-3, Mistral-7B, Qwen2-7B) under three configurations: direct generation, retrieval-augmented generation, and a scientific self-critique filter. Each answer is then assessed by a secondary LLM serving as an agentic judge, yielding a single 3C3H quality score. Retrieval improves factual accuracy by 13%, while the agentic prompt adds another 10% improvement through deeper mechanistic insight and safety considerations. Our results demonstrate that blending classical Islamic texts with retrieval and self-evaluation enables reliable, culturally sensitive medical question-answering.
Digital twin (DT) technology is rapidly becoming essential for smart city ecosystems, enabling real-time synchronisation and autonomous decision-making across physical and digital domains. However, as DTs take active roles in control loops, securely binding them to their physical counterparts in dynamic and adversarial environments remains a significant challenge. Existing authentication solutions either rely on static trust models, require centralised authorities, or fail to provide live and verifiable physical-digital binding, making them unsuitable for latency-sensitive and distributed deployments. To address this gap, we introduce PRZK-Bind, a lightweight and decentralised authentication protocol that combines Schnorr-based zero-knowledge proofs with elliptic curve cryptography to establish secure, real-time correspondence between physical entities and DTs without relying on pre-shared secrets. Simulation results show that PRZK-Bind significantly improves performance, offering up to 4.5 times lower latency and 4 times reduced energy consumption compared to cryptography-heavy baselines, while maintaining false acceptance rates more than 10 times lower. These findings highlight its suitability for future smart city deployments requiring efficient, resilient, and trustworthy DT authentication.
End-to-End (E2E) learning-based concept has been recently introduced to jointly optimize both the transmitter and the receiver in wireless communication systems. Unfortunately, this E2E learning architecture requires a prior differentiable channel model to jointly train the deep neural networks (DNNs) at the transceivers, which is hardly obtained in practice. This paper aims to solve this issue by developing a deep deterministic policy gradient (DDPG)-based framework. In particular, the proposed solution uses the loss value of the receiver DNN as the reward to train the transmitter DNN. The simulation results then show that our proposed solution can jointly train the transmitter and the receiver without requiring the prior channel model. In addition, we demonstrate that the proposed DDPG-based solution can achieve better detection performance compared to the state-of-the-art solutions.
Although the emergence of 6G IoT networks has accelerated the deployment of enhanced smart city services, the resource limitations of IoT devices remain as a significant problem. Given this limitation, meeting the low-latency service requirement of 6G networks becomes even more challenging. However, existing 6G IoT management strategies lack real-time operation and mostly rely on discrete actions, which are insufficient to optimise energy consumption. To address these, in this study, we propose a Digital Twin (DT)-guided energy management framework to jointly handle the low latency and energy efficiency challenges in 6G IoT networks. In this framework, we provide the twin models through a distributed overlay network and handle the dynamic updates between the data layer and the upper layers of the DT over the Real-Time Publish Subscribe (RTPS) protocol. We also design a Reinforcement Learning (RL) engine with a novel formulated reward function to provide optimal data update times for each of the IoT devices. The RL engine receives a diverse set of environment states from the What-if engine and runs Deep Deterministic Policy Gradient (DDPG) to output continuous actions to the IoT devices. Based on our simulation results, we observe that the proposed framework achieves a 37% improvement in 95th percentile latency and a 30% reduction in energy consumption compared to the existing literature.
Task2Dial introduces a dataset and task for commonsense-enhanced task-based dialogue grounded in documents, capturing real-world instruction-giving scenarios that require knowledge beyond explicit document text. The dataset features longer dialogues and turns, higher lexical richness, and integrates non-document-grounded commonsense questions.
Coupling Large Language Models (LLMs) with Evolutionary Algorithms has recently shown significant promise as a technique to design new heuristics that outperform existing methods, particularly in the field of combinatorial optimisation. An escalating arms race is both rapidly producing new heuristics and improving the efficiency of the processes evolving them. However, driven by the desire to quickly demonstrate the superiority of new approaches, evaluation of the new heuristics produced for a specific domain is often cursory: testing on very few datasets in which instances all belong to a specific class from the domain, and on few instances per class. Taking bin-packing as an example, to the best of our knowledge we conduct the first rigorous benchmarking study of new LLM-generated heuristics, comparing them to well-known existing heuristics across a large suite of benchmark instances using three performance metrics. For each heuristic, we then evolve new instances won by the heuristic and perform an instance space analysis to understand where in the feature space each heuristic performs well. We show that most of the LLM heuristics do not generalise well when evaluated across a broad range of benchmarks in contrast to existing simple heuristics, and suggest that any gains from generating very specialist heuristics that only work in small areas of the instance space need to be weighed carefully against the considerable cost of generating these heuristics.
We propose a novel technique for algorithm-selection, applicable to optimisation domains in which there is implicit sequential information encapsulated in the data, e.g., in online bin-packing. Specifically we train two types of recurrent neural networks to predict a packing heuristic in online bin-packing, selecting from four well-known heuristics. As input, the RNN methods only use the sequence of item-sizes. This contrasts to typical approaches to algorithm-selection which require a model to be trained using domain-specific instance features that need to be first derived from the input data. The RNN approaches are shown to be capable of achieving within 5% of the oracle performance on between 80.88% to 97.63% of the instances, depending on the dataset. They are also shown to outperform classical machine learning models trained using derived features. Finally, we hypothesise that the proposed methods perform well when the instances exhibit some implicit structure that results in discriminatory performance with respect to a set of heuristics. We test this hypothesis by generating fourteen new datasets with increasing levels of structure, and show that there is a critical threshold of structure required before algorithm-selection delivers benefit.
This study presents a comprehensive comparative evaluation of four state-of-the-art Large Language Models (LLMs)--Claude 3.7 Sonnet, DeepSeek-V3, Gemini 2.0 Flash, and GPT-4o--for sentiment analysis and emotion detection in Persian social media texts. Comparative analysis among LLMs has witnessed a significant rise in recent years, however, most of these analyses have been conducted on English language tasks, creating gaps in understanding cross-linguistic performance patterns. This research addresses these gaps through rigorous experimental design using balanced Persian datasets containing 900 texts for sentiment analysis (positive, negative, neutral) and 1,800 texts for emotion detection (anger, fear, happiness, hate, sadness, surprise). The main focus was to allow for a direct and fair comparison among different models, by using consistent prompts, uniform processing parameters, and by analyzing the performance metrics such as precision, recall, F1-scores, along with misclassification patterns. The results show that all models reach an acceptable level of performance, and a statistical comparison of the best three models indicates no significant differences among them. However, GPT-4o demonstrated a marginally higher raw accuracy value for both tasks, while Gemini 2.0 Flash proved to be the most cost-efficient. The findings indicate that the emotion detection task is more challenging for all models compared to the sentiment analysis task, and the misclassification patterns can represent some challenges in Persian language texts. These findings establish performance benchmarks for Persian NLP applications and offer practical guidance for model selection based on accuracy, efficiency, and cost considerations, while revealing cultural and linguistic challenges that require consideration in multilingual AI system deployment.
A filter bubble refers to the phenomenon where Internet customization effectively isolates individuals from diverse opinions or materials, resulting in their exposure to only a select set of content. This can lead to the reinforcement of existing attitudes, beliefs, or conditions. In this study, our primary focus is to investigate the impact of filter bubbles in recommender systems. This pioneering research aims to uncover the reasons behind this problem, explore potential solutions, and propose an integrated tool to help users avoid filter bubbles in recommender systems. To achieve this objective, we conduct a systematic literature review on the topic of filter bubbles in recommender systems. The reviewed articles are carefully analyzed and classified, providing valuable insights that inform the development of an integrated approach. Notably, our review reveals evidence of filter bubbles in recommendation systems, highlighting several biases that contribute to their existence. Moreover, we propose mechanisms to mitigate the impact of filter bubbles and demonstrate that incorporating diversity into recommendations can potentially help alleviate this issue. The findings of this timely review will serve as a benchmark for researchers working in interdisciplinary fields such as privacy, artificial intelligence ethics, and recommendation systems. Furthermore, it will open new avenues for future research in related domains, prompting further exploration and advancement in this critical area.
This research develops a weighted ensemble machine learning model that addresses the issue of false fire alarms by distinguishing between genuine fire conditions and non-fire situations. The model achieves 99.98% accuracy with only 1 false positive and 2 false negatives out of 17,903 test cases on a public smoke detection dataset, significantly enhancing fire safety system reliability.
The ever-increasing security vulnerabilities in the Internet-of-Things (IoT) systems require improved threat detection approaches. This paper presents a compact and efficient approach to detect botnet attacks by employing an integrated approach that consists of traffic pattern analysis, temporal support learning, and focused feature extraction. The proposed attention-based model benefits from a hybrid CNN-BiLSTM architecture and achieves 99% classification accuracy in detecting botnet attacks utilizing the N-BaIoT dataset, while maintaining high precision and recall across various scenarios. The proposed model's performance is further validated by key parameters, such as Mathews Correlation Coefficient and Cohen's kappa Correlation Coefficient. The close-to-ideal results for these parameters demonstrate the proposed model's ability to detect botnet attacks accurately and efficiently in practical settings and on unseen data. The proposed model proved to be a powerful defence mechanism for IoT networks to face emerging security challenges.
The actions of intelligent agents, such as chatbots, recommender systems, and virtual assistants are typically not fully transparent to the user. Consequently, using such an agent involves the user exposing themselves to the risk that the agent may act in a way opposed to the user's goals. It is often argued that people use trust as a cognitive shortcut to reduce the complexity of such interactions. Here we formalise this by using the methods of evolutionary game theory to study the viability of trust-based strategies in repeated games. These are reciprocal strategies that cooperate as long as the other player is observed to be cooperating. Unlike classic reciprocal strategies, once mutual cooperation has been observed for a threshold number of rounds they stop checking their co-player's behaviour every round, and instead only check with some probability. By doing so, they reduce the opportunity cost of verifying whether the action of their co-player was actually cooperative. We demonstrate that these trust-based strategies can outcompete strategies that are always conditional, such as Tit-for-Tat, when the opportunity cost is non-negligible. We argue that this cost is likely to be greater when the interaction is between people and intelligent agents, because of the reduced transparency of the agent. Consequently, we expect people to use trust-based strategies more frequently in interactions with intelligent agents. Our results provide new, important insights into the design of mechanisms for facilitating interactions between humans and intelligent agents, where trust is an essential factor.
The Covid-19 pandemic presented an unprecedented global public health emergency, and concomitantly an unparalleled opportunity to investigate public responses to adverse social conditions. The widespread ability to post messages to social media platforms provided an invaluable outlet for such an outpouring of public sentiment, including not only expressions of social solidarity, but also the spread of misinformation and misconceptions around the effect and potential risks of the pandemic. This archive of message content therefore represents a key resource in understanding public responses to health crises, analysis of which could help to inform public policy interventions to better respond to similar events in future. We present a benchmark database of public social media postings from the United Kingdom related to the Covid-19 pandemic for academic research purposes, along with some initial analysis, including a taxonomy of key themes organised by keyword. This release supports the findings of a research study funded by the Scottish Government Chief Scientists' Office that aims to investigate social sentiment in order to understand the response to public health measures implemented during the pandemic.
Wireless networks are vulnerable to jamming attacks due to the shared communication medium, which can severely degrade performance and disrupt services. Despite extensive research, current jamming detection methods often rely on simulated data or proprietary over-the-air datasets with limited cross-layer features, failing to accurately represent the real state of a network and thus limiting their effectiveness in real-world scenarios. To address these challenges, we introduce JamShield, a dynamic jamming detection system trained on our own collected over-the-air and publicly available dataset. It utilizes hybrid feature selection to prioritize relevant features for accurate and efficient detection. Additionally, it includes an auto-classification module that dynamically adjusts the classification algorithm in real-time based on current network conditions. Our experimental results demonstrate significant improvements in detection rate, precision, and recall, along with reduced false alarms and misdetections compared to state-of-the-art detection algorithms, making JamShield a robust and reliable solution for detecting jamming attacks in real-world wireless networks.
There are no more papers matching your filters at the moment.