alphaXiv

Sohar University

17 Oct 2025

ai-for-health computer-science computer-vision-and-pattern-recognition

Confidence-Weighted Semi-Supervised Learning for Skin Lesion Segmentation Using Hybrid CNN-Transformer Networks

Automated skin lesion segmentation through dermoscopic analysis is essential for early skin cancer detection, yet remains challenging due to limited annotated training data. We present MIRA-U, a semi-supervised framework that combines uncertainty-aware teacher-student pseudo-labeling with a hybrid CNN-Transformer architecture. Our approach employs a teacher network pre-trained via masked image modeling to generate confidence-weighted soft pseudo-labels, which guide a U-shaped CNN-Transformer student network featuring cross-attention skip connections. This design enhances pseudo-label quality and boundary delineation, surpassing reconstruction-based and CNN-only baselines, particularly in low-annotation regimes. Extensive evaluation on ISIC-2016 and PH2 datasets demonstrates superior performance, achieving a Dice Similarity Coefficient (DSC) of 0.9153 and Intersection over Union (IoU) of 0.8552 using only 50% labeled data. Code is publicly available on GitHub.

24 Jul 2024

computer-science human-computer-interaction

Functional near-infrared spectroscopy (fNIRS) and Eye tracking for Cognitive Load classification in a Driving Simulator Using Deep Learning

Deakin University Sohar University Swinburne University of Technonology

Motion simulators allow researchers to safely investigate the interaction of drivers with a vehicle. However, many studies that use driving simulator data to predict cognitive load only employ two levels of workload, leaving a gap in research on employing deep learning methodologies to analyze cognitive load, especially in challenging low-light conditions. Often, studies overlook or solely focus on scenarios in bright daylight. To address this gap and understand the correlation between performance and cognitive load, this study employs functional near-infrared spectroscopy (fNIRS) and eye-tracking data, including fixation duration and gaze direction, during simulated driving tasks in low visibility conditions, inducing various mental workloads. The first stage involves the statistical estimation of useful features from fNIRS and eye-tracking data. ANOVA will be applied to the signals to identify significant channels from fNIRS signals. Optimal features from fNIRS, eye-tracking and vehicle dynamics are then combined in one chunk as input to the CNN and LSTM model to predict workload variations. The proposed CNN-LSTM model achieved 99% accuracy with neurological data and 89% with vehicle dynamics to predict cognitive load, indicating potential for real-time assessment of driver mental state and guide designers for the development of safe adaptive systems.

02 Jul 2025

ai-for-health attention-mechanisms computer-science

ScaleFusionNet: Transformer-Guided Multi-Scale Feature Fusion for Skin Lesion Segmentation

KTH Royal Institute of Technology BGI Research Sohar University Taif University

Melanoma is a malignant tumor that originates from skin cell lesions. Accurate and efficient segmentation of skin lesions is essential for quantitative analysis but remains a challenge due to blurred lesion boundaries, gradual color changes, and irregular shapes. To address this, we propose ScaleFusionNet, a hybrid model that integrates a Cross-Attention Transformer Module (CATM) and adaptive fusion block (AFB) to enhance feature extraction and fusion by capturing both local and global features. We introduce CATM, which utilizes Swin transformer blocks and Cross Attention Fusion (CAF) to adaptively refine feature fusion and reduce semantic gaps in the encoder-decoder to improve segmentation accuracy. Additionally, the AFB uses Swin Transformer-based attention and deformable convolution-based adaptive feature extraction to help the model gather local and global contextual information through parallel pathways. This enhancement refines the lesion boundaries and preserves fine-grained details. ScaleFusionNet achieves Dice scores of 92.94\% and 91.80\% on the ISIC-2016 and ISIC-2018 datasets, respectively, demonstrating its effectiveness in skin lesion analysis. Simultaneously, independent validation experiments were conducted on the PH

^2

dataset using the pretrained model weights. The results show that ScaleFusionNet demonstrates significant performance improvements compared with other state-of-the-art methods. Our code implementation is publicly available at GitHub.

17 Oct 2025

ai-for-health attention-mechanisms computer-science

UNet with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning for Medical Image Segmentation

King Saud University University of Auckland

KTH Royal Institute of Technology Imam Mohammad Ibn Saud Islamic University Sohar University

Medical image segmentation plays an important role in various clinical applications; however, existing deep learning models face trade-offs between efficiency and accuracy. Convolutional Neural Networks (CNNs) capture local details well but miss the global context, whereas transformers handle the global context but at a high computational cost. Recently, State Space Sequence Models (SSMs) have shown potential for capturing long-range dependencies with linear complexity; however, their direct use in medical image segmentation remains limited due to incompatibility with image structures and autoregressive assumptions. To overcome these challenges, we propose SAMA-UNet, a novel U-shaped architecture that introduces two key innovations. First, the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block adaptively integrates local and global features through dynamic attention weighting, enabling an efficient representation of complex anatomical patterns. Second, the causal resonance multi-scale module (CR-MSM) improves encoder-decoder interactions by adjusting feature resolution and causal dependencies across scales, enhancing the semantic alignment between low- and high-level features. Extensive experiments on MRI, CT, and endoscopy datasets demonstrate that SAMA-UNet consistently outperforms CNN, Transformer, and Mamba-based methods. It achieves 85.38% DSC and 87.82% NSD on BTCV, 92.16% and 96.54% on ACDC, 67.14% and 68.70% on EndoVis17, and 84.06% and 88.47% on ATLAS23, establishing new benchmarks across modalities. These results confirm the effectiveness of SAMA-UNet in combining efficiency and accuracy, making it a promising solution for real-world clinical segmentation tasks. The source code is available on GitHub.

20 Jan 2025

signal-processing electrical-engineering

Energy Consumption Reduction for UAV Trajectory Training : A Transfer Learning Approach

University of York Interdisciplinary Centre for Security, Reliability, and Trust (SnT)Sohar University

The advent of 6G technology demands flexible, scalable wireless architectures to support ultra-low latency, high connectivity, and high device density. The Open Radio Access Network (O-RAN) framework, with its open interfaces and virtualized functions, provides a promising foundation for such architectures. However, traditional fixed base stations alone are not sufficient to fully capitalize on the benefits of O-RAN due to their limited flexibility in responding to dynamic network demands. The integration of Unmanned Aerial Vehicles (UAVs) as mobile RUs within the O-RAN architecture offers a solution by leveraging the flexibility of drones to dynamically extend coverage. However, UAV operating in diverse environments requires frequent retraining, leading to significant energy waste. We proposed transfer learning based on Dueling Double Deep Q network (DDQN) with multi-step learning, which significantly reduces the training time and energy consumption required for UAVs to adapt to new environments. We designed simulation environments and conducted ray tracing experiments using Wireless InSite with real-world map data. In the two simulated environments, training energy consumption was reduced by 30.52% and 58.51%, respectively. Furthermore, tests on real-world maps of Ottawa and Rosslyn showed energy reductions of 44.85% and 36.97%, respectively.

03 Jun 2024

signal-processing electrical-engineering

Noise And Artifacts Elimination In ECG Signals Using Wavelet, Variational Mode Decomposition And Nonlocal Means Algorithm

Sohar University Dhofar University Lifeline hospital

Electrocardiogram (ECG) signals can frequently be affected by the introduction of noise and artifacts. Since these types of signal corruptions disrupt the accurate interpretation of ECG signals, noise and artifacts must be eliminated during the preprocessing phase. In this paper, we introduced a comprehensive pre-processing phase that eliminates motion artifacts and noise prior to detecting and extracting entirely corrupted ECG signal segments. The first method, denoted as the WLNH method, is constructed using wavelet multiresolution analysis (MRA), the Lillifors test, NLM, and a high-pass filter. The second method entails substituting the wavelet MRA decomposition with the variational mode decomposition (VMD) while retaining all other stages from the first method. This technique is denoted as the VLWNH. The two proposed methods differ from some existing methods in that they first employ the Lilliefors test to identify whether a component is white Gaussian noise and then utilize the High Pass Filter to eliminate motion anomalies. The simulation results show that the offered solutions are effective, particularly when dealing with white Gaussian noise and base-line wander (BW) noise.

09 Apr 2025

computer-science artificial-intelligence robotics

Learning-Based Approximate Nonlinear Model Predictive Control Motion Cueing

Deakin University Swinburne University Sohar University Institute for Intelligent Systems Research and Innovation

Motion Cueing Algorithms (MCAs) encode the movement of simulated vehicles into movement that can be reproduced with a motion simulator to provide a realistic driving experience within the capabilities of the machine. This paper introduces a novel learning-based MCA for serial robot-based motion simulators. Building on the differentiable predictive control framework, the proposed method merges the advantages of Nonlinear Model Predictive Control (NMPC) - notably nonlinear constraint handling and accurate kinematic modeling - with the computational efficiency of machine learning. By shifting the computational burden to offline training, the new algorithm enables real-time operation at high control rates, thus overcoming the key challenge associated with NMPC-based motion cueing. The proposed MCA incorporates a nonlinear joint-space plant model and a policy network trained to mimic NMPC behavior while accounting for joint acceleration, velocity, and position limits. Simulation experiments across multiple motion cueing scenarios showed that the proposed algorithm performed on par with a state-of-the-art NMPC-based alternative in terms of motion cueing quality as quantified by the RMSE and correlation coefficient with respect to reference signals. However, the proposed algorithm was on average 400 times faster than the NMPC baseline. In addition, the algorithm successfully generalized to unseen operating conditions, including motion cueing scenarios on a different vehicle and real-time physics-based simulations.

02 Jun 2024

ai-for-health computer-science computer-vision-security

An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging

Hamad Bin Khalifa University Sohar University Foundation University School of Sciences and Technology

Recent developments in multimodal large language models (MLLMs) have spurred significant interest in their potential applications across various medical imaging domains. On the one hand, there is a temptation to use these generative models to synthesize realistic-looking medical image data, while on the other hand, the ability to identify synthetic image data in a pool of data is also significantly important. In this study, we explore the potential of the Gemini (\textit{gemini-1.0-pro-vision-latest}) and GPT-4V (gpt-4-vision-preview) models for medical image analysis using two modalities of medical image data. Utilizing synthetic and real imaging data, both Gemini AI and GPT-4V are first used to classify real versus synthetic images, followed by an interpretation and analysis of the input images. Experimental results demonstrate that both Gemini and GPT-4 could perform some interpretation of the input images. In this specific experiment, Gemini was able to perform slightly better than the GPT-4V on the classification task. In contrast, responses associated with GPT-4V were mostly generic in nature. Our early investigation presented in this work provides insights into the potential of MLLMs to assist with the classification and interpretation of retinal fundoscopy and lung X-ray images. We also identify key limitations associated with the early investigation study on MLLMs for specialized tasks in medical image analysis.

22 Jul 2024

ai-for-health computer-science artificial-intelligence

Enhancing Cognitive Workload Classification Using Integrated LSTM Layers and CNNs for fNIRS Data Analysis

Deakin University Swinburne University of Technology Sohar University Tabriz Islamic Art University

Functional near-infrared spectroscopy (fNIRS) is employed as a non-invasive method to monitor functional brain activation by capturing changes in the concentrations of oxygenated haemoglobin (HbO) and deoxygenated haemo-globin (HbR). Various machine learning classification techniques have been utilized to distinguish cognitive states. However, conventional machine learning methods, although simpler to implement, undergo a complex pre-processing phase before network training and demonstrate reduced accuracy due to inadequate data preprocessing. Additionally, previous research in cog-nitive load assessment using fNIRS has predominantly focused on differ-sizeentiating between two levels of mental workload. These studies mainly aim to classify low and high levels of cognitive load or distinguish between easy and difficult tasks. To address these limitations associated with conven-tional methods, this paper conducts a comprehensive exploration of the im-pact of Long Short-Term Memory (LSTM) layers on the effectiveness of Convolutional Neural Networks (CNNs) within deep learning models. This is to address the issues related to spatial features overfitting and lack of tem-poral dependencies in CNN in the previous studies. By integrating LSTM layers, the model can capture temporal dependencies in the fNIRS data, al-lowing for a more comprehensive understanding of cognitive states. The primary objective is to assess how incorporating LSTM layers enhances the performance of CNNs. The experimental results presented in this paper demonstrate that the integration of LSTM layers with Convolutional layers results in an increase in the accuracy of deep learning models from 97.40% to 97.92%.

24 Jul 2024

computer-science human-computer-interaction machine-learning

Predicting cognitive load in immersive driving scenarios with a hybrid CNN-RNN model

Deakin University Swinburne University of Technology Sohar University

One debatable issue in traffic safety research is that cognitive load from sec-ondary tasks reduces primary task performance, such as driving. Although physiological signals have been extensively used in driving-related research to assess cognitive load, only a few studies have specifically focused on high cognitive load scenarios. Most existing studies tend to examine moderate or low levels of cognitive load In this study, we adopted an auditory version of the n-back task of three levels as a cognitively loading secondary task while driving in a driving simulator. During the simultaneous execution of driving and the n-back task, we recorded fNIRS, eye-tracking, and driving behavior data to predict cognitive load at three different levels. To the best of our knowledge, this combination of data sources has never been used before. Un-like most previous studies that utilize binary classification of cognitive load and driving in conditions without traffic, our study involved three levels of cognitive load, with drivers operating in normal traffic conditions under low visibility, specifically during nighttime and rainy weather. We proposed a hybrid neural network combining a 1D Convolutional Neural Network and a Recurrent Neural Network to predict cognitive load. Our experimental re-sults demonstrate that the proposed model, with fewer parameters, increases accuracy from 99.82% to 99.99% using physiological data, and from 87.26% to 92.02% using driving behavior data alone. This significant improvement highlights the effectiveness of our hybrid neural network in accurately pre-dicting cognitive load during driving under challenging conditions.

09 Apr 2025

autonomous-vehicles computer-science artificial-intelligence

Learning-Based Approximate Nonlinear Model Predictive Control Motion Cueing

Deakin University Swinburne University Sohar University

27 Jun 2021

attention-mechanisms clustering-algorithms computer-science

Transfer-based adaptive tree for multimodal sentiment analysis based on user latent aspects

University of Queensland Iran University of Science and Technology Sohar University

Multimodal sentiment analysis benefits various applications such as human-computer interaction and recommendation systems. It aims to infer the users' bipolar ideas using visual, textual, and acoustic signals. Although researchers affirm the association between cognitive cues and emotional manifestations, most of the current multimodal approaches in sentiment analysis disregard user-specific aspects. To tackle this issue, we devise a novel method to perform multimodal sentiment prediction using cognitive cues, such as personality. Our framework constructs an adaptive tree by hierarchically dividing users and trains the LSTM-based submodels, utilizing an attention-based fusion to transfer cognitive-oriented knowledge within the tree. Subsequently, the framework consumes the conclusive agglomerative knowledge from the adaptive tree to predict final sentiments. We also devise a dynamic dropout method to facilitate data sharing between neighboring nodes, reducing data sparsity. The empirical results on real-world datasets determine that our proposed model for sentiment prediction can surpass trending rivals. Moreover, compared to other ensemble approaches, the proposed transfer-based algorithm can better utilize the latent cognitive cues and foster the prediction outcomes. Based on the given extrinsic and intrinsic analysis results, we note that compared to other theoretical-based techniques, the proposed hierarchical clustering approach can better group the users within the adaptive tree.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Confidence-Weighted Semi-Supervised Learning for Skin Lesion Segmentation Using Hybrid CNN-Transformer Networks

Functional near-infrared spectroscopy (fNIRS) and Eye tracking for Cognitive Load classification in a Driving Simulator Using Deep Learning

ScaleFusionNet: Transformer-Guided Multi-Scale Feature Fusion for Skin Lesion Segmentation

UNet with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning for Medical Image Segmentation

Energy Consumption Reduction for UAV Trajectory Training : A Transfer Learning Approach

Noise And Artifacts Elimination In ECG Signals Using Wavelet, Variational Mode Decomposition And Nonlocal Means Algorithm

Learning-Based Approximate Nonlinear Model Predictive Control Motion Cueing

An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging

Enhancing Cognitive Workload Classification Using Integrated LSTM Layers and CNNs for fNIRS Data Analysis

Predicting cognitive load in immersive driving scenarios with a hybrid CNN-RNN model

Learning-Based Approximate Nonlinear Model Predictive Control Motion Cueing

Transfer-based adaptive tree for multimodal sentiment analysis based on user latent aspects

Events

AI for Law

Personalize Your Feed