alphaXiv

History

Papers Benchmarks

James Cook University

09 Oct 2025

ai-for-health attention-mechanisms computer-science

Demystifying Deep Learning-based Brain Tumor Segmentation with 3D UNets and Explainable AI (XAI): A Comparative Analysis

Georgia Institute of Technology Xiamen University Malaysia James Cook University

An investigation by researchers at Xiamen University Malaysia focused on evaluating various 3D UNet architectures for brain tumor segmentation from MRI images, integrating Explainable AI (XAI) to enhance model interpretability. The study identified 3D Residual UNet (ResUNet) as the most effective model, achieving superior segmentation accuracy with the fastest inference time of 8.45 seconds per volume on the test dataset.

09 May 2025

computer-science artificial-intelligence computer-vision-and-pattern-recognition

Computationally Efficient Diffusion Models in Medical Imaging: A Comprehensive Review

James Cook University

The diffusion model has recently emerged as a potent approach in computer vision, demonstrating remarkable performances in the field of generative artificial intelligence. Capable of producing high-quality synthetic images, diffusion models have been successfully applied across a range of applications. However, a significant challenge remains with the high computational cost associated with training and generating these models. This study focuses on the efficiency and inference time of diffusion-based generative models, highlighting their applications in both natural and medical imaging. We present the most recent advances in diffusion models by categorizing them into three key models: the Denoising Diffusion Probabilistic Model (DDPM), the Latent Diffusion Model (LDM), and the Wavelet Diffusion Model (WDM). These models play a crucial role in medical imaging, where producing fast, reliable, and high-quality medical images is essential for accurate analysis of abnormalities and disease diagnosis. We first investigate the general framework of DDPM, LDM, and WDM and discuss the computational complexity gap filled by these models in natural and medical imaging. We then discuss the current limitations of these models as well as the opportunities and future research directions in medical imaging.

06 Nov 2020

computer-science computer-vision-and-pattern-recognition image-segmentation

Affinity LCFCN: Learning to Segment Fish with Weak Supervision

McGill University James Cook University Element AI

Aquaculture industries rely on the availability of accurate fish body measurements, e.g., length, width and mass. Manual methods that rely on physical tools like rulers are time and labour intensive. Leading automatic approaches rely on fully-supervised segmentation models to acquire these measurements but these require collecting per-pixel labels -- also time consuming and laborious: i.e., it can take up to two minutes per fish to generate accurate segmentation labels, almost always requiring at least some manual intervention. We propose an automatic segmentation model efficiently trained on images labeled with only point-level supervision, where each fish is annotated with a single click. This labeling process requires significantly less manual intervention, averaging roughly one second per fish. Our approach uses a fully convolutional neural network with one branch that outputs per-pixel scores and another that outputs an affinity matrix. We aggregate these two outputs using a random walk to obtain the final, refined per-pixel segmentation output. We train the entire model end-to-end with an LCFCN loss, resulting in our A-LCFCN method. We validate our model on the DeepFish dataset, which contains many fish habitats from the north-eastern Australian region. Our experimental results confirm that A-LCFCN outperforms a fully-supervised segmentation model at fixed annotation budget. Moreover, we show that A-LCFCN achieves better segmentation results than LCFCN and a standard baseline. We have released the code at \url{this https URL}.

19 Sep 2025

autonomous-vehicles computer-science computer-vision-and-pattern-recognition

RadarGaussianDet3D: An Efficient and Effective Gaussian-based 3D Detector with 4D Automotive Radars

Beihang University James Cook University

4D automotive radars have gained increasing attention for autonomous driving due to their low cost, robustness, and inherent velocity measurement capability. However, existing 4D radar-based 3D detectors rely heavily on pillar encoders for BEV feature extraction, where each point contributes to only a single BEV grid, resulting in sparse feature maps and degraded representation quality. In addition, they also optimize bounding box attributes independently, leading to sub-optimal detection accuracy. Moreover, their inference speed, while sufficient for high-end GPUs, may fail to meet the real-time requirement on vehicle-mounted embedded devices. To overcome these limitations, an efficient and effective Gaussian-based 3D detector, namely RadarGaussianDet3D is introduced, leveraging Gaussian primitives and distributions as intermediate representations for radar points and bounding boxes. In RadarGaussianDet3D, a novel Point Gaussian Encoder (PGE) is designed to transform each point into a Gaussian primitive after feature aggregation and employs the 3D Gaussian Splatting (3DGS) technique for BEV rasterization, yielding denser feature maps. PGE exhibits exceptionally low latency, owing to the optimized algorithm for point feature aggregation and fast rendering of 3DGS. In addition, a new Box Gaussian Loss (BGL) is proposed, which converts bounding boxes into 3D Gaussian distributions and measures their distance to enable more comprehensive and consistent optimization. Extensive experiments on TJ4DRadSet and View-of-Delft demonstrate that RadarGaussianDet3D achieves state-of-the-art detection accuracy while delivering substantially faster inference, highlighting its potential for real-time deployment in autonomous driving.

257

02 Sep 2025

autonomous-vehicles computer-science computer-vision-security

Vehicle-to-Everything Cooperative Perception for Autonomous Driving

University of Alabama in Huntsville

Chalmers University of Technology Swinburne University of Technology Institute for Infocomm Research, Agency for Science, Technology, and Research James Cook University Vitalent Consulting

Achieving fully autonomous driving with enhanced safety and efficiency relies on vehicle-to-everything cooperative perception, which enables vehicles to share perception data, thereby enhancing situational awareness and overcoming the limitations of the sensing ability of individual vehicles. Vehicle-to-everything cooperative perception plays a crucial role in extending the perception range, increasing detection accuracy, and supporting more robust decision-making and control in complex environments. This paper provides a comprehensive survey of recent developments in vehicle-to-everything cooperative perception, introducing mathematical models that characterize the perception process under different collaboration strategies. Key techniques for enabling reliable perception sharing, such as agent selection, data alignment, and feature fusion, are examined in detail. In addition, major challenges are discussed, including differences in agents and models, uncertainty in perception outputs, and the impact of communication constraints such as transmission delay and data loss. The paper concludes by outlining promising research directions, including privacy-preserving artificial intelligence methods, collaborative intelligence, and integrated sensing frameworks to support future advancements in vehicle-to-everything cooperative perception.

01 Dec 2025

agent-based-systems computer-science computer-vision-and-pattern-recognition

AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios

Fudan University Purple Mountain Laboratories

Shanghai Jiao Tong University The Hong Kong University of Science and Technology (Guangzhou)Swinburne University of Technology China University of Petroleum (East China)James Cook University Shandong Key Laboratory of Intelligent Oil & Gas Industrial Software Xi JiaoTong-Liverpool University Xi as’an Jiaotong-Liverpool University

Referring Multi-Object Tracking (RMOT) aims to achieve precise object detection and tracking through natural language instructions, representing a fundamental capability for intelligent robotic systems. However, current RMOT research remains mostly confined to ground-level scenarios, which constrains their ability to capture broad-scale scene contexts and perform comprehensive tracking and path planning. In contrast, Unmanned Aerial Vehicles (UAVs) leverage their expansive aerial perspectives and superior maneuverability to enable wide-area surveillance. Moreover, UAVs have emerged as critical platforms for Embodied Intelligence, which has given rise to an unprecedented demand for intelligent aerial systems capable of natural language interaction. To this end, we introduce AerialMind, the first large-scale RMOT benchmark in UAV scenarios, which aims to bridge this research gap. To facilitate its construction, we develop an innovative semi-automated collaborative agent-based labeling assistant (COALA) framework that significantly reduces labor costs while maintaining annotation quality. Furthermore, we propose HawkEyeTrack (HETrack), a novel method that collaboratively enhances vision-language representation learning and improves the perception of UAV scenarios. Comprehensive experiments validated the challenging nature of our dataset and the effectiveness of our method.

14 Feb 2019

computer-science computer-vision-and-pattern-recognition machine-learning

DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning

James Cook University

Robotic weed control has seen increased research of late with its potential for boosting productivity in agriculture. Majority of works focus on developing robotics for croplands, ignoring the weed management problems facing rangeland stock farmers. Perhaps the greatest obstacle to widespread uptake of robotic weed control is the robust classification of weed species in their natural environment. The unparalleled successes of deep learning make it an ideal candidate for recognising various weed species in the complex rangeland environment. This work contributes the first large, public, multiclass image dataset of weed species from the Australian rangelands; allowing for the development of robust classification methods to make robotic weed control viable. The DeepWeeds dataset consists of 17,509 labelled images of eight nationally significant weed species native to eight locations across northern Australia. This paper presents a baseline for classification performance on the dataset using the benchmark deep learning models, Inception-v3 and ResNet-50. These models achieved an average classification accuracy of 95.1% and 95.7%, respectively. We also demonstrate real time performance of the ResNet-50 architecture, with an average inference time of 53.4 ms per image. These strong results bode well for future field implementation of robotic weed control methods in the Australian rangelands.

17 Mar 2025

computer-science computer-vision-and-pattern-recognition robotics

OptiPMB: Enhancing 3D Multi-Object Tracking with Optimized Poisson Multi-Bernoulli Filtering

Beihang University Southeast University

HKUST

Shanghai Jiaotong University Nantong University City University of Macau James Cook University

Accurate 3D multi-object tracking (MOT) is crucial for autonomous driving, as it enables robust perception, navigation, and planning in complex environments. While deep learning-based solutions have demonstrated impressive 3D MOT performance, model-based approaches remain appealing for their simplicity, interpretability, and data efficiency. Conventional model-based trackers typically rely on random vector-based Bayesian filters within the tracking-by-detection (TBD) framework but face limitations due to heuristic data association and track management schemes. In contrast, random finite set (RFS)-based Bayesian filtering handles object birth, survival, and death in a theoretically sound manner, facilitating interpretability and parameter tuning. In this paper, we present OptiPMB, a novel RFS-based 3D MOT method that employs an optimized Poisson multi-Bernoulli (PMB) filter while incorporating several key innovative designs within the TBD framework. Specifically, we propose a measurement-driven hybrid adaptive birth model for improved track initialization, employ adaptive detection probability parameters to effectively maintain tracks for occluded objects, and optimize density pruning and track extraction modules to further enhance overall tracking performance. Extensive evaluations on nuScenes and KITTI datasets show that OptiPMB achieves superior tracking accuracy compared with state-of-the-art methods, thereby establishing a new benchmark for model-based 3D MOT and offering valuable insights for future research on RFS-based trackers in autonomous driving.

15 May 2025

computer-science robotics

Unsupervised Radar Point Cloud Enhancement via Arbitrary LiDAR Guided Diffusion Prior

Heidelberg University University Medical Center Göttingen Momoni AI James Cook University

In industrial automation, radar is a critical sensor in machine perception. However, the angular resolution of radar is inherently limited by the Rayleigh criterion, which depends on both the radar's operating wavelength and the effective aperture of its antenna array.To overcome these hardware-imposed limitations, recent neural network-based methods have leveraged high-resolution LiDAR data, paired with radar measurements, during training to enhance radar point cloud resolution. While effective, these approaches require extensive paired datasets, which are costly to acquire and prone to calibration error. These challenges motivate the need for methods that can improve radar resolution without relying on paired high-resolution ground-truth data. Here, we introduce an unsupervised radar points enhancement algorithm that employs an arbitrary LiDAR-guided diffusion model as a prior without the need for paired training data. Specifically, our approach formulates radar angle estimation recovery as an inverse problem and incorporates prior knowledge through a diffusion model with arbitrary LiDAR domain knowledge. Experimental results demonstrate that our method attains high fidelity and low noise performance compared to traditional regularization techniques. Additionally, compared to paired training methods, it not only achieves comparable performance but also offers improved generalization capability. To our knowledge, this is the first approach that enhances radar points output by integrating prior knowledge via a diffusion model rather than relying on paired training data. Our code is available at this https URL

14 Nov 2025

autonomous-vehicles computer-science computer-vision-and-pattern-recognition

MS-Occ: Multi-Stage LiDAR-Camera Fusion for 3D Semantic Occupancy Prediction

Tongji University

Nanyang Technological University Swinburne University of Technology University of Shanghai for Science and Technology James Cook University MomoniAI

Accurate 3D semantic occupancy perception is essential for autonomous driving in complex environments with diverse and irregular objects. While vision-centric methods suffer from geometric inaccuracies, LiDAR-based approaches often lack rich semantic information. To address these limitations, MS-Occ, a novel multi-stage LiDAR-camera fusion framework which includes middle-stage fusion and late-stage fusion, is proposed, integrating LiDAR's geometric fidelity with camera-based semantic richness via hierarchical cross-modal fusion. The framework introduces innovations at two critical stages: (1) In the middle-stage feature fusion, the Gaussian-Geo module leverages Gaussian kernel rendering on sparse LiDAR depth maps to enhance 2D image features with dense geometric priors, and the Semantic-Aware module enriches LiDAR voxels with semantic context via deformable cross-attention; (2) In the late-stage voxel fusion, the Adaptive Fusion (AF) module dynamically balances voxel features across modalities, while the High Classification Confidence Voxel Fusion (HCCVF) module resolves semantic inconsistencies using self-attention-based refinement. Experiments on two large-scale benchmarks demonstrate state-of-the-art performance. On nuScenes-OpenOccupancy, MS-Occ achieves an Intersection over Union (IoU) of 32.1% and a mean IoU (mIoU) of 25.3%, surpassing the state-of-the-art by +0.7% IoU and +2.4% mIoU. Furthermore, on the SemanticKITTI benchmark, our method achieves a new state-of-the-art mIoU of 24.08%, robustly validating its generalization this http URL studies further confirm the effectiveness of each individual module, highlighting substantial improvements in the perception of small objects and reinforcing the practical value of MS-Occ for safety-critical autonomous driving scenarios.

03 Oct 2023

autonomous-vehicles computer-science computer-vision-security

LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion

Beihang University

Chalmers University of Technology Swinburne University of Technology James Cook University Vitalent Consulting

As an emerging technology and a relatively affordable device, the 4D imaging radar has already been confirmed effective in performing 3D object detection in autonomous driving. Nevertheless, the sparsity and noisiness of 4D radar point clouds hinder further performance improvement, and in-depth studies about its fusion with other modalities are lacking. On the other hand, as a new image view transformation strategy, "sampling" has been applied in a few image-based detectors and shown to outperform the widely applied "depth-based splatting" proposed in Lift-Splat-Shoot (LSS), even without image depth prediction. However, the potential of "sampling" is not fully unleashed. This paper investigates the "sampling" view transformation strategy on the camera and 4D imaging radar fusion-based 3D object detection. LiDAR Excluded Lean (LXL) model, predicted image depth distribution maps and radar 3D occupancy grids are generated from image perspective view (PV) features and radar bird's eye view (BEV) features, respectively. They are sent to the core of LXL, called "radar occupancy-assisted depth-based sampling", to aid image view transformation. We demonstrated that more accurate view transformation can be performed by introducing image depths and radar information to enhance the "sampling" strategy. Experiments on VoD and TJ4DRadSet datasets show that the proposed method outperforms the state-of-the-art 3D object detection methods by a significant margin without bells and whistles. Ablation studies demonstrate that our method performs the best among different enhancement settings.

21 Apr 2025

computer-science robotics

Precision Robotic Spot-Spraying: Reducing Herbicide Use and Enhancing Environmental Outcomes in Sugarcane

James Cook University Sugar Research Australia AutoWeed Pty Ltd

Precise robotic weed control plays an essential role in precision agriculture. It can help significantly reduce the environmental impact of herbicides while reducing weed management costs for farmers. In this paper, we demonstrate that a custom-designed robotic spot spraying tool based on computer vision and deep learning can significantly reduce herbicide usage on sugarcane farms. We present results from field trials that compare robotic spot spraying against industry-standard broadcast spraying, by measuring the weed control efficacy, the reduction in herbicide usage, and the water quality improvements in irrigation runoff. The average results across 25 hectares of field trials show that spot spraying on sugarcane farms is 97\% as effective as broadcast spraying and reduces herbicide usage by 35\%, proportionally to the weed density. For specific trial strips with lower weed pressure, spot spraying reduced herbicide usage by up to 65\%. Water quality measurements of irrigation-induced runoff, three to six days after spraying, showed reductions in the mean concentration and mean load of herbicides of 39\% and 54\%, respectively, compared to broadcast spraying. These promising results reveal the capability of spot spraying technology to reduce herbicide usage on sugarcane farms without impacting weed control and potentially providing sustained water quality benefits.

21 May 2025

computer-science computer-vision-and-pattern-recognition image-segmentation

From Pixels to Images: Deep Learning Advances in Remote Sensing Image Semantic Segmentation

Wuhan University

University of Wisconsin-Madison La Trobe University James Cook University

This review traces the evolution of remote sensing image semantic segmentation (RSISS) through deep learning, systematically categorizing approaches by processing granularity and architectural innovations. It presents a comprehensive evaluation of 39 methods, demonstrating that tile-based multimodal approaches achieve high accuracy exceeding 95% by effectively integrating global context and complementary sensor information.

05 Aug 2020

electrical-engineering

Minimizing Electricity Cost through Smart Lighting Control for Indoor Plant Factories

Singapore University of Technology and Design James Cook University 3M APAC Pte Ltd

Smart plant factories incorporate sensing technology, actuators and control algorithms to automate processes, reducing the cost of production while improving crop yield many times over that of traditional farms. This paper investigates the growth of lettuce (Lactuca Sativa) in a smart farming setup when exposed to red and blue light-emitting diode (LED) horticulture lighting. An image segmentation method based on K-means clustering is used to identify the size of the plant at each stage of growth, and the growth of the plant modelled in a feed forward network. Finally, an optimization algorithm based on the plant growth model is proposed to find the optimal lighting schedule for growing lettuce with respect to dynamic electricity pricing. Genetic algorithm was utilized to find solutions to the optimization problem. When compared to a baseline in a simulation setting, the schedules proposed by the genetic algorithm can achieved between 40-52% savings in energy costs, and up to a 6% increase in leaf area.

27 Mar 2024

computer-science computer-vision-and-pattern-recognition machine-learning

Direct mineral content prediction from drill core images via transfer learning

University of Bern Paul Scherrer Institute James Cook University Nagra ETH Z urich

Deep subsurface exploration is important for mining, oil and gas industries, as well as in the assessment of geological units for the disposal of chemical or nuclear waste, or the viability of geothermal energy systems. Typically, detailed examinations of subsurface formations or units are performed on cuttings or core materials extracted during drilling campaigns, as well as on geophysical borehole data, which provide detailed information about the petrophysical properties of the rocks. Depending on the volume of rock samples and the analytical program, the laboratory analysis and diagnostics can be very time-consuming. This study investigates the potential of utilizing machine learning, specifically convolutional neural networks (CNN), to assess the lithology and mineral content solely from analysis of drill core images, aiming to support and expedite the subsurface geological exploration. The paper outlines a comprehensive methodology, encompassing data preprocessing, machine learning methods, and transfer learning techniques. The outcome reveals a remarkable 96.7% accuracy in the classification of drill core segments into distinct formation classes. Furthermore, a CNN model was trained for the evaluation of mineral content using a learning data set from multidimensional log analysis data (silicate, total clay, carbonate). When benchmarked against laboratory XRD measurements on samples from the cores, both the advanced multidimensional log analysis model and the neural network approach developed here provide equally good performance. This work demonstrates that deep learning and particularly transfer learning can support extracting petrophysical properties, including mineral content and formation classification, from drill core images, thus offering a road map for enhancing model performance and data set quality in image-based analysis of drill cores.

03 Dec 2023

ai-for-health computer-science artificial-intelligence

Ensemble Machine Learning Model Trained on a New Synthesized Dataset Generalizes Well for Stress Prediction Using Wearable Devices

James Cook University

Introduction. We investigate the generalization ability of models built on datasets containing a small number of subjects, recorded in single study protocols. Next, we propose and evaluate methods combining these datasets into a single, large dataset. Finally, we propose and evaluate the use of ensemble techniques by combining gradient boosting with an artificial neural network to measure predictive power on new, unseen data. Methods. Sensor biomarker data from six public datasets were utilized in this study. To test model generalization, we developed a gradient boosting model trained on one dataset (SWELL), and tested its predictive power on two datasets previously used in other studies (WESAD, NEURO). Next, we merged four small datasets, i.e. (SWELL, NEURO, WESAD, UBFC-Phys), to provide a combined total of 99 subjects,. In addition, we utilized random sampling combined with another dataset (EXAM) to build a larger training dataset consisting of 200 synthesized subjects,. Finally, we developed an ensemble model that combines our gradient boosting model with an artificial neural network, and tested it on two additional, unseen publicly available stress datasets (WESAD and Toadstool). Results. Our method delivers a robust stress measurement system capable of achieving 85% predictive accuracy on new, unseen validation data, achieving a 25% performance improvement over single models trained on small datasets. Conclusion. Models trained on small, single study protocol datasets do not generalize well for use on new, unseen data and lack statistical power. Ma-chine learning models trained on a dataset containing a larger number of varied study subjects capture physiological variance better, resulting in more robust stress detection.

10 Jun 2025

audio-and-speech-processing electrical-engineering

The Search for Squawk: Agile Modeling in Bioacoustics

Google DeepMind

University College London

Google Research Queensland University of Technology James Cook University University of Hawaii at Hilo Lancaster University UK Mars Sustainable Solutions IPB Indonesia

Passive acoustic monitoring (PAM) has shown great promise in helping ecologists understand the health of animal populations and ecosystems. However, extracting insights from millions of hours of audio recordings requires the development of specialized recognizers. This is typically a challenging task, necessitating large amounts of training data and machine learning expertise. In this work, we introduce a general, scalable and data-efficient system for developing recognizers for novel bioacoustic problems in under an hour. Our system consists of several key components that tackle problems in previous bioacoustic workflows: 1) highly generalizable acoustic embeddings pre-trained for birdsong classification minimize data hunger; 2) indexed audio search allows the efficient creation of classifier training datasets, and 3) precomputation of embeddings enables an efficient active learning loop, improving classifier quality iteratively with minimal wait time. Ecologists employed our system in three novel case studies: analyzing coral reef health through unidentified sounds; identifying juvenile Hawaiian bird calls to quantify breeding success and improve endangered species monitoring; and Christmas Island bird occupancy modeling. We augment the case studies with simulated experiments which explore the range of design decisions in a structured way and help establish best practices. Altogether these experiments showcase our system's scalability, efficiency, and generalizability, enabling scientists to quickly address new bioacoustic challenges.

27 Aug 2025

computer-science computer-vision-and-pattern-recognition domain-adaptation

Weed Detection in Challenging Field Conditions: A Semi-Supervised Framework for Overcoming Shadow Bias and Data Scarcity

James Cook University

The automated management of invasive weeds is critical for sustainable agriculture, yet the performance of deep learning models in real-world fields is often compromised by two factors: challenging environmental conditions and the high cost of data annotation. This study tackles both issues through a diagnostic-driven, semi-supervised framework. Using a unique dataset of approximately 975 labeled and 10,000 unlabeled images of Guinea Grass in sugarcane, we first establish strong supervised baselines for classification (ResNet) and detection (YOLO, RF-DETR), achieving F1 scores up to 0.90 and mAP50 scores exceeding 0.82. Crucially, this foundational analysis, aided by interpretability tools, uncovered a pervasive "shadow bias," where models learned to misidentify shadows as vegetation. This diagnostic insight motivated our primary contribution: a semi-supervised pipeline that leverages unlabeled data to enhance model robustness. By training models on a more diverse set of visual information through pseudo-labeling, this framework not only helps mitigate the shadow bias but also provides a tangible boost in recall, a critical metric for minimizing weed escapes in automated spraying systems. To validate our methodology, we demonstrate its effectiveness in a low-data regime on a public crop-weed benchmark. Our work provides a clear and field-tested framework for developing, diagnosing, and improving robust computer vision systems for the complex realities of precision agriculture.

16 Nov 2023

ai-for-health clustering-algorithms computer-science

Self-Supervised Multi-Modality Learning for Multi-Label Skin Lesion Classification

Shanghai Jiao Tong University

University of Sydney James Cook University

Dylan Wang

The clinical diagnosis of skin lesion involves the analysis of dermoscopic and clinical modalities. Dermoscopic images provide a detailed view of the surface structures whereas clinical images offer a complementary macroscopic information. The visual diagnosis of melanoma is also based on seven-point checklist which involves identifying different visual attributes. Recently, supervised learning approaches such as convolutional neural networks (CNNs) have shown great performances using both dermoscopic and clinical modalities (Multi-modality). The seven different visual attributes in the checklist are also used to further improve the the diagnosis. The performances of these approaches, however, are still reliant on the availability of large-scaled labeled data. The acquisition of annotated dataset is an expensive and time-consuming task, more so with annotating multi-attributes. To overcome this limitation, we propose a self-supervised learning (SSL) algorithm for multi-modality skin lesion classification. Our algorithm enables the multi-modality learning by maximizing the similarities between paired dermoscopic and clinical images from different views. In addition, we generate surrogate pseudo-multi-labels that represent seven attributes via clustering analysis. We also propose a label-relation-aware module to refine each pseudo-label embedding and capture the interrelationships between pseudo-multi-labels. We validated the effectiveness of our algorithm using well-benchmarked seven-point skin lesion dataset. Our results show that our algorithm achieved better performances than other state-of-the-art SSL counterparts.

30 Mar 2024

computer-science computer-vision-and-pattern-recognition

A Review of Predictive and Contrastive Self-supervised Learning for Medical Images

The University of Sydney James Cook University

Over the last decade, supervised deep learning on manually annotated big data has been progressing significantly on computer vision tasks. But the application of deep learning in medical image analysis was limited by the scarcity of high-quality annotated medical imaging data. An emerging solution is self-supervised learning (SSL), among which contrastive SSL is the most successful approach to rivalling or outperforming supervised learning. This review investigates several state-of-the-art contrastive SSL algorithms originally on natural images as well as their adaptations for medical images, and concludes by discussing recent advances, current limitations, and future directions in applying contrastive SSL in the medical domain.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Demystifying Deep Learning-based Brain Tumor Segmentation with 3D UNets and Explainable AI (XAI): A Comparative Analysis

Computationally Efficient Diffusion Models in Medical Imaging: A Comprehensive Review

Affinity LCFCN: Learning to Segment Fish with Weak Supervision

RadarGaussianDet3D: An Efficient and Effective Gaussian-based 3D Detector with 4D Automotive Radars

Vehicle-to-Everything Cooperative Perception for Autonomous Driving

AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios

DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning

OptiPMB: Enhancing 3D Multi-Object Tracking with Optimized Poisson Multi-Bernoulli Filtering

Unsupervised Radar Point Cloud Enhancement via Arbitrary LiDAR Guided Diffusion Prior

MS-Occ: Multi-Stage LiDAR-Camera Fusion for 3D Semantic Occupancy Prediction

LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion

Precision Robotic Spot-Spraying: Reducing Herbicide Use and Enhancing Environmental Outcomes in Sugarcane

From Pixels to Images: Deep Learning Advances in Remote Sensing Image Semantic Segmentation

Minimizing Electricity Cost through Smart Lighting Control for Indoor Plant Factories

Direct mineral content prediction from drill core images via transfer learning

Ensemble Machine Learning Model Trained on a New Synthesized Dataset Generalizes Well for Stress Prediction Using Wearable Devices

The Search for Squawk: Agile Modeling in Bioacoustics

Weed Detection in Challenging Field Conditions: A Semi-Supervised Framework for Overcoming Shadow Bias and Data Scarcity

Self-Supervised Multi-Modality Learning for Multi-Label Skin Lesion Classification

A Review of Predictive and Contrastive Self-supervised Learning for Medical Images

Events

AI for Law

Personalize Your Feed