This study presents a systematic, extensive real-world evaluation of all
configurations of the YOLOv8, YOLOv9, YOLOv10, YOLO11 (or YOLOv11), and
YOLOv12 object detection algorithms for immature green apple (fruitlet)
detection in commercial orchards, in terms of precision, recall, mean Average
Precision at 50\% Intersection over Union (mAP@50), and computational speed,
including pre-processing, inference, and post-processing times.
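The YOLO versions evaluated here are commonly run through the Ultralytics
Python API, which reports exactly these detection metrics; as a minimal
sketch (the weight file and dataset YAML below are hypothetical
placeholders, not artifacts of this study):

    from ultralytics import YOLO

    # Load a trained checkpoint; "fruitlet_best.pt" is a hypothetical weight file.
    model = YOLO("fruitlet_best.pt")

    # Validate on a held-out split; "fruitlet_data.yaml" is a placeholder dataset config.
    metrics = model.val(data="fruitlet_data.yaml")

    print(f"precision: {metrics.box.mp:.3f}")    # mean precision over classes
    print(f"recall:    {metrics.box.mr:.3f}")    # mean recall over classes
    print(f"mAP@50:    {metrics.box.map50:.3f}")  # mAP at IoU threshold 0.50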
Additionally, this research performed and validated in-field counting of the
fruitlets using an iPhone and machine vision sensors. Among all
configurations, YOLOv12l recorded the highest recall, 0.90. YOLOv10x achieved
the highest precision, 0.908, followed by YOLOv9 Gelan-c at 0.903.
Analysis of mAP@50 revealed that YOLOv9 Gelan-base and YOLOv9 Gelan-e reached
peak scores of 0.935, with YOLO11s and YOLOv12l following closely at 0.933 and
0.931, respectively. For counting validation using images captured with an
iPhone 14 Pro, the YOLO11n configuration demonstrated outstanding accuracy,
recording RMSE values of 4.51 for Honeycrisp, 4.59 for Cosmic Crisp, 4.83 for
Scilate, and 4.96 for Scifresh; corresponding MAE values were 4.07, 3.98, 7.73,
and 3.85. Similar performance trends were observed with RGB-D sensor data.
Moreover, sensor-specific training on Intel RealSense data significantly
enhanced model performance. YOLO11n achieved the fastest inference time, 2.4
ms per image, outperforming YOLOv8n (4.1 ms), YOLOv9 Gelan-s (11.5 ms),
YOLOv10n (5.5 ms), and YOLOv12n (4.6 ms), underscoring its suitability for
real-time object detection applications.
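The per-image timing breakdown reported above (pre-processing, inference,
post-processing) is exposed directly by the Ultralytics API; a minimal
sketch, assuming the official nano checkpoint and a placeholder test image:

    from ultralytics import YOLO

    # "yolo11n.pt" is the official YOLO11 nano checkpoint; "orchard.jpg" is a placeholder.
    model = YOLO("yolo11n.pt")
    results = model.predict("orchard.jpg")

    # results[0].speed is a dict of per-image times in milliseconds.
    speed = results[0].speed
    print(f"pre-process:  {speed['preprocess']:.1f} ms")
    print(f"inference:    {speed['inference']:.1f} ms")
    print(f"post-process: {speed['postprocess']:.1f} ms")

    # Fruitlet count for this image = number of detected boxes.
    print(f"detections: {len(results[0].boxes)}")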
object detection, YOLOv11 object detecion, YOLOv12 segmentation)