Automated detection of surgical errors can improve robotic-assisted surgery.
Despite promising progress, existing methods still face challenges in capturing
rich temporal context to establish long-term dependencies while maintaining
computational efficiency. In this paper, we propose a novel hierarchical model
named SEDMamba, which incorporates the selective state space model (SSM) into
surgical error detection, facilitating efficient long sequence modelling with
linear complexity. SEDMamba enhances selective SSM with a bottleneck mechanism
and fine-to-coarse temporal fusion (FCTF) to detect and temporally localize
surgical errors in long videos. The bottleneck mechanism compresses and
restores features within their spatial dimension, thereby reducing
computational complexity. FCTF utilizes multiple dilated 1D convolutional
layers to merge temporal information across diverse scale ranges, accommodating
errors of varying duration. Our work also contributes the first-of-its-kind,
frame-level, in-vivo surgical error dataset to support error detection in real
surgical cases. Specifically, we deploy the clinically validated observational
clinical human reliability assessment tool (OCHRA) to annotate the errors
during suturing tasks in an open-source radical prostatectomy dataset
(SAR-RARP50). Experimental results demonstrate that our SEDMamba outperforms
state-of-the-art methods, achieving at least 1.82% AUC and 3.80% AP performance
gains while significantly reducing computational complexity. The corresponding error
annotations, code and models are released at
this https URL
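
To make the two architectural ideas in the abstract concrete, the sketch below illustrates, under assumptions, how a bottleneck that compresses and restores per-frame features could be combined with fine-to-coarse temporal fusion built from dilated 1D convolutions. It is a minimal PyTorch illustration, not the released SEDMamba implementation: the selective SSM (Mamba) core is omitted, and the module name, feature dimensions, and dilation rates are hypothetical.

```python
import torch
import torch.nn as nn


class BottleneckFCTFSketch(nn.Module):
    """Illustrative sketch (not the released SEDMamba code): a bottleneck that
    compresses and restores per-frame features, plus fine-to-coarse temporal
    fusion via dilated 1D convolutions over the time axis."""

    def __init__(self, in_dim=768, bottleneck_dim=128, dilations=(1, 2, 4, 8)):
        super().__init__()
        # Bottleneck: compress features before temporal modelling, restore after.
        self.compress = nn.Linear(in_dim, bottleneck_dim)
        self.restore = nn.Linear(bottleneck_dim, in_dim)
        # Fine-to-coarse fusion: one dilated Conv1d per temporal scale; kernel 3
        # with padding = dilation keeps the sequence length unchanged.
        self.branches = nn.ModuleList(
            nn.Conv1d(bottleneck_dim, bottleneck_dim, kernel_size=3,
                      padding=d, dilation=d)
            for d in dilations
        )
        self.head = nn.Linear(in_dim, 1)  # per-frame error score

    def forward(self, x):
        # x: (batch, time, in_dim) frame features from a long surgical video.
        z = self.compress(x)                      # (B, T, bottleneck_dim)
        z = z.transpose(1, 2)                     # (B, C, T) for Conv1d
        fused = sum(torch.relu(b(z)) for b in self.branches)
        fused = fused.transpose(1, 2)             # back to (B, T, C)
        out = self.restore(fused)                 # restore original feature dim
        return torch.sigmoid(self.head(out)).squeeze(-1)  # (B, T) error prob.


if __name__ == "__main__":
    feats = torch.randn(1, 2000, 768)             # e.g. a ~2000-frame video
    print(BottleneckFCTFSketch()(feats).shape)    # torch.Size([1, 2000])
```

In this reading, the compress/restore pair keeps the temporal layers operating on low-dimensional features (reducing compute), while the parallel dilated branches cover short and long temporal ranges so that errors of varying duration can contribute to each frame's score.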