In this work, we introduce a novel approach to enhance the accuracy and efficiency
of COVID-19 diagnosis using CT images. Leveraging state-of-the-art Transformer
models in computer vision, we employ the base ViT Transformer configured for
224x224-sized input images, modifying the output layer to suit the binary
classification task. Notably, input images are resized from the standard CT
slice resolution of 512x512 to match the model's expected input size.
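For concreteness, the slice-level classifier can be sketched as follows. The snippet assumes the timm, torchvision, and Pillow libraries and ImageNet-pretrained weights, none of which are prescribed by the description above, so it should be read as an illustrative sketch rather than the exact implementation.

```python
# Sketch of the slice-level classifier (assumed libraries: timm, torchvision,
# Pillow; illustrative only).
import timm
import torch
from PIL import Image
from torchvision import transforms

# Base ViT with 16x16 patches and 224x224 input; num_classes=2 replaces the
# original classification head with a two-way COVID-19 / non-COVID head.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)
model.eval()

# Resize each 512x512 CT slice to the 224x224 resolution the model expects.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Hypothetical slice file; the slice is loaded as RGB for the pretrained ViT.
slice_img = Image.open("ct_slice.png").convert("RGB")
x = preprocess(slice_img).unsqueeze(0)      # shape (1, 3, 224, 224)
with torch.no_grad():
    probs = model(x).softmax(dim=-1)        # COVID-19 vs. non-COVID probabilities
```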
Our method implements a systematic patient-level prediction strategy: individual
CT slices are first classified as COVID-19 or non-COVID, and the overall diagnosis
for each patient is then determined by majority voting as well as other
thresholding approaches. Concretely, all CT slices of a given patient are
evaluated, and the patient is assigned the diagnosis indicated by the chosen
threshold over the slice-level predictions for the scan, as sketched below. This
patient-level prediction process, which builds up from 2D slices to a 3D
patient-level decision, contributes to the robustness of our solution.
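The aggregation step can be illustrated as follows; the 0.5 values are placeholder defaults rather than the exact thresholds used, with threshold=0.5 corresponding to plain majority voting over the slices of a patient.

```python
from typing import List

def patient_level_diagnosis(slice_probs: List[float], threshold: float = 0.5) -> int:
    """Aggregate slice-level COVID-19 probabilities into a patient-level label.

    A slice counts as positive when its COVID-19 probability exceeds 0.5; the
    patient is labelled COVID-19 (1) when the fraction of positive slices
    exceeds `threshold`. With threshold=0.5 this is majority voting; other
    values implement the alternative thresholding strategies.
    """
    positive_slices = sum(p > 0.5 for p in slice_probs)
    return int(positive_slices / len(slice_probs) > threshold)

# Example: 7 of 10 slices flagged as COVID-19 -> patient labelled COVID-19 (1).
print(patient_level_diagnosis([0.9, 0.8, 0.7, 0.6, 0.9, 0.8, 0.7, 0.2, 0.1, 0.3]))
```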
In our evaluation, this approach achieves a macro F1 score of 0.7 on the
COV19-CT-DB validation set. To ensure the reliability and effectiveness of our
model, we rigorously validate it on the extensive COV19-CT-DB dataset, which is
meticulously annotated for this task; these comprehensive annotations reinforce
the overall robustness of our solution.