Accurate estimation of question difficulty and prediction of student
performance play key roles in optimizing educational instruction and enhancing
learning outcomes within digital learning platforms. The Elo rating system is
widely recognized for its proficiency in predicting student performance by
estimating both question difficulty and student ability while providing
computational efficiency and real-time adaptivity. This paper presents an
adaptation of a multi-concept variant of the Elo rating system to data
collected by a medical training platform. The platform poses unique
challenges: a vast knowledge corpus, substantial inter-concept overlap, a
large question bank with significant sparsity in user-question interactions,
and a highly diverse user population. Our study is driven by two primary
objectives: first, to comprehensively evaluate the Elo rating system's
capabilities on this real-life data, and second, to tackle the issue of
imprecise early-stage estimations when implementing the Elo rating system for
online assessments. Our findings suggest that the Elo rating system exhibits
comparable accuracy to the well-established logistic regression model in
predicting final exam outcomes for users within our digital platform.
Furthermore, the results underscore that initializing Elo rating estimates
with historical data markedly reduces errors and enhances prediction accuracy,
especially during the initial phases of student interactions.
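For context, the kind of Elo update the abstract refers to can be sketched as follows. This is a minimal, single-concept illustration, not the paper's implementation; the function name and the K-factor value are assumptions chosen for clarity.

```python
import math

def elo_update(theta, beta, correct, k=0.4):
    """One Elo-style update step.

    theta   -- current student ability estimate
    beta    -- current question difficulty estimate
    correct -- observed response (1 if answered correctly, 0 otherwise)
    k       -- learning rate (illustrative value, not from the paper)
    """
    # Predicted probability of a correct answer (logistic in theta - beta).
    p = 1.0 / (1.0 + math.exp(-(theta - beta)))
    # Move ability and difficulty in opposite directions by the prediction error.
    theta_new = theta + k * (correct - p)
    beta_new = beta - k * (correct - p)
    return theta_new, beta_new, p
```

A multi-concept variant, as studied in the paper, would maintain one ability estimate per concept and combine the estimates of the concepts a question covers; initializing `theta` and `beta` from historical data rather than zero is the early-stage improvement the abstract describes.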