Advanced Institute of Industrial Technology
Serendipity in recommender systems (RSs) has attracted increasing attention as a concept that enhances user satisfaction by presenting unexpected and useful items. However, evaluating serendipitous performance remains challenging because its ground truth is generally unobservable. The existing offline metrics often depend on ambiguous definitions or are tailored to specific datasets and RSs, thereby limiting their generalizability. To address this issue, we propose a universally applicable evaluation framework that leverages large language models (LLMs) known for their extensive knowledge and reasoning capabilities, as evaluators. First, to improve the evaluation performance of the proposed framework, we assessed the serendipity prediction accuracy of LLMs using four different prompt strategies on a dataset containing user-annotated serendipitous ground truth and found that the chain-of-thought prompt achieved the highest accuracy. Next, we re-evaluated the serendipitous performance of both serendipity-oriented and general RSs using the proposed framework on three commonly used real-world datasets, without the ground truth. The results indicated that there was no serendipity-oriented RS that consistently outperformed across all datasets, and even a general RS sometimes achieved higher performance than the serendipity-oriented RS.
In information recommendation, a session refers to a sequence of user actions within a specific time frame. Session-based recommender systems aim to capture short-term preferences and generate relevant recommendations. However, user interests may shift even within a session, making appropriate segmentation essential for modeling dynamic behaviors. In this study, we propose a supervised session segmentation method based on similarity features derived from action embeddings and attributes. We compute the similarity scores between items within a fixed-size window around each candidate segmentation point, using item co-occurrence embeddings, text embeddings of titles and brands, and price information as sources for these similarity features. These features are used to train supervised classification models to predict the session boundaries. We construct a manually annotated dataset from real browsing histories and evaluate the segmentation performance using F1-score, PR-AUC, and ROC-AUC. The LightGBM model achieves the best performance, with an F1-score of 0.806 and a PR-AUC of 0.831. These results demonstrate the effectiveness of the proposed method for session segmentation and its potential to capture dynamic user behaviors.
There are no more papers matching your filters at the moment.