A lightweight, multi-modal deep learning system was developed to automatically detect sports highlights, demonstrating high accuracy (up to 89.24% for audio, 83.00% for video) on relatively small datasets by combining audio Mel-spectrograms and grayscale video frames. The ensemble model effectively leverages the complementary nature of audio and visual cues to enhance detection robustness and reduce errors seen in single-modality approaches.
View blogThe research demonstrates how the practical demands of Renaissance perspective painting implicitly drove the discovery of projective geometry, formalizing how artists' techniques, such as vanishing points and the horizon line, are rooted in fundamental mathematical principles. Through analytical reconstruction of paintings like Piero della Francesca's Flagellation, the authors illustrate how to deduce the observer's original viewpoint, offering new tools for art-historical interpretation.
View blog