The SYMBOLICAI framework integrates large language models as semantic parsers with various solvers to facilitate complex, multi-step neuro-symbolic AI workflows. It introduces the VERTEX score for evaluating these multi-step generative processes. GPT-4 Turbo achieves the highest overall performance (a VERTEX score of 0.68), but all evaluated models remain unreliable at sophisticated logical reasoning and hierarchical graph orchestration.
A new metric, Fréchet Video Distance (FVD), and a suite of challenging benchmark datasets, StarCraft 2 Videos (SCV), are introduced to provide more accurate evaluation tools for deep generative video models. FVD demonstrates strong correlation with human perception of video quality, while current models show significant limitations on SCV tasks requiring relational reasoning and long-term consistency.
Researchers at Johannes Kepler University introduced TACOS, a dataset of 47,748 temporally-aligned audio captions, to enable audio-language models to understand the precise timing and temporal relationships of sound events. Their frame-wise contrastive learning approach, leveraging this data, improved text-based sound event detection by 5.5 percentage points in PSDS1 score compared to models trained with clip-level captions.