UTD
Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding
15 Sep 2025

The Dr.V framework introduces a hierarchical taxonomy and a benchmark with fine-grained spatial-temporal grounding for diagnosing video hallucinations in large video models (LVMs). Its Dr.V-Agent, a training-free diagnostic system, identifies and mitigates LVM errors; for instance, it increases VideoChat2's accuracy by 18.60%, compared with a human-level performance of 95.25%.
