Evaluating image generators remains challenging because traditional metrics
offer little insight into specific image regions. This is a critical problem
because not all regions of an image are learned with equal ease. In this work,
we propose a novel approach to
disentangle the cosine similarity of mean embeddings into the product of cosine
similarities for individual pixel clusters via centered kernel alignment.
Consequently, we can quantify how much each cluster contributes to the
overall image generation performance. We demonstrate that this improves
explainability and makes it more likely that pixel regions of model
misbehavior are identified across various real-world use cases.
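
To make the idea of a cluster-wise factorization concrete, the following is a
minimal sketch and not the construction used in this work. It assumes a
Gaussian kernel, which factorizes exactly over disjoint pixel groups, and uses
hypothetical toy data with two pixel clusters. The names rbf,
cosine_similarity, restrict, real, and fake are illustrative assumptions. If
the clusters are statistically independent, the population cosine similarity
of the mean embeddings equals the product of the cluster-wise values; with
finite samples the two only agree approximately.

```python
import math
import random


def rbf(x, y, bandwidth=1.0):
    """Gaussian kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_embedding_inner(xs, ys, kernel):
    """Empirical inner product of mean embeddings: average k(x, y) over all pairs."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def cosine_similarity(xs, ys, kernel):
    """Cosine similarity between the empirical mean embeddings of xs and ys."""
    cross = mean_embedding_inner(xs, ys, kernel)
    norm_x = mean_embedding_inner(xs, xs, kernel)
    norm_y = mean_embedding_inner(ys, ys, kernel)
    return cross / math.sqrt(norm_x * norm_y)


def restrict(samples, indices):
    """Keep only the pixels belonging to one cluster."""
    return [[s[i] for i in indices] for s in samples]


# Hypothetical toy data: 4-pixel "images" split into two pixel clusters.
rng = random.Random(0)
clusters = [[0, 1], [2, 3]]
real = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(100)]
# The "generator" matches cluster 0 but is shifted on cluster 1.
fake = [
    [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0),
     rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)]
    for _ in range(100)
]

overall = cosine_similarity(real, fake, rbf)
per_cluster = [
    cosine_similarity(restrict(real, c), restrict(fake, c), rbf)
    for c in clusters
]
product = math.prod(per_cluster)

print("overall cosine similarity:", overall)
print("cluster-wise cosine similarities:", per_cluster)
print("product of cluster-wise values:", product)
```

In this toy setup the cluster with the shifted pixels receives the lower
cluster-wise similarity, which is the kind of region-level diagnostic the
decomposition is meant to expose.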