Researchers at Sapienza University of Rome characterized the latent-space dynamics of recurrent-depth transformers, revealing two distinct regimes: small-scale refinements within looped blocks and larger-scale drifts across blocks. Leveraging this geometric understanding, they introduced an acceleration-based early-exit mechanism that reduced inference latency by up to 38% while preserving output quality, outperforming existing early-exit methods.
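The summary does not spell out the exit criterion, but an acceleration-based test can be read as a second finite difference over successive latent states: once the trajectory stops curving, further loop iterations add little. The sketch below is a minimal illustration under that reading, assuming a looped block `block`, a hidden state `h`, and a hypothetical tolerance `tol`; the paper's actual criterion, normalization, and thresholds may differ.

```python
import torch


def run_with_acceleration_exit(block, h, max_loops=32, tol=1e-3):
    """Iterate a looped (recurrent-depth) block, exiting early when the
    latent trajectory's 'acceleration' (second difference of successive
    hidden states) falls below a tolerance.

    Illustrative only: `block`, `max_loops`, and `tol` are assumptions,
    not the authors' implementation.
    """
    prev, prev_prev = None, None
    steps_taken = max_loops
    for step in range(max_loops):
        h = block(h)  # one pass through the shared recurrent block
        if prev_prev is not None:
            # Second finite difference h_t - 2*h_{t-1} + h_{t-2},
            # normalized by the current state's magnitude.
            accel = (h - 2 * prev + prev_prev).norm() / h.norm()
            if accel < tol:
                steps_taken = step + 1
                break  # trajectory has settled; skip remaining loops
        prev_prev, prev = prev, h
    return h, steps_taken
```

Normalizing the second difference by the current state's norm keeps the threshold scale-free across inputs, so a single `tol` can serve as a global latency/quality knob.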