Zurich University of Applied Sciences
This research develops a mathematical framework for analyzing the underlying structure of self-attention's query-key matrix W_qk, demonstrating that Transformer training objectives (bidirectional vs. autoregressive) intrinsically shape W_qk into symmetric and directional forms, respectively. Empirical validation across diverse models and modalities confirms these emergent structures, and a symmetric initialization for encoder-only language models reduces training convergence time by up to 73%.
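As a concrete illustration of the initialization result, the sketch below ties the query and key projection weights at initialization so that the effective query-key matrix W_qk = W_q^T W_k equals W_q^T W_q, which is symmetric by construction. This is a minimal PyTorch sketch under assumed details: the paper's exact initialization scheme is not reproduced here, and the helper `symmetric_qk_init` is hypothetical.

```python
import torch
import torch.nn as nn

def symmetric_qk_init(attn: nn.MultiheadAttention) -> None:
    """Hypothetical helper: tie W_q and W_k at initialization so the
    effective W_qk = W_q^T W_k is symmetric (here, W_q^T W_q)."""
    d = attn.embed_dim
    w = torch.empty(d, d)
    nn.init.xavier_uniform_(w)
    # nn.MultiheadAttention packs W_q, W_k, W_v row-wise in in_proj_weight
    with torch.no_grad():
        attn.in_proj_weight[:d].copy_(w)        # W_q
        attn.in_proj_weight[d:2 * d].copy_(w)   # W_k = W_q -> symmetric W_qk

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
symmetric_qk_init(attn)

# Verify the effective query-key matrix is symmetric at init
W_q = attn.in_proj_weight[:64]
W_k = attn.in_proj_weight[64:128]
W_qk = W_q.T @ W_k
assert torch.allclose(W_qk, W_qk.T)
```

Tying the two projections is just one way to realize a symmetric W_qk; the weights are free to drift apart during training, so the symmetry acts as an inductive bias at initialization rather than a hard constraint.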