Transcript
John: Welcome to our seminar on Advanced Methods in Quantitative Finance. Today's lecture is on 'Attention Factors for Statistical Arbitrage' by a team from Stanford University. We've seen a lot of work on end-to-end models lately, like 'Deep Learning for Options Trading,' which directly optimizes a policy. This paper sits in that vein, but it applies the idea to factor construction. It's a departure from just finding alpha factors and then building a portfolio. Instead, it aims to build the factors and the trading strategy simultaneously with a single objective.
John: Yes, Noah?
Noah: Hi Professor. So, are they proposing that traditional factor models like PCA are flawed, or just that they're being used for the wrong objective when it comes to statistical arbitrage?
John: The latter, precisely. The core argument is that factors optimized to explain cross-sectional variance are not necessarily the best factors for building a profitable, low-turnover trading strategy. The objectives are misaligned.
John: Let's unpack the main concept. The standard approach to statistical arbitrage is a two-step process. First, you use a statistical tool like PCA to identify factors that capture common movements among assets. The parts of the returns not explained by these factors are the residuals, which you assume are temporary mispricings. In the second step, you model these residuals, often assuming they mean-revert, and build a trading strategy to profit from that reversion.
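John: To make that concrete, here is a minimal sketch of the classical two-step pipeline on simulated data. Everything here, the factor count, the lookback window, the z-score rule, is my own illustrative choice, not the paper's.

```python
import numpy as np

# Toy two-step stat-arb pipeline on simulated data (illustration only).
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(500, 50))  # T=500 days, N=50 assets

# Step 1: extract PCA factors via SVD of the demeaned return panel.
X = returns - returns.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 5                                  # number of factors (arbitrary here)
loadings = Vt[:k].T                    # (N, k) asset loadings
factors = X @ loadings                 # (T, k) factor returns
residuals = X - factors @ loadings.T   # (T, N) unexplained "mispricings"

# Step 2: trade mean reversion in the residuals, entirely separately:
# short assets whose residuals have run up, buy those that lagged.
window = 20
cum = residuals[-window:].sum(axis=0)       # recent cumulative residual
z = (cum - cum.mean()) / cum.std()          # cross-sectional z-score
weights = -z / np.abs(z).sum()              # dollar-neutral contrarian book
```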
Noah: And the problem is that the first step, finding the factors, is completely disconnected from the second step, trading the residuals?
John: Exactly. You might find factors that explain a lot of variance but result in residuals that are very noisy or lead to high-turnover strategies. Once you account for transaction and shorting costs, a theoretically profitable strategy can quickly become unprofitable. This paper's central contribution is to collapse this into a 'one-step' solution. They design a single deep learning model that learns the factors and the trading policy at the same time.
Noah: So what does the architecture look like? How do they jointly optimize both?
John: The model has two main components. The first is an attention mechanism that constructs the factors. It learns to form factor-replicating portfolios by paying attention to different firm characteristics. The second is a sequence model, specifically a LongConv network, that analyzes the time series of the resulting residuals and decides on the final portfolio weights. The entire system is trained end-to-end to maximize a single objective: the Sharpe ratio of the final portfolio after deducting realistic trading costs.
John: Let's look at that attention mechanism more closely, since it's the core technical novelty. Traditional PCA finds orthogonal factors that capture the most variance. Here, the model learns what the authors call 'Attention Factors.' It uses learnable query vectors, which you can think of as abstract representations of factor ideas, and compares each asset's characteristics to these queries via attention. The output determines how much each asset contributes to each tradable factor portfolio.
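John: Here is a rough PyTorch sketch of how those two components might fit together. To be clear, the layer sizes are invented, and I'm standing in for the paper's LongConv network with a plain temporal convolution; treat it as the shape of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionFactorStatArb(nn.Module):
    """Schematic of the two components; all sizes and the Conv1d
    stand-in for LongConv are assumptions, not the authors' code."""
    def __init__(self, n_chars, n_factors, d=32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_factors, d))  # learnable factor "ideas"
        self.key_proj = nn.Linear(n_chars, d)                   # characteristics -> keys
        self.seq = nn.Sequential(                               # crude LongConv stand-in
            nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.GELU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 1),
        )

    def forward(self, chars, returns):
        # chars: (N, n_chars) firm characteristics; returns: (T, N) asset returns.
        keys = self.key_proj(chars)                          # (N, d)
        attn = torch.softmax(self.queries @ keys.T, dim=-1)  # (K, N) factor portfolios
        f = returns @ attn.T                                 # (T, K) factor returns
        # Regress returns on the factors (ridge keeps the solve stable and differentiable).
        G = f.T @ f + 1e-6 * torch.eye(f.shape[1])
        betas = torch.linalg.solve(G, f.T @ returns)         # (K, N)
        resid = returns - f @ betas                          # (T, N) residuals
        w = self.seq(resid.T.unsqueeze(1)).squeeze(-1)       # (N,) raw trading weights
        return w / w.abs().sum().clamp_min(1e-8)             # unit gross exposure
```

John: In this sketch the softmax makes each factor a long-only portfolio of assets; the paper's exact normalization may differ, but the point is that the factor weights come directly from firm characteristics, and gradients from the trading objective flow all the way back into the queries.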
Noah: So instead of being purely statistical constructs, these factors are purpose-built portfolios designed from the start to be useful for the downstream trading task?
John: Correct. And because the final training objective is the net Sharpe ratio, the model is incentivized to create factors that lead to stable, low-turnover trading signals. It learns to balance explaining returns with the practical need to minimize costs. This is the key insight. The resulting strategy achieves an annualized net Sharpe ratio of 2.28, which is a very strong result in this literature. For comparison, a state-of-the-art benchmark using PCA factors but the same advanced trading model only achieved a net Sharpe of 1.57. That difference comes almost entirely from the cost-aware, end-to-end factor construction.
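John: For intuition, a cost-aware Sharpe objective could look something like this as a training loss. The 5 basis-point proportional cost on turnover and the daily annualization are my assumptions for illustration.

```python
import torch

def neg_net_sharpe(weights, returns, cost_bps=5.0):
    """Negative annualized Sharpe ratio net of proportional trading costs.

    weights: (T, N) portfolio weights held each day.
    returns: (T, N) next-period asset returns aligned with those weights.
    The 5 bps linear cost on turnover is an illustrative assumption.
    """
    pnl = (weights * returns).sum(dim=1)                     # (T,) daily gross P&L
    turnover = (weights[1:] - weights[:-1]).abs().sum(dim=1)
    costs = torch.cat([torch.zeros(1), turnover]) * cost_bps * 1e-4
    net = pnl - costs
    sharpe = net.mean() / net.std().clamp_min(1e-8) * 252 ** 0.5
    return -sharpe                # minimizing this maximizes the net Sharpe ratio
```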
Noah: A quick question on the objective function. If it's just maximizing net Sharpe, what stops the model from finding very strange factors that don't actually explain much systematic risk?
John: That's an important detail. The authors add a secondary term to the objective function that penalizes unexplained variance. A weighting parameter balances the two goals. Empirically, they found that including this term helped with model stability and performance, ensuring the factors remain economically meaningful while still being optimized for tradability.
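John: Concretely, the combined objective would look something like this, reusing the neg_net_sharpe sketch from a moment ago. The symbol gamma and the mean-squared-residual form of the penalty are my stand-ins for the paper's weighting parameter and variance term.

```python
def combined_loss(weights, returns, residuals, gamma=0.1):
    """Net-Sharpe term plus a penalty on unexplained variance.

    gamma and the mean-squared-residual penalty are illustrative
    stand-ins for the paper's weighting parameter and variance term.
    """
    tradability = neg_net_sharpe(weights, returns)  # defined in the sketch above
    unexplained = residuals.pow(2).mean()           # variance the factors fail to capture
    return tradability + gamma * unexplained
```

John: With gamma at zero you recover the pure trading objective; pushing it up drags the factors back toward something closer to a classical risk model, which is exactly the balance the authors describe.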
John: This work has significant implications for how we approach quantitative strategy design. It suggests a paradigm shift away from optimizing for intermediate statistical metrics, like explained variance, and toward optimizing directly for the ultimate goal, which is usually some form of risk-adjusted profit after costs. It also challenges the conventional wisdom of seeking only a few dominant factors. The paper shows that performance improves with up to 100 factors, suggesting that many 'weak factors' capturing localized, granular mispricings are highly valuable for arbitrage.
Noah: That idea of weak factors reminds me of network-based models, like in 'Network Momentum across Asset Classes,' where signals are more diffuse. It seems this attention model is good at picking up those subtle, higher-order relationships that PCA would miss.
John: That's a very good connection. When they analyzed the learned factors, they found the model implicitly grouped firms by industry, but the most important characteristic for performance was past returns, not fundamentals. This indicates the model is exploiting complex, transient price patterns rather than just sector effects.
John: So to wrap up, the key takeaway is the power of unifying the objectives of factor construction and trading policy optimization. By creating a single, end-to-end framework that is explicitly aware of real-world frictions like transaction costs, the authors were able to develop a statistical arbitrage strategy with a new level of performance and practical viability. It demonstrates that how you frame the problem and define the objective is just as critical as the sophistication of the model itself.
John: Thanks for listening. If you have any further questions, ask our AI assistant or drop a comment.