We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of ℓ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead (ℓ=1) can be solved in polynomial time through a novel linear programming formulation. In contrast, for ℓ≥2 the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases of planning with transition look-ahead in reinforcement learning.
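For context, a minimal sketch of the classic linear program for planning in a standard discounted MDP (without look-ahead) is given below; it is not the paper's novel formulation, only an illustration of the LP approach it builds on. The toy MDP, function name, and parameters are illustrative assumptions.

```python
# Classic primal LP for a discounted MDP (illustrative only, not the paper's method):
#   min  sum_s V(s)
#   s.t. V(s) >= r(s,a) + gamma * sum_{s'} P[s,a,s'] * V(s')   for all (s, a)
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(P, r, gamma=0.9):
    """P: (S, A, S) transition tensor, r: (S, A) reward matrix."""
    S, A, _ = P.shape
    c = np.ones(S)                                   # objective: minimize sum of values
    A_ub = np.zeros((S * A, S))
    b_ub = np.zeros(S * A)
    for s in range(S):
        for a in range(A):
            # Constraint rewritten as (gamma * P(.|s,a) - e_s) @ V <= -r(s,a)
            A_ub[s * A + a] = gamma * P[s, a] - np.eye(S)[s]
            b_ub[s * A + a] = -r[s, a]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * S)
    return res.x                                     # optimal state values V*

# Toy 2-state, 2-action MDP (hypothetical numbers)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
print(solve_mdp_lp(P, r))
```

The paper's contribution, per the abstract, is an LP of this flavor that additionally accounts for one-step transition look-ahead while remaining polynomial-size; the sketch above only shows the baseline construction such a formulation would extend.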