Compiling shallow and accurate quantum circuits for Hamiltonian simulation remains challenging due to hardware constraints and the combinatorial complexity of minimizing gate count and circuit depth. Existing optimization method pipelines rely on hand-engineered classical heuristics, which cannot learn input-dependent structure and therefore miss substantial opportunities for circuit reduction.
We introduce \textbf{F2}, an offline reinforcement learning framework that exploits free-fermionic structure to efficiently compile Trotter-based Hamiltonian simulation circuits. F2 provides (i) a reinforcement-learning environment over classically simulatable free-fermionic subroutines, (ii) architectural and objective-level inductive biases that stabilize long-horizon value learning, and (iii) a reversible synthetic-trajectory generation mechanism that consistently yields abundant, guaranteed-successful offline data.
Across benchmarks spanning lattice models, protein fragments, and crystalline materials (12-222 qubits), F2 reduces gate count by 47\% and depth by 38\% on average relative to strong baselines (Qiskit, Cirq/OpenFermion) while maintaining average errors of
10−7. These results show that aligning deep reinforcement learning with the algebraic structure of quantum dynamics enables substantial improvements in circuit synthesis, suggesting a promising direction for scalable, learning-based quantum compilation