We study ⊥Grad, a geometry-aware modification to gradient-based optimization that constrains descent directions to address overconfidence, a key limitation of standard optimizers in uncertainty-critical applications. By enforcing orthogonality between gradient updates and weight vectors, ⊥Grad alters optimization trajectories without architectural changes. On CIFAR-10 with 10% labeled data, ⊥Grad matches SGD in accuracy while achieving statistically significant improvements in test loss (p=0.05), predictive entropy (p=0.001), and confidence measures. These effects show consistent trends across corruption levels and architectures. ⊥Grad is optimizer-agnostic, incurs minimal overhead, and remains compatible with post-hoc calibration techniques. Theoretically, we characterize convergence and stationary points for a simplified ⊥Grad variant, revealing that orthogonalization constrains loss-reduction pathways to avoid confidence inflation and encourage decision-boundary improvements. Our findings suggest that geometric interventions in optimization can improve predictive uncertainty estimates at low computational cost.
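As a concrete illustration of the mechanism described above, the sketch below shows one plausible per-tensor realization in PyTorch: before the optimizer step, each parameter's gradient has its component along the corresponding weight vector projected out, so the resulting update is orthogonal to the weights. The function name, the per-tensor projection granularity, and the stabilizing epsilon are assumptions made for illustration; the exact ⊥Grad rule (e.g., per-neuron projection, treatment of biases, or interaction with momentum and weight decay) may differ.

```python
import torch

def orthogonalize_gradients(parameters, eps: float = 1e-12):
    """For each parameter tensor w with gradient g, remove the component
    of g along w:  g <- g - (<g, w> / (<w, w> + eps)) * w.
    Illustrative reading of the abstract, not necessarily the exact ⊥Grad rule."""
    with torch.no_grad():
        for w in parameters:
            if w.grad is None:
                continue
            g = w.grad
            coeff = (g * w).sum() / ((w * w).sum() + eps)
            g.sub_(coeff * w)  # in-place projection: resulting g is orthogonal to w

# Hypothetical usage inside a standard training step:
#   loss.backward()
#   orthogonalize_gradients(model.parameters())  # make the update ⊥ to the weights
#   optimizer.step()                             # any base optimizer (SGD, Adam, ...)
```

Because the projection only modifies gradients in place before the optimizer step, it composes with any base optimizer, which is consistent with the optimizer-agnostic claim above.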