Kernel ridge regression (KRR) is a generalization of linear ridge regression
that is non-linear in the data, but linear in the model parameters. Here, we
introduce an equivalent formulation of the objective function of KRR, which
makes it possible to replace the ridge penalty with the ℓ∞ and ℓ1 penalties.
Using the ℓ∞ and ℓ1 penalties, we obtain robust and sparse kernel regression,
respectively.
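For reference, a minimal statement of standard KRR in terms of the dual coefficients α (the textbook formulation, not the equivalent reformulation introduced here): given a kernel matrix K ∈ ℝ^{n×n}, responses y ∈ ℝ^n, and ridge parameter λ > 0,
\[
\hat{\alpha} \;=\; \arg\min_{\alpha \in \mathbb{R}^n}\; \|y - K\alpha\|_2^2 + \lambda\,\alpha^\top K \alpha \;=\; (K + \lambda I_n)^{-1} y,
\qquad
\hat{f}(x) \;=\; \sum_{i=1}^n \hat{\alpha}_i\, k(x, x_i).
\]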
We study the similarities between explicitly regularized kernel regression and
the solutions obtained by early stopping of iterative gradient-based methods,
where we connect ℓ∞ regularization to sign gradient descent, ℓ1 regularization
to forward stagewise regression (also known as coordinate descent), and ℓ2
regularization to gradient descent, and, in the last case, theoretically bound
the differences. We exploit the close relations between
ℓ∞
regularization and sign gradient descent, and between
ℓ1 regularization
and coordinate descent to propose computationally efficient methods for robust
and sparse kernel regression. Finally, we compare robust kernel regression
through sign gradient descent to existing methods for robust kernel regression
on five real data sets, demonstrating that our method is one to two orders of
magnitude faster without compromising accuracy.
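As a rough illustration of the connection between early stopping and regularization described above, the following is a minimal sketch of sign gradient descent on the dual coefficients of kernel regression, stopped after a fixed number of iterations. It is not the paper's implementation: the squared-loss objective, step size, Gaussian kernel, and fixed iteration budget (step_size, n_iter, gaussian_kernel) are assumptions made purely for illustration.

import numpy as np

def sign_gd_kernel_regression(K, y, step_size=1e-3, n_iter=200):
    """Sign gradient descent on the dual coefficients alpha for the
    squared loss 0.5 * ||y - K @ alpha||^2. Early stopping (a small
    n_iter) plays the role of explicit regularization."""
    alpha = np.zeros_like(y, dtype=float)
    for _ in range(n_iter):
        grad = -K @ (y - K @ alpha)          # gradient of the squared loss w.r.t. alpha
        alpha -= step_size * np.sign(grad)   # step along the sign of the gradient only
    return alpha

def gaussian_kernel(X, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (assumed kernel choice)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

# Toy usage on synthetic data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(50, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=50)
K = gaussian_kernel(X)
alpha_hat = sign_gd_kernel_regression(K, y, step_size=1e-3, n_iter=200)
fitted = K @ alpha_hat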