In response to the threat of adversarial examples, adversarial training
provides an attractive option for enhancing model robustness by training
models on adversarial examples generated on the fly. However, most existing
adversarial training methods focus on improving robust accuracy by
strengthening the adversarial examples while neglecting the growing shift
between natural data and adversarial examples, which leads to a dramatic
decrease in natural accuracy. To maintain the trade-off between natural and
robust accuracy, we alleviate this shift from the perspective of feature
adaptation and propose Feature Adaptive Adversarial Training (FAAT), which
optimizes class-conditional feature adaptation across natural data and
adversarial examples. Specifically, we incorporate a class-conditional
discriminator that encourages the features to become (1) class-discriminative
and (2) invariant to changes in the adversarial attack. The FAAT framework
balances natural and robust accuracy by producing features with similar
distributions across natural and adversarial data, and it achieves higher
overall robustness thanks to the class-discriminative feature characteristics.
Experiments on various datasets demonstrate that FAAT produces more
discriminative features and performs favorably against state-of-the-art
methods. Code is available at
this https URL
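As a rough illustration of how a class-conditional discriminator can tie class information to the natural/adversarial distinction, the sketch below assumes a 2K-way joint discriminator: one output per class for natural features and one per class for adversarial features. This formulation is borrowed from joint-discriminator domain adaptation and is an assumption, not necessarily the exact FAAT objective. The names `joint_label`, `discriminator_loss`, and `feature_alignment_loss` are hypothetical, and the discriminator logits are taken as given rather than computed by a network.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def joint_label(y: int, is_adv: bool, num_classes: int) -> int:
    """Hypothetical 2K-way label: slots [0, K) mean 'natural, class y',
    slots [K, 2K) mean 'adversarial, class y'."""
    return y + num_classes if is_adv else y


def discriminator_loss(
    logits_nat: Sequence[Sequence[float]],
    logits_adv: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
) -> float:
    """Cross-entropy for the discriminator: it must identify both the class
    and whether the feature came from natural or adversarial input."""
    total = 0.0
    for ln, la, y in zip(logits_nat, logits_adv, labels):
        total -= log_softmax(ln)[joint_label(y, False, num_classes)]
        total -= log_softmax(la)[joint_label(y, True, num_classes)]
    return total / (2 * len(labels))


def feature_alignment_loss(
    logits_adv: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
) -> float:
    """Loss for the feature extractor (assumed form): it is rewarded when the
    discriminator places adversarial features in the *natural* slot of the
    same class, i.e. features stay class-discriminative but attack-invariant."""
    total = 0.0
    for la, y in zip(logits_adv, labels):
        total -= log_softmax(la)[joint_label(y, False, num_classes)]
    return total / len(labels)


# Example with K = 2 classes and a single sample of class 1.
aligned = feature_alignment_loss([[0.0, 10.0, 0.0, 0.0]], [1], 2)     # small
misaligned = feature_alignment_loss([[0.0, 0.0, 0.0, 10.0]], [1], 2)  # large
```

In a full training loop, the discriminator would minimize `discriminator_loss`, while the feature extractor would minimize its classification loss plus a weighted `feature_alignment_loss`. The weighting, the attack used to produce adversarial features, and the discriminator architecture are all left open in this sketch.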