This work simultaneously considers the discriminability and transferability
properties of deep representations in the typical supervised learning task,
i.e., image classification. Through a comprehensive temporal analysis, we
observe a trade-off between these two properties: discriminability keeps
increasing as training progresses, while transferability diminishes sharply in
the later stages of training.
From the perspective of information-bottleneck theory, we reveal that the
incompatibility between discriminability and transferability stems from the
over-compression of input information. More importantly, we investigate why
and how the InfoNCE loss can alleviate this over-compression, and accordingly
present a learning framework, named contrastive temporal coding (CTC), to
counteract the over-compression and thereby ease the incompatibility.
Extensive experiments validate that CTC successfully mitigates the
incompatibility, yielding representations that are both discriminative and
transferable. Noticeable improvements are achieved on image classification and
on challenging transfer learning tasks. We hope this work draws attention to
the importance of the transferability property in the conventional supervised
learning setting.
Code is available at this https URL
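
For readers unfamiliar with the loss referenced above, the following is a
minimal sketch of a generic InfoNCE objective in PyTorch. It is illustrative
only and is not the paper's CTC implementation; the function name, the 0.07
temperature, and the batch/negative shape conventions are all assumptions made
for the sketch.

```python
# Hypothetical sketch of a generic InfoNCE loss; not the paper's CTC method.
import torch
import torch.nn.functional as F

def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE: cross-entropy over one positive and K shared negatives.

    query:         (B, D) anchor representations
    positive_key:  (B, D) one positive per anchor
    negative_keys: (K, D) negatives shared across the batch
    """
    # L2-normalize so dot products become cosine similarities.
    query = F.normalize(query, dim=-1)
    positive_key = F.normalize(positive_key, dim=-1)
    negative_keys = F.normalize(negative_keys, dim=-1)

    # Positive logits: (B, 1); negative logits: (B, K).
    pos = torch.sum(query * positive_key, dim=-1, keepdim=True)
    neg = query @ negative_keys.T
    logits = torch.cat([pos, neg], dim=1) / temperature

    # The positive sits at index 0 of every row of logits.
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)

# Example usage: 8 anchors, 128-dim features, 64 shared negatives.
q, k = torch.randn(8, 128), torch.randn(8, 128)
negs = torch.randn(64, 128)
loss = info_nce(q, k, negs)
```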