VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

Adrien Bardes, Jean Ponce, Yann LeCun

Abstract

Recent self-supervised methods for image representation learning maximize the agreement between embedding vectors produced by encoders fed with different views of the same image. The main challenge is to prevent a collapse in which the encoders produce constant or non-informative vectors. We introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with two regularization terms applied to both embeddings separately: (1) a term that maintains the variance of each embedding dimension above a threshold, (2) a term that decorrelates each pair of variables. Unlike most other approaches to the same problem, VICReg does not require techniques such as: weight sharing between the branches, batch normalization, feature-wise normalization, output quantization, stop gradient, memory banks, etc., and achieves results on par with the state of the art on several downstream tasks. In addition, we show that our variance regularization term stabilizes the training of other methods and leads to performance improvements.

A self-supervised learning method for training embeddings from images (though it could probably be applied to other modalities). The model produces embeddings from two transformed versions of the same input and the training objective pushes for three things: the two embeddings should be similar (invariance), each embedding dimension should vary enough within a batch (variance), and the covariances between pairs of dimensions should be minimized (covariance).
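To make the two-branch setup concrete, here is a minimal PyTorch sketch, assuming the paper's standard configuration of a ResNet-50 encoder followed by an MLP "expander" whose outputs are the embeddings the loss acts on. The class and variable names are illustrative, not the authors' code, and the expander's batch-norm layers are a default choice from the paper rather than something VICReg strictly requires.

```python
import torch.nn as nn
import torchvision

class VICRegModel(nn.Module):
    def __init__(self, embed_dim=8192):
        super().__init__()
        backbone = torchvision.models.resnet50()
        backbone.fc = nn.Identity()          # keep the 2048-d representation
        self.encoder = backbone
        # Expander: maps representations to the embeddings the loss is applied to.
        self.expander = nn.Sequential(
            nn.Linear(2048, embed_dim), nn.BatchNorm1d(embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim), nn.BatchNorm1d(embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, view_a, view_b):
        # Two augmented views of the same batch go through the same weights.
        z_a = self.expander(self.encoder(view_a))
        z_b = self.expander(self.encoder(view_b))
        return z_a, z_b
```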

The loss function is a weighted average of the three terms, with the covariance term constrained to the smallest weight (which makes sense given its units are likely large) and the other two weights forced to be equal (variance and invariance "matter" equally).
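Here is a minimal PyTorch sketch of how the three terms could be combined, assuming z_a and z_b are the n x d embedding matrices of the two views. The coefficients (25/25/1), the std threshold gamma = 1, and the small epsilon inside the square root follow the paper's reported defaults, but the function itself is an illustration rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    n, d = z_a.shape

    # Invariance: mean-squared error between the two views' embeddings.
    inv_loss = F.mse_loss(z_a, z_b)

    # Variance: hinge loss keeping the std of each dimension above gamma.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = torch.mean(F.relu(gamma - std_a)) + torch.mean(F.relu(gamma - std_b))

    # Covariance: penalize off-diagonal entries of each covariance matrix.
    z_a = z_a - z_a.mean(dim=0)
    z_b = z_b - z_b.mean(dim=0)
    cov_a = (z_a.T @ z_a) / (n - 1)
    cov_b = (z_b.T @ z_b) / (n - 1)
    off_diag = lambda m: m - torch.diag_embed(torch.diagonal(m))
    cov_loss = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d

    # Weighted combination: covariance gets the smallest weight.
    return lam * inv_loss + mu * var_loss + nu * cov_loss
```

In a training loop this would simply be `loss = vicreg_loss(*model(view_a, view_b))` followed by the usual backward pass and optimizer step.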

Benefits of the approach are that you don't need contrastive (negative) examples or normalization tricks such as batch or feature-wise normalization.

The most important terms are variance and invariance; without them the embeddings simply collapse. The covariance term pushes performance to near-SOTA levels but isn't required for the approach to work at least to some extent.

