In the context of neural networks, self-supervised learning aims to leverage inherent structures or relationships within the input data to create meaningful training signals.
SSL tasks are designed so that solving them requires capturing essential features or relationships in the data.
The input data is typically augmented to create related views; this augmentation can involve introducing noise, cropping, rotation, or other transformations.
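As a concrete sketch of how such a training pair can be generated, the following NumPy snippet (the function name and parameter choices are illustrative, not taken from any particular SSL library) produces two independently augmented views of the same image, which an SSL objective could then treat as a related pair:

```python
import numpy as np

def augment(image, rng):
    """Return a randomly transformed copy of `image` (an H x W array)."""
    view = image.copy()
    # Random rotation by a multiple of 90 degrees.
    view = np.rot90(view, k=rng.integers(0, 4))
    # Random crop to 3/4 of the original size, then additive Gaussian noise.
    h, w = view.shape
    ch, cw = (3 * h) // 4, (3 * w) // 4
    top, left = rng.integers(0, h - ch + 1), rng.integers(0, w - cw + 1)
    view = view[top:top + ch, left:left + cw]
    view = view + rng.normal(0.0, 0.1, size=view.shape)
    return view

rng = np.random.default_rng(0)
image = rng.random((32, 32))                               # stand-in for a real image
view_a, view_b = augment(image, rng), augment(image, rng)  # a related pair of views
```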
Self-supervised learning has produced promising results in recent years and has found practical application in fields such as audio processing; it is used by Facebook and others for speech recognition.[4][5][6]
Autoassociative self-supervised learning is a category of SSL in which a neural network is trained to reproduce or reconstruct its own input data.[8] In other words, the model is tasked with learning a representation of the data that captures its essential features or structure, allowing it to regenerate the original input.
The term "autoassociative" comes from the fact that the model is essentially associating the input data with itself.
This is often achieved using autoencoders, which are a type of neural network architecture used for representation learning.
Autoencoders consist of an encoder network that maps the input data to a lower-dimensional representation (latent space), and a decoder network that reconstructs the input from this representation.
The loss function used during training typically penalizes the difference between the original input and the reconstructed output (e.g. mean squared error).
By minimizing this reconstruction error, the autoencoder learns a meaningful representation of the data in its latent space.
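A minimal PyTorch sketch of such an autoencoder follows; the layer sizes and the 784-dimensional input are arbitrary choices for illustration, not a prescribed architecture:

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: input -> lower-dimensional latent representation.
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        # Decoder: latent representation -> reconstruction of the input.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                        # a batch of unlabeled inputs
loss = nn.functional.mse_loss(model(x), x)     # reconstruction error
loss.backward()
optimizer.step()
```

No labels appear anywhere in this loop: the input itself supplies the training target.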
Non-contrastive self-supervised learning (NCSSL) uses only positive examples. Counterintuitively, NCSSL converges on a useful local minimum rather than reaching a trivial solution with zero loss. Effective NCSSL requires an extra predictor on the online side that does not back-propagate on the target side.[9]
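The following PyTorch sketch illustrates this setup in the spirit of SimSiam/BYOL-style methods; the tiny encoder and predictor are placeholders, and the detach() call is what realizes the "no back-propagation on the target side":

```python
import torch
from torch import nn
import torch.nn.functional as F

# Toy online encoder and predictor (real systems use deep networks).
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))
predictor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))

def ncssl_loss(view_a, view_b):
    """Negative cosine similarity between the online prediction for one view
    and the detached (stop-gradient) representation of the other view."""
    z_a, z_b = encoder(view_a), encoder(view_b)
    p_a, p_b = predictor(z_a), predictor(z_b)
    # The target side is detached, so gradients flow only through the
    # online encoder and its predictor.
    loss_a = -F.cosine_similarity(p_a, z_b.detach(), dim=-1).mean()
    loss_b = -F.cosine_similarity(p_b, z_a.detach(), dim=-1).mean()
    return 0.5 * (loss_a + loss_b)

view_a, view_b = torch.rand(16, 784), torch.rand(16, 784)  # two augmented views
loss = ncssl_loss(view_a, view_b)
loss.backward()
```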
SSL belongs to supervised learning methods insofar as the goal is to generate a classified output from the input.[1] SSL is similar to unsupervised learning in that it does not require labels in the sample data.
However, in current jargon, the term 'self-supervised' often refers to tasks based on a pretext-task training setup.
This involves the (human) design of such pretext task(s), unlike the case of fully self-contained autoencoder training.
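One classic hand-designed pretext task is rotation prediction (in the style of RotNet): the supervisory label is generated mechanically by rotating each unlabeled image and asking the network which rotation was applied. The sketch below uses a deliberately tiny PyTorch classifier as a stand-in for a real backbone:

```python
import torch
from torch import nn

def rotation_pretext_batch(images):
    """Rotate each image by 0/90/180/270 degrees; the rotation index is the
    automatically generated label (no human annotation required)."""
    rotations = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(-2, -1))
                           for img, k in zip(images, rotations)])
    return rotated, rotations

# Tiny stand-in classifier: flattens the image and predicts one of 4 rotations.
classifier = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 4))

images = torch.rand(8, 1, 32, 32)          # unlabeled images
inputs, labels = rotation_pretext_batch(images)
loss = nn.functional.cross_entropy(classifier(inputs), labels)
loss.backward()
```

Large-scale systems apply the same principle with far more elaborate pretext tasks.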
For example, Facebook developed wav2vec, a self-supervised algorithm, to perform speech recognition using two deep convolutional neural networks that build on each other.[7] Google's Bidirectional Encoder Representations from Transformers (BERT) model is used to better understand the context of search queries.[16]
Bootstrap Your Own Latent (BYOL) is an NCSSL method that produced excellent results on ImageNet and on transfer and semi-supervised benchmarks.[17] The Yarowsky algorithm is an example of self-supervised learning in natural language processing. DirectPred is an NCSSL method that directly sets the predictor weights instead of learning them by gradient descent.[18] Self-supervised learning continues to gain prominence as a new approach across diverse fields.
Its ability to leverage unlabeled data effectively opens new possibilities for advancement in machine learning, especially in data-driven application domains.