Labels are generated from the data itself, without human intervention. The classic task is next-token prediction, where the label is inferred automatically from the text. Next-token prediction is how most LLMs and deep generative models are trained (and, according to some, will lead to the end of the world. Gasp!). A similar task is masked-token prediction, where the model predicts a missing word (token) in a sentence rather than the next one. This is how BERT is trained.
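
To make this concrete, here's a toy sketch of how both kinds of labels fall straight out of the text. The whitespace "tokenizer" and the `[MASK]` convention are illustrative stand-ins for the real thing, not any particular library's API.

```python
import random

text = "the cat sat on the mat"
tokens = text.split()  # stand-in for a real subword tokenizer

# Next-token prediction (GPT-style): each prefix is an input,
# and the token that follows it is the label.
next_token_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
# e.g. (["the", "cat"], "sat")

# Masked-token prediction (BERT-style): hide one token,
# and the hidden token is the label.
i = random.randrange(len(tokens))
masked_input = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
masked_pair = (masked_input, tokens[i])
# e.g. (["the", "cat", "[MASK]", "on", "the", "mat"], "sat")
```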

I dislike this category (he yells into the void). “Self”-supervised makes it sound like the algorithm is deciding what the labels should be. This isn’t true. What counts as the label is given by formal rules chosen a priori (picking the token space and deciding that, given a sentence, the model should predict the next token). Self-supervised learning is really best thought of as supervised learning in which the labels are generated automatically by a simple script. In particular, the label space, and what the model should be learning, is still chosen by humans.
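
To put the point in code: here's a toy sketch of the “simple script” view (the names and helpers below are made up for illustration). A fixed, human-chosen rule does all the labeling, and everything downstream is plain supervised learning over a label space that was also fixed up front.

```python
from typing import Callable

# A labeling rule maps a tokenized sentence to (input, label) pairs.
LabelingRule = Callable[[list[str]], list[tuple[list[str], str]]]

def next_token_rule(tokens: list[str]) -> list[tuple[list[str], str]]:
    # The a priori rule: for every prefix, the label is the token that follows.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def make_dataset(corpus: list[list[str]], rule: LabelingRule) -> list[tuple[list[str], str]]:
    # The "simple script" that generates labels: no humans in the loop at runtime,
    # but the rule itself was decided by humans beforehand.
    return [pair for sentence in corpus for pair in rule(sentence)]

corpus = [["the", "cat", "sat"], ["dogs", "bark"]]
label_space = sorted({tok for sentence in corpus for tok in sentence})  # chosen: the vocabulary
dataset = make_dataset(corpus, next_token_rule)
# From here on, nothing distinguishes this from ordinary supervised learning:
# fit a classifier over label_space on the (input, label) pairs in `dataset`.
```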

Self-supervised learning is distinct from semi-supervised learning, where a model is trained on a mix of labeled and unlabeled data.