It occurs when the model produces outputs that are less diverse than expected, effectively "collapsing" to generate only a few modes of the data distribution while ignoring others.
This phenomenon undermines the goal of generative models to capture the full diversity of the training data.[1][2]
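As a concrete illustration (a minimal sketch, not drawn from the cited sources), consider a toy data distribution with eight Gaussian modes arranged on a circle: a well-trained generator spreads its samples across all eight modes, while a collapsed generator concentrates on only a few. The mode count, distance threshold, and helper names below are illustrative assumptions.

```python
# Minimal sketch: quantifying mode collapse on a toy 8-mode Gaussian mixture.
# All constants (8 modes, radius 2.0, threshold 0.5) are illustrative choices.
import numpy as np

def make_modes(n_modes=8, radius=2.0):
    """Centers of a toy mixture-of-Gaussians data distribution on a circle."""
    angles = 2 * np.pi * np.arange(n_modes) / n_modes
    return np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)

def mode_coverage(samples, centers, threshold=0.5):
    """Count how many modes receive at least one sample within `threshold`."""
    # Distance from every sample to every mode center: shape (n_samples, n_modes).
    dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    close_enough = dists.min(axis=1) < threshold
    return len(set(nearest[close_enough]))

centers = make_modes()
rng = np.random.default_rng(0)

# A healthy generator: samples spread over all 8 modes.
healthy = centers[rng.integers(0, 8, size=1000)] + 0.1 * rng.standard_normal((1000, 2))

# A collapsed generator: samples concentrated on only 2 of the 8 modes.
collapsed = centers[rng.integers(0, 2, size=1000)] + 0.1 * rng.standard_normal((1000, 2))

print("healthy generator covers", mode_coverage(healthy, centers), "of 8 modes")
print("collapsed generator covers", mode_coverage(collapsed, centers), "of 8 modes")
```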
Several common causes have been identified,[3] and a number of GAN-specific strategies were developed to mitigate mode collapse. Mode collapse also arises in large language models, which are usually trained in two steps.
In the first step ("pretraining"), the model is trained simply to generate text resembling samples from a large dataset. In the second step ("finetuning"), the model is further trained toward narrower goals, such as following instructions or performing specific tasks. More finetuning tends to result in higher average task performance, but less diverse outputs.
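One common way to make this loss of diversity measurable is to sample several completions for the same prompt from each model and compare an n-gram diversity score such as distinct-n. The sketch below is a minimal illustration with placeholder completions standing in for real model outputs; the metric choice and sample texts are assumptions, not taken from the cited work.

```python
# Minimal sketch of a distinct-n diversity comparison between a base and a
# finetuned model. The sample completions are placeholders; in practice they
# would come from sampling, e.g. model.generate(..., do_sample=True,
# num_return_sequences=...) in a library such as Hugging Face Transformers.
from collections import Counter

def distinct_n(completions, n=2):
    """Fraction of distinct n-grams among all n-grams in a list of completions."""
    ngrams = Counter()
    for text in completions:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

# Placeholder outputs: the base model phrases its answer several different ways,
# while the finetuned model repeats one phrasing (lower diversity).
base_samples = [
    "the capital of australia is canberra a planned city",
    "canberra is the capital though sydney is larger",
    "australia's capital city is canberra not sydney",
]
finetuned_samples = [
    "the capital of australia is canberra",
    "the capital of australia is canberra",
    "the capital of australia is canberra",
]

print("base model distinct-2:", round(distinct_n(base_samples), 2))
print("finetuned model distinct-2:", round(distinct_n(finetuned_samples), 2))
```

A higher distinct-n score indicates more varied outputs; under this toy comparison the repetitive finetuned samples score markedly lower than the base samples.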