Double descent

Double descent in statistics and machine learning is the phenomenon where a model with a small number of parameters and a model with a very large number of parameters can both achieve low test error, while a model whose number of parameters is roughly equal to the number of training data points has much higher test error.[2] This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning.[3]

Early observations of what would later be called double descent in specific models date back to 1989.

The term "double descent" was coined by Belkin et al.[6] in 2019,[3] when the phenomenon gained popularity as a broader concept exhibited by many models.[7][8]

The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in the model result in a significant overfitting error (an extrapolation of the bias–variance tradeoff),[9] and the empirical observations in the 2010s that some modern machine learning techniques tend to perform better with larger models.[11]

A model of double descent at the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically.

An example of the double descent phenomenon in a two-layer neural network: as the ratio of parameters to data points increases, the test error first falls, then rises, then falls again.[1] The vertical line marks the "interpolation threshold", the boundary between the underparameterized region (more data points than parameters) and the overparameterized region (more parameters than data points).
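The shape of this curve can be reproduced with a very simple experiment. The sketch below is illustrative rather than taken from the cited studies: it assumes a synthetic linear-plus-noise regression task, a random ReLU feature map, and minimum-norm least squares, and it prints the test error as the number of features p sweeps past the number of training points n, where the error typically peaks near p ≈ n (the interpolation threshold) and falls again for p much larger than n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression task (illustrative sizes): noisy linear target.
n_train, n_test, d = 40, 500, 10
w_true = rng.normal(size=d)

def make_data(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.5 * rng.normal(size=n)
    return X, y

X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

# Fixed random projection defining the random ReLU feature map.
V = rng.normal(size=(d, 400))

def features(X, p):
    # Use the first p random ReLU features.
    return np.maximum(X @ V[:, :p], 0.0)

print(f"{'p':>5} {'p/n':>6} {'test MSE':>10}")
for p in [2, 5, 10, 20, 30, 38, 40, 42, 60, 100, 200, 400]:
    Phi_train, Phi_test = features(X_train, p), features(X_test, p)
    # lstsq returns the minimum-norm least-squares solution, which
    # interpolates the training data once p exceeds n_train.
    beta, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    test_mse = np.mean((Phi_test @ beta - y_test) ** 2)
    print(f"{p:>5} {p / n_train:>6.2f} {test_mse:>10.3f}")
```

In this sketch the second descent depends on the choice of interpolating solution: the minimum-norm fit returned by np.linalg.lstsq in the overparameterized regime keeps the coefficients small, whereas an arbitrary interpolating fit need not show improving test error as p grows.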