Inception[1] is a family of convolutional neural networks (CNNs) for computer vision, introduced by researchers at Google in 2014 as GoogLeNet (later renamed Inception v1).
The series was historically important as an early CNN that separated the stem (data ingest), body (data processing), and head (prediction), an architectural design that persists in all modern CNNs.
[2] In 2014, a team at Google developed the GoogLeNet architecture, an instance of which won the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
The models and the code were released under Apache 2.0 license on GitHub.
[4] The Inception v1 architecture is a deep CNN composed of 22 layers.
[6] Since Inception v1 is deep, it suffered from the vanishing gradient problem, which the team mitigated by attaching auxiliary classifiers to intermediate layers of the network during training.
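The auxiliary losses are simply added to the main classifier's loss with a discount weight (the GoogLeNet paper used 0.3); a minimal sketch of that combination:

```python
def combined_loss(main_loss, aux_losses, aux_weight=0.3):
    # Total training loss: main classifier loss plus down-weighted losses
    # from auxiliary heads attached to intermediate layers. At inference
    # time the auxiliary heads are discarded.
    return main_loss + aux_weight * sum(aux_losses)

print(combined_loss(1.0, [0.5, 0.5]))  # 1.0 + 0.3 * (0.5 + 0.5) = 1.3
```

Because the auxiliary heads sit closer to the input, their gradients reach the early layers without passing through the full depth of the network.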
Inception v2 was released in 2015, in a paper that is more famous for proposing batch normalization.
As an example, a single 5×5 convolution can be factored into a 3×3 convolution stacked on top of another 3×3; both arrangements have a 5×5 receptive field.
The 5×5 convolution kernel has 25 parameters, compared to just 18 in the factorized version.
Thus, the 5×5 convolution is strictly more powerful than the factorized version; however, this extra power is not necessarily needed.
Empirically, the research team found that factorized convolutions help.
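The parameter arithmetic above can be checked with a short sketch (channel counts are omitted, since they scale both sides equally):

```python
def conv_params(k, in_ch, out_ch):
    # Weight count of a k x k convolution layer (biases ignored).
    return k * k * in_ch * out_ch

def stacked_receptive_field(kernel_sizes):
    # Receptive field of stride-1 convolutions stacked on top of each other.
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

# Per input/output channel pair: 25 weights vs 18.
print(conv_params(5, 1, 1))             # 25
print(conv_params(3, 1, 1) * 2)         # 18
# Both arrangements see the same 5x5 input patch.
print(stacked_receptive_field([3, 3]))  # 5
```

With C input and C output channels the comparison becomes 25C² versus 18C², a saving of about 28%.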
In addition, it removed the lowest auxiliary classifier during training, as the team found that the auxiliary heads worked mainly as a form of regularization rather than substantially aiding convergence.
The paper also introduced label smoothing: instead of training the model to predict the one-hot label distribution, they made the model predict the smoothed distribution, in which a small amount of probability mass is moved from the true class to all other classes.
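A minimal sketch of the smoothing itself (the mixing weight ε = 0.1 follows the Inception v3 paper; the class count here is illustrative):

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    # Mix the one-hot target with the uniform distribution over K classes:
    # q = (1 - eps) * one_hot + eps / K
    k = one_hot.shape[-1]
    return (1.0 - eps) * one_hot + eps / k

target = np.eye(10)[3]            # one-hot label for class 3 out of 10
smoothed = smooth_labels(target)  # true class: 0.91, every other class: 0.01
print(smoothed.sum())             # still a valid distribution: 1.0
```

The smoothed target keeps the model from driving the true-class logit arbitrarily far above the others, which the authors argued improves generalization.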
[10] Inception v4 is an incremental update with even more factorized convolutions, along with other complications that were empirically found to improve benchmarks.
[12] Xception is a linear stack of depthwise separable convolution layers with residual connections.
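The appeal of depthwise separable convolutions is their parameter count: the spatial filtering and the channel mixing are done in two cheap steps instead of one expensive joint step. A sketch of the comparison (channel counts are illustrative):

```python
def standard_conv_params(k, cin, cout):
    # A regular k x k convolution mixes space and channels jointly.
    return k * k * cin * cout

def depthwise_separable_params(k, cin, cout):
    # Depthwise step: one k x k filter per input channel (spatial only),
    # followed by a pointwise 1x1 convolution that mixes channels.
    return k * k * cin + cin * cout

k, cin, cout = 3, 128, 128
print(standard_conv_params(k, cin, cout))        # 147456
print(depthwise_separable_params(k, cin, cout))  # 1152 + 16384 = 17536
```

For a 3×3 kernel the separable form needs roughly 1/9 of the weights of the standard convolution once the channel count is large.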