You Only Look Once

You Only Look Once (YOLO) is a series of real-time object detection systems based on convolutional neural networks.

First introduced by Joseph Redmon et al. in 2015,[1] YOLO has undergone several iterations and improvements, becoming one of the most popular object detection frameworks.[2]

The name "You Only Look Once" refers to the fact that the algorithm requires only one forward propagation pass through the neural network to make predictions, unlike previous region proposal-based techniques such as R-CNN, which require thousands of passes for a single image.

Unlike previous methods such as R-CNN and OverFeat,[3] which apply the model to an image at multiple locations and scales, YOLO applies a single neural network to the full image.

This network divides the image into regions and predicts bounding boxes and probabilities for each region.

These bounding boxes are weighted by the predicted probabilities.

OverFeat was an early influential model for simultaneous object classification and localization.[3][4]

There are two parts to the YOLO series.

The original part contained YOLOv1, v2, and v3, all released on a website maintained by Joseph Redmon.[5]

The original YOLO algorithm, introduced in 2015,[1] divides the image into an $S \times S$ grid of cells.

If the center of an object's bounding box falls into a grid cell, that cell is said to "contain" that object.

Each grid cell predicts $B$ bounding boxes and a confidence score for each of those boxes.

These confidence scores reflect how confident the model is that the box contains an object and how accurate it thinks the predicted box is.
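The rule for deciding which grid cell "contains" an object can be illustrated with a minimal sketch. The function name `containing_cell` and its arguments are hypothetical, chosen for illustration only. The default grid size of 7 and the 448 × 448 input used in the example follow the configuration reported for the original YOLO model; the clamping of centers that lie exactly on the right or bottom image edge is an assumption of this sketch.

```python
def containing_cell(cx, cy, img_w, img_h, S=7):
    """Return the (row, col) of the grid cell that contains the box center.

    cx, cy       -- center of the object's bounding box, in pixels
    img_w, img_h -- image width and height, in pixels
    S            -- number of grid cells along each side
    """
    # Scale the center to grid units and truncate to an integer cell index.
    # Clamp so that a center lying exactly on the right or bottom edge
    # still maps to the last cell (an assumption of this sketch).
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


# A box centered at (224, 100) in a 448 x 448 image.
print(containing_cell(224, 100, 448, 448))  # (1, 3)
```

Only the center of the box matters for this assignment: a large object that overlaps many cells is still assigned to exactly one of them.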

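In more detail, the network performs the same convolutional operation over each of the $S^2$ patches.

The output of the network on each patch is a tuple as follows:

$$(p_1, \dots, p_C,\; c_1, x_1, y_1, w_1, h_1,\; \dots,\; c_B, x_B, y_B, w_B, h_B)$$

where $p_i$ is the conditional probability that the cell contains an object of class $i$, given that it contains an object; $x_j, y_j, w_j, h_j$ are the center coordinates, width, and height of the $j$-th predicted bounding box; and $c_j$ is the confidence score of that box, that is, its predicted intersection over union (IoU) with the ground truth.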
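As a rough illustration of the size and shape of this output, the following sketch splits one cell's values into class probabilities and boxes. The helper names `split_cell_output` and `class_scores` are hypothetical, and the ordering of values within a cell (class probabilities first, then one group of confidence, x, y, w, h per box) is an assumption made for readability; actual implementations may order the values differently. The defaults S = 7, B = 2, and C = 20 correspond to the configuration reported for the original model on PASCAL VOC.

```python
def split_cell_output(cell, C=20, B=2):
    """Split one cell's flat output into class probabilities and B boxes.

    Assumed layout (illustrative only): C class probabilities,
    followed by B groups of (confidence, x, y, w, h).
    """
    if len(cell) != C + 5 * B:
        raise ValueError("expected C + 5 * B values per cell")
    class_probs = list(cell[:C])
    boxes = [tuple(cell[C + 5 * j: C + 5 * j + 5]) for j in range(B)]
    return class_probs, boxes


def class_scores(class_probs, box):
    """Weight the conditional class probabilities by the box confidence."""
    confidence = box[0]
    return [p * confidence for p in class_probs]


S, C, B = 7, 20, 2
values_per_cell = C + 5 * B              # 30 values per grid cell
total_outputs = S * S * values_per_cell  # 1470 values for the whole image
```

Multiplying each conditional class probability by a box's confidence score gives a class-specific score for that box, which is one way to read the statement that the bounding boxes are weighted by the predicted probabilities.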

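During training, for each cell that contains a ground truth bounding box, only the predicted bounding box with the highest IoU with the ground truth bounding box is used for gradient descent.

For that predicted box, the center coordinates, width, height, and confidence score are trained by gradient descent to approach the ground truth, the probability of the ground truth class is trained towards 1, and the probabilities of the other classes are trained towards 0.

If a cell contains no ground truth, then only the confidence scores of its predicted boxes are trained, by gradient descent, to approach zero.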
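The selection of the predicted box that takes part in training can be sketched as follows. This is a minimal illustration rather than the training code of any YOLO release: the function names `iou` and `responsible_box` are hypothetical, and boxes are assumed to be given as (center x, center y, width, height) in a shared coordinate system.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (cx, cy, w, h)."""
    # Convert center/size form to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    # Overlap is zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def responsible_box(predicted_boxes, truth_box):
    """Index of the predicted box with the highest IoU with the ground truth."""
    ious = [iou(p, truth_box) for p in predicted_boxes]
    return max(range(len(ious)), key=ious.__getitem__)


# Two predictions for one cell: a tall box and a wide box.
predictions = [(0.5, 0.5, 0.2, 0.8), (0.5, 0.5, 0.8, 0.2)]
truth = (0.5, 0.5, 0.25, 0.75)  # a tall ground truth box
print(responsible_box(predictions, truth))  # 0
```

In this example the tall prediction is selected, so only that box would receive coordinate and class gradients for this cell.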

Released in 2016, YOLOv2 (also known as YOLO9000)[6][7] improved upon the original model by incorporating batch normalization, a higher-resolution classifier, and anchor boxes for predicting bounding boxes.

It was also released on GitHub under the Apache 2.0 license.[8]

YOLOv3, introduced in 2018, contained only "incremental" improvements, including the use of a more complex backbone network, multiple scales for detection, and a more sophisticated loss function.

Subsequent versions of YOLO (v4, v5, etc.)[10][11][12][13] have been developed by different researchers, further improving performance and introducing new features.

These versions are not officially associated with the original YOLO authors but build upon their work.

Objects detected with OpenCV's Deep Neural Network module using a YOLOv3 model trained on the COCO dataset, capable of detecting objects from 80 common classes.