Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions.[1][2][3][4]
"Understanding" in this context signifies the transformation of visual images (the input to the retina) into descriptions of the world that make sense to thought processes and can elicit appropriate action.
The scientific discipline of computer vision is concerned with the theory behind artificial systems that extract information from images.[8]
Subdisciplines of computer vision include scene reconstruction, object detection, event detection, activity recognition, video tracking, object recognition, 3D pose estimation, learning, indexing, motion estimation, visual servoing, 3D scene modeling, and image restoration.
"[8] As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images.
Early computer vision research was meant to mimic the human visual system as a stepping stone to endowing robots with intelligent behavior.
Studies in the 1970s formed the early foundations for many of the computer vision algorithms that exist today, including extraction of edges from images, labeling of lines, non-polyhedral and polyhedral modeling, representation of objects as interconnections of smaller structures, optical flow, and motion estimation.
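As a concrete illustration of one of these early techniques, the following sketch (not from the source) computes a gradient-based edge map using the standard Sobel masks; the threshold value is an arbitrary assumption.

```python
# Minimal sketch of gradient-based edge extraction with the Sobel operator.
# The kernels are the standard Sobel masks; the threshold is illustrative.
import numpy as np
from scipy.signal import convolve2d

def sobel_edges(image, threshold=0.25):
    """Return a binary edge map for a 2-D grayscale image."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # horizontal gradient mask
    ky = kx.T                                 # vertical gradient mask
    gx = convolve2d(image, kx, mode="same", boundary="symm")
    gy = convolve2d(image, ky, mode="same", boundary="symm")
    magnitude = np.hypot(gx, gy)              # gradient magnitude per pixel
    return magnitude > threshold * magnitude.max()
```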
Subsequent work added the concept of scale-space, the inference of shape from various cues such as shading, texture, and focus, and contour models known as snakes.
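The scale-space concept can be made concrete by embedding an image in a one-parameter family of progressively smoothed copies; the sketch below assumes SciPy's Gaussian filter and uses arbitrary example scales.

```python
# Minimal sketch of a Gaussian scale-space: each scale is the original image
# convolved with a Gaussian of increasing standard deviation.
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_scale_space(image, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Return one smoothed copy of a 2-D image per scale in `sigmas`."""
    return [gaussian_filter(image.astype(float), sigma=s) for s in sigmas]
```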
Researchers also realized that many of these mathematical concepts could be treated within the same optimization framework as regularization and Markov random fields.
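That shared optimization framework can be sketched as a single discrete energy combining a data-fidelity term with a smoothness penalty over neighbouring pixels; the function below is an illustrative assumption, not a method taken from the source.

```python
# Illustrative discrete energy of the kind shared by regularization and
# Markov random field formulations: data fidelity plus spatial smoothness.
import numpy as np

def mrf_energy(labels, observed, smoothness=1.0):
    """E(f) = sum_i (f_i - d_i)^2 + lambda * sum over neighbours (f_i - f_j)^2."""
    data_term = np.sum((labels - observed) ** 2)
    smooth_term = (np.sum(np.diff(labels, axis=0) ** 2)
                   + np.sum(np.diff(labels, axis=1) ** 2))
    return data_term + smoothness * smooth_term
```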
With the advent of optimization methods for camera calibration, it was realized that many of these ideas had already been explored in bundle adjustment theory from the field of photogrammetry.
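Bundle adjustment minimizes the sum of squared reprojection errors over all cameras and 3-D points. The sketch below outlines that objective; the `project` camera model is left as a placeholder assumption.

```python
# Sketch of the bundle adjustment objective: squared reprojection error summed
# over every observation of point j in camera i, minimized jointly over all
# camera parameters and 3-D point positions.
import numpy as np

def reprojection_error(cameras, points3d, observations, project):
    """observations maps (camera index i, point index j) -> observed 2-D point."""
    residuals = []
    for (i, j), x_ij in observations.items():
        predicted = project(cameras[i], points3d[j])  # e.g. a pinhole model
        residuals.append(predicted - x_ij)
    return np.sum(np.square(residuals))
```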
The 1990s also marked the first time statistical learning techniques were used in practice to recognize faces in images (see Eigenface).[11]
Recent work has seen the resurgence of feature-based methods used in conjunction with machine learning techniques and complex optimization frameworks.
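The Eigenface idea illustrates how statistical learning entered face recognition: face images are treated as vectors and projected onto their leading principal components. The sketch below is a minimal PCA-via-SVD version; the array layout and number of components are illustrative assumptions.

```python
# Minimal Eigenface-style sketch: PCA of vectorized face images via SVD.
import numpy as np

def eigenfaces(train_images, n_components=16):
    """train_images: array of shape (n_faces, height * width), one row per face."""
    mean_face = train_images.mean(axis=0)
    centered = train_images - mean_face
    # Right singular vectors of the centered data are the eigenfaces.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]        # (n_components, height * width)
    weights = centered @ components.T     # coordinates of each face in the subspace
    return mean_face, components, weights
```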
The accuracy of deep learning algorithms on several benchmark computer vision data sets, for tasks ranging from classification[18] and segmentation to optical flow, has surpassed that of prior methods.
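A minimal example of the kind of convolutional network behind such benchmark results is sketched below; the layer widths, input size, and class count are arbitrary assumptions rather than any particular published architecture.

```python
# Tiny convolutional classifier sketch (PyTorch); sizes are illustrative only.
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = TinyConvNet()(torch.randn(1, 3, 32, 32))  # shape: (1, 10)
```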
Most computer vision systems rely on image sensors, which detect electromagnetic radiation, typically in the form of visible, infrared, or ultraviolet light.
Over the last century, there has been an extensive study of eyes, neurons, and brain structures devoted to the processing of visual stimuli in both humans and various animals.
This has led to a coarse yet convoluted description of how natural vision systems operate in order to solve certain vision-related tasks.
Also, some of the learning-based methods developed within computer vision (e.g. neural net and deep learning based image and feature analysis and classification) have their background in neurobiology.
The Neocognitron, a neural network developed in the 1970s by Kunihiko Fukushima, is an early example of computer vision taking direct inspiration from neurobiology, specifically the primary visual cortex.
Applications range from tasks such as industrial machine vision systems which, say, inspect bottles speeding by on a production line, to research into artificial intelligence and computers or robots that can comprehend the world around them.
In medical image processing, for example, computer vision supports the detection of tumours, arteriosclerosis, or other malign changes, a variety of dental pathologies, and measurements of organ dimensions, blood flow, etc.
Automatic processing of the data can also be used to reduce complexity and to fuse information from multiple sensors to increase reliability.
Fully autonomous vehicles typically use computer vision for navigation, e.g., for knowing where they are and mapping their environment (SLAM), and for detecting obstacles.
Humans, for example, are not good at classifying objects into fine-grained classes, such as the particular breed of dog or species of bird, whereas convolutional neural networks handle this with ease.
The advent of 3D imaging not requiring motion or scanning, and related processing algorithms, is enabling rapid advances in this field.
The specific implementation of a computer vision system also depends on whether its functionality is pre-specified or whether some part of it can be learned or modified during operation.[48]
There are many kinds of computer vision systems; however, all of them contain these basic elements: a power source, at least one image acquisition device (camera, CCD, etc.), a processor, and control and communication cables or some kind of wireless interconnection mechanism.
While traditional broadcast and consumer video systems operate at a rate of 30 frames per second, advances in digital signal processing and consumer graphics hardware have made high-speed image acquisition, processing, and display possible for real-time systems on the order of hundreds to thousands of frames per second.
For applications in robotics, fast, real-time video systems are critically important and often can simplify the processing needed for certain algorithms.
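The per-frame processing budget implied by these frame rates is easy to work out; the snippet below uses illustrative rates only.

```python
# Per-frame time budget at a few illustrative frame rates.
for fps in (30, 240, 1000):
    print(f"{fps:>5} fps -> {1000.0 / fps:.2f} ms per frame")
# 30 fps leaves about 33 ms of processing time per frame; 1000 fps leaves ~1 ms.
```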