[1][2] Vision processing units are distinct from video processing units (which are specialised for video encoding and decoding) in their suitability for running machine vision algorithms such as convolutional neural networks (CNNs) and the scale-invariant feature transform (SIFT).
They may include direct interfaces to take data from cameras (bypassing any off-chip buffers), and place a greater emphasis on on-chip dataflow between many parallel execution units with scratchpad memory, like a manycore DSP.
They are distinct from GPUs, which contain specialised hardware for rasterization and texture mapping (for 3D graphics), and whose memory architecture is optimised for manipulating bitmap images in off-chip memory (reading textures, and modifying frame buffers, with random access patterns).
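The difference in memory access can be illustrated with a minimal sketch in Python. It is not taken from any particular VPU; the names `stream_conv3x3` and `camera_rows` are hypothetical, and a three-row `deque` merely stands in for on-chip scratchpad memory. Under those assumptions, it applies a 3×3 filter to rows as they arrive from a simulated camera, so the full frame is never stored:

```python
from collections import deque


def stream_conv3x3(rows, kernel):
    """Yield output rows of a 'valid' 3x3 filter (cross-correlation form,
    as used in CNN layers) while holding only three input rows at a time."""
    line_buffer = deque(maxlen=3)  # stand-in for on-chip scratchpad memory
    for row in rows:               # rows arrive one at a time, as from a sensor
        line_buffer.append(row)    # the oldest row is discarded automatically
        if len(line_buffer) < 3:
            continue               # not enough rows yet for a 3x3 window
        out = []
        for x in range(len(row) - 2):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * line_buffer[ky][x + kx]
            out.append(acc)
        yield out


def camera_rows(height, width):
    """Hypothetical sensor for illustration: pixel (x, y) has value x + y."""
    for y in range(height):
        yield [x + y for x in range(width)]


box_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # unnormalised box filter
result = list(stream_conv3x3(camera_rows(4, 5), box_kernel))
print(result)  # [[18, 27, 36], [27, 36, 45]]
```

Each input pixel is read once in arrival order and only three rows are resident at any time, in contrast with the random accesses to off-chip textures and frame buffers described above for GPUs. Real hardware would run the inner loops across many parallel execution units rather than sequentially.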
Target markets are robotics, the internet of things (IoT), new classes of digital cameras for virtual reality and augmented reality, smart cameras, and integrating machine vision acceleration into smartphones and other mobile devices.
Related processors that are not described as VPUs may form a broader category of AI accelerators (to which VPUs may also belong); however, as of 2016 there is no consensus on the name: