Saliency map

[1] The goal of a saliency map is to reflect how important each pixel is to the human visual system or to an otherwise opaque machine learning (ML) model.
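To make "importance to an opaque model" concrete, the sketch below uses occlusion. This is one common perturbation-based technique, not necessarily the one the article has in mind. Each pixel's importance is the drop in the model's output score when that pixel alone is replaced by a fill value, and the model is treated purely as a black box. `occlusion_saliency`, `toy_model`, and the fill value 0.0 are hypothetical choices for illustration.

```python
from typing import Callable, List

Image = List[List[float]]

def occlusion_saliency(image: Image, model: Callable[[Image], float],
                       fill: float = 0.0) -> Image:
    """Per-pixel importance = drop in the model's score when that single
    pixel is replaced by `fill`. Only the model's outputs are used."""
    base = model(image)
    h, w = len(image), len(image[0])
    saliency = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            occluded[y][x] = fill
            saliency[y][x] = base - model(occluded)
    return saliency

# Hypothetical "opaque" model: its score depends only on the top-left 2x2 corner.
def toy_model(img: Image) -> float:
    return img[0][0] + img[0][1] + img[1][0] + img[1][1]

img = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0],
       [7.0, 8.0, 9.0]]
sal = occlusion_saliency(img, toy_model)
print(sal)  # [[1.0, 2.0, 0.0], [4.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
```

Pixels the model never looks at receive zero importance, which is what a saliency map for an ML model is meant to reveal.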

For example, in this image, a person first looks at the fort and light clouds, so they should be highlighted on the saliency map.

Saliency maps engineered for artificial (computer) vision are typically not the same as the actual saliency map constructed by biological (natural) vision.

Saliency estimation may be viewed as an instance of image segmentation.

The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze.

Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.
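Viewed as segmentation, the simplest possible step is to threshold a saliency map into a binary foreground mask and report where the salient object lies. The sketch below assumes a saliency map stored as a list of rows with values in [0, 1] and a hand-picked threshold. `segment_salient` and the threshold value are hypothetical, and practical systems use more robust segmentation than a single global threshold.

```python
from typing import List, Optional, Tuple

SaliencyMap = List[List[float]]
Mask = List[List[int]]
BBox = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive

def segment_salient(saliency: SaliencyMap,
                    threshold: float) -> Tuple[Mask, Optional[BBox]]:
    """Threshold a saliency map into a binary mask and return the bounding
    box of all salient pixels (None if no pixel reaches the threshold)."""
    mask = [[1 if v >= threshold else 0 for v in row] for row in saliency]
    ys = [y for y, row in enumerate(mask) for v in row if v]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return mask, None
    return mask, (min(ys), min(xs), max(ys), max(xs))

saliency = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.1],
    [0.0, 0.7, 0.9, 0.2],
    [0.0, 0.1, 0.1, 0.0],
]
mask, bbox = segment_salient(saliency, threshold=0.5)
print(bbox)  # (1, 1, 2, 2)
```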

One edge-based approach rests on the idea that true edges, i.e. object contours, are more salient than other, complexly textured regions.

To find such edges, the method computes four binary edge maps: one each for the vertical, horizontal, and two diagonal directions.
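A minimal sketch of four directional binary maps, assuming a grayscale image and a simple marking rule (also an assumption): a pixel is set in a direction's map when its intensity differs from its neighbour in that direction by at least a threshold. The maps are keyed by the direction in which the difference is taken, so the "horizontal" map responds to vertical edges. `directional_edge_maps` and `DIRECTIONS` are hypothetical names.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]
BinaryMap = List[List[int]]

# Offsets (dy, dx) to the neighbour each pixel is compared against.
DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "diagonal": (1, 1),
    "anti_diagonal": (1, -1),
}

def directional_edge_maps(image: Image, threshold: float) -> Dict[str, BinaryMap]:
    """Return one binary map per direction: 1 where the intensity step to the
    neighbour in that direction is at least `threshold`, else 0."""
    h, w = len(image), len(image[0])
    maps: Dict[str, BinaryMap] = {}
    for name, (dy, dx) in DIRECTIONS.items():
        m = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and \
                        abs(image[y][x] - image[ny][nx]) >= threshold:
                    m[y][x] = 1
        maps[name] = m
    return maps

# A vertical step edge between columns 1 and 2.
image = [[0.0, 0.0, 9.0, 9.0] for _ in range(3)]
maps = directional_edge_maps(image, threshold=5.0)
print(maps["horizontal"])  # [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
```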

A threshold on the size of each connected set of edge pixels then determines whether an image block contains a perceivable edge (a salient region) or not.
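The block test can be sketched as below, assuming 8-connectivity, square blocks, and a minimum component size chosen by hand. The text does not give these values, so all three are assumptions. The sketch takes a single binary edge map; how the four directional maps would be combined is left open. `largest_component` and `salient_blocks` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Tuple

BinaryMap = List[List[int]]

def largest_component(binary: BinaryMap) -> int:
    """Size of the largest 8-connected set of 1-pixels (breadth-first search)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    best = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] and not seen[sy][sx]:
                seen[sy][sx] = True
                queue = deque([(sy, sx)])
                size = 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and \
                                    binary[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                best = max(best, size)
    return best

def salient_blocks(edge_map: BinaryMap, block: int,
                   min_size: int) -> Dict[Tuple[int, int], bool]:
    """Mark a block as containing a perceivable edge when its largest
    connected edge set has at least `min_size` pixels."""
    h, w = len(edge_map), len(edge_map[0])
    result = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            sub = [row[bx:bx + block] for row in edge_map[by:by + block]]
            result[(by // block, bx // block)] = largest_component(sub) >= min_size
    return result

edge = [
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
print(salient_blocks(edge, block=2, min_size=2))
# {(0, 0): True, (0, 1): False, (1, 0): False, (1, 1): False}
```

Isolated edge pixels, typical of fine texture, fall below the size threshold, while the short contour in the top-left block passes.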

Next, the color distance of each pixel to all other pixels in the image is calculated; this step is called the contrast function.
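A minimal sketch of this contrast computation for grayscale images, assuming that the color distance is the absolute intensity difference, summed over all pixels: S(I_k) = Σ_i |I_k - I_i|. Pixels with the same intensity receive the same score. The sum can therefore be computed once per intensity level from a histogram, S(v) = Σ_u h(u)·|v - u|, which avoids comparing every pixel pair. `global_contrast_saliency` is a hypothetical name.

```python
from typing import List

def global_contrast_saliency(gray: List[List[int]], levels: int = 256) -> List[List[int]]:
    """Saliency of each pixel = sum of absolute intensity differences to
    every pixel in the image, computed via an intensity histogram."""
    # hist[u] = number of pixels with intensity u
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    # Contrast of each intensity level, computed once: sum_u hist[u] * |v - u|
    contrast = [sum(count * abs(v - u) for u, count in enumerate(hist))
                for v in range(levels)]
    # Look up every pixel's score by its intensity
    return [[contrast[v] for v in row] for row in gray]

saliency = global_contrast_saliency([[0, 0], [0, 255]])
print(saliency)  # [[255, 255], [255, 765]]
```

The single bright pixel differs from most of the image, so it receives the highest score.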

A saliency dataset usually contains recordings of human eye movements on a set of image or video sequences.

The most important parameters of such a dataset are its spatial resolution, its size, and the eye-tracking equipment used.

To collect a saliency dataset, image or video sequences and eye-tracking equipment must be prepared, and observers must be invited.

A session then begins, and saliency data are collected by showing the sequences to the observers and recording their eye gazes.

The eye-tracking device is a high-speed camera capable of recording eye movements at a rate of at least 250 frames per second.
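At 250 frames per second, the tracker delivers one gaze sample every 4 ms. The sketch below shows how raw samples could be assigned to the frames of a video stimulus. The 25 fps video frame rate and the function names `sample_interval_ms` and `frame_of_sample` are assumptions for illustration. Integer arithmetic is used so that samples falling exactly on a frame boundary are not misassigned by floating-point rounding.

```python
from collections import Counter

def sample_interval_ms(tracker_hz: int) -> float:
    """Time between consecutive gaze samples, in milliseconds."""
    return 1000 / tracker_hz

def frame_of_sample(sample_index: int, tracker_hz: int, video_fps: int) -> int:
    """Video frame shown when the sample was taken:
    floor((sample_index / tracker_hz) * video_fps), in exact integer arithmetic."""
    return (sample_index * video_fps) // tracker_hz

print(sample_interval_ms(250))  # 4.0
print([frame_of_sample(i, 250, 25) for i in (0, 9, 10, 19, 20)])  # [0, 0, 1, 1, 2]

# One second of gaze data: every one of the 25 frames receives 10 samples.
counts = Counter(frame_of_sample(i, 250, 25) for i in range(250))
```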

Images from the camera are processed by software running on a dedicated computer, which returns the gaze data.
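Recorded gaze points are commonly turned into a continuous ground-truth saliency map by placing a Gaussian at each gaze location and normalizing the result. The sketch below assumes that the gaze data have already been mapped to image pixel coordinates (x, y). `sigma` is a free, hand-picked parameter here, and `fixation_density_map` is a hypothetical name.

```python
import math
from typing import List, Tuple

def fixation_density_map(gaze: List[Tuple[int, int]], height: int, width: int,
                         sigma: float) -> List[List[float]]:
    """Sum an isotropic Gaussian centred on every gaze point (x, y), then
    scale so the peak is 1.0 (an all-zero map is returned if there is no gaze)."""
    density = [[0.0] * width for _ in range(height)]
    for gx, gy in gaze:
        for y in range(height):
            for x in range(width):
                d2 = (x - gx) ** 2 + (y - gy) ** 2
                density[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    peak = max(max(row) for row in density)
    if peak > 0:
        density = [[v / peak for v in row] for row in density]
    return density

dm = fixation_density_map([(2, 1)], height=3, width=5, sigma=1.0)
print(dm[1][2])  # 1.0 at the gaze point, falling off with distance
```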