Scale space

The parameter t in this family is referred to as the scale parameter, with the interpretation that image structures of spatial size smaller than about √t have largely been smoothed away in the scale-space level at scale t.

This framework also allows visual operations to be made scale invariant, which is necessary for dealing with the size variations that may occur in image data, because real-world objects may be of different sizes and in addition the distance between the object and the camera may be unknown and may vary depending on the circumstances.

Since σ = √t is the standard deviation of the Gaussian kernel, details that are significantly smaller than this value are to a large extent removed from the image at scale parameter t.
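
As a minimal sketch of this construction (using SciPy's `gaussian_filter` with the convention σ = √t; the noise image is hypothetical test data):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(image, ts):
    """Gaussian scale-space L(x, y; t): one smoothed copy of the image
    per scale parameter t, using sigma = sqrt(t)."""
    return [gaussian_filter(image.astype(float), sigma=np.sqrt(t)) for t in ts]

# Hypothetical test data: white noise, whose fine-scale structure is
# progressively suppressed as t increases.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
levels = scale_space(img, ts=[1.0, 4.0, 16.0])
```

Because smoothing removes fine structure, the variance of the levels decreases monotonically with t.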

In the scale-space literature, a number of different ways of formulating this criterion in precise mathematical terms have been proposed.

In several works,[15][20][21] the uniqueness claimed by the arguments based on scale invariance has been criticized, and alternative self-similar scale-space kernels have been proposed.

The Gaussian kernel is, however, a unique choice according to the scale-space axiomatics based on causality[3] or non-enhancement of local extrema.

Although this connection may appear superficial for a reader not familiar with differential equations, it is indeed the case that the main scale-space formulation in terms of non-enhancement of local extrema is expressed in terms of a sign condition on partial derivatives in the 2+1-D volume generated by the scale space, thus within the framework of partial differential equations.
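
This connection can be made concrete: the Gaussian scale-space representation equals the solution of the (heat) diffusion equation ∂t L = ½ ∇²L with the image as initial condition. A sketch of this equivalence (explicit Euler steps on a periodic grid; step size and grid are illustrative choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def diffuse(image, t, dt=0.1):
    """Integrate dL/dt = 0.5 * laplacian(L) with explicit Euler steps
    and periodic boundaries; this approximates Gaussian smoothing
    with sigma = sqrt(t)."""
    L = image.astype(float)
    for _ in range(int(round(t / dt))):
        # Five-point discrete Laplacian with periodic wrap-around.
        lap = (np.roll(L, 1, 0) + np.roll(L, -1, 0) +
               np.roll(L, 1, 1) + np.roll(L, -1, 1) - 4 * L)
        L = L + 0.5 * dt * lap
    return L

rng = np.random.default_rng(1)
img = rng.standard_normal((64, 64))
t = 2.0
by_pde = diffuse(img, t)
by_gauss = gaussian_filter(img, sigma=np.sqrt(t), mode="wrap")
```

Up to discretization error, the two results agree, which is exactly the sign-condition/PDE view of scale space made computational.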

The motivation for generating a scale-space representation of a given data set originates from the basic observation that real-world objects are composed of different structures at different scales.

This implies that real-world objects, in contrast to idealized mathematical entities such as points or lines, may appear in different ways depending on the scale of observation.

For a computer vision system analysing an unknown scene, there is no way to know a priori what scales are appropriate for describing the interesting structures in the image data.

[9] Another motivation for the scale-space concept originates from the process of performing a physical measurement on real-world data.

In order to extract any information from a measurement process, one has to apply operators of non-infinitesimal size to the data.

In many branches of computer science and applied mathematics, the size of the measurement operator is disregarded in the theoretical modelling of a problem.

Many scale-space operations show a high degree of similarity with receptive field profiles recorded from the mammalian retina and the first stages in the visual cortex.

In these respects, the scale-space framework can be seen as a theoretically well-founded paradigm for early vision, which in addition has been thoroughly tested by algorithms and experiments.

This overall framework has been applied to a large variety of problems in computer vision, including feature detection, feature classification, image segmentation, image matching, motion estimation, computation of shape cues and object recognition.

The set of Gaussian derivative operators up to a certain order is often referred to as the N-jet and constitutes a basic type of feature within the scale-space framework.
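
As an illustration (a sketch only; the helper name `n_jet` and the axis convention are our choices), the N-jet up to a given order can be obtained by letting `scipy.ndimage.gaussian_filter` differentiate the Gaussian kernel per axis:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def n_jet(image, sigma, order=2):
    """Gaussian derivatives L_{x^m y^n} for all m + n <= order (the N-jet).
    Convention here: axis 0 = y (rows), axis 1 = x (columns)."""
    jet = {}
    for m in range(order + 1):
        for n in range(order + 1 - m):
            # scipy's per-axis `order` argument differentiates the Gaussian.
            jet[(m, n)] = gaussian_filter(image.astype(float), sigma,
                                          order=(n, m))
    return jet
```

On a linear ramp f(x, y) = x, the first x-derivative L_x is close to 1 away from the image boundary, and L_y vanishes, as expected from the underlying continuous operators.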

Following the idea of expressing visual operations in terms of differential invariants computed at multiple scales using Gaussian derivative operators, we can express an edge detector from the set of points that satisfy the requirement that the gradient magnitude should assume a local maximum in the gradient direction. By working out the differential geometry, it can be shown[4] that this differential edge detector can equivalently be expressed from the zero-crossings of the second-order differential invariant

    L_v² L_vv = L_x² L_xx + 2 L_x L_y L_xy + L_y² L_yy = 0

that satisfy the following sign condition on a third-order differential invariant:

    L_v³ L_vvv = L_x³ L_xxx + 3 L_x² L_y L_xxy + 3 L_x L_y² L_xyy + L_y³ L_yyy < 0.

Similarly, multi-scale blob detectors at any given fixed scale[23][9] can be obtained from local maxima and local minima of either the Laplacian operator (also referred to as the Laplacian of Gaussian)

    ∇²L = L_xx + L_yy

or the determinant of the Hessian matrix

    det H L = L_xx L_yy − L_xy².

In an analogous fashion, corner detectors and ridge and valley detectors can be expressed as local maxima, minima or zero-crossings of multi-scale differential invariants defined from Gaussian derivatives.
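
A sketch of fixed-scale blob detection with the Laplacian of Gaussian (the synthetic blob image is hypothetical test data):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_blob_response(image, sigma):
    """Scale-normalized Laplacian-of-Gaussian response t * (L_xx + L_yy),
    with t = sigma**2. Bright blobs give strong negative responses."""
    t = sigma ** 2
    return t * gaussian_laplace(image.astype(float), sigma)

# Hypothetical test image: a single bright Gaussian blob centred at (20, 20).
y, x = np.mgrid[0:48, 0:48]
img = np.exp(-((x - 20.0) ** 2 + (y - 20.0) ** 2) / (2 * 3.0 ** 2))
resp = log_blob_response(img, sigma=3.0)
cy, cx = np.unravel_index(np.argmin(resp), resp.shape)
```

The most negative response coincides with the blob centre, which is the local-extremum criterion stated above specialized to a single bright blob.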

The theory presented so far describes a well-founded framework for representing image structures at multiple scales.

This need for scale selection originates from two major reasons: (i) real-world objects may have different sizes, and these sizes may be unknown to the vision system, and (ii) the distance between the object and the camera can vary, and this distance information may also be unknown a priori.

This algebraic expression for scale-normalized Gaussian derivative operators originates from the introduction of γ-normalized derivatives according to ∂ξ = t^(γ/2) ∂x and ∂η = t^(γ/2) ∂y.
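
The resulting scale-selection principle can be sketched as follows: evaluate the scale-normalized Laplacian t·∇²L over a range of scales and pick the scale at which the magnitude of the response at a point is maximal. For a Gaussian blob of standard deviation σ0, this selects σ ≈ σ0 (a sketch with γ = 1; the synthetic data and candidate scales are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def select_scale(image, point, sigmas):
    """Return the sigma maximizing |t * laplacian(L)| at `point`,
    with t = sigma**2 (scale-normalized Laplacian, gamma = 1)."""
    responses = [abs(s ** 2 * gaussian_laplace(image, s)[point])
                 for s in sigmas]
    return sigmas[int(np.argmax(responses))]

# Synthetic blob of standard deviation 4 centred at (24, 24).
y, x = np.mgrid[0:48, 0:48]
img = np.exp(-((x - 24.0) ** 2 + (y - 24.0) ** 2) / (2 * 4.0 ** 2))
best = select_scale(img, (24, 24), sigmas=[2.0, 3.0, 4.0, 5.0, 6.0])
```

Analytically, the centre response is proportional to t/(t0 + t)², which is maximized at t = t0, so the selected scale matches the blob size.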

Recent work has shown that more complex operations, such as scale-invariant object recognition, can also be performed in this way, by computing local image descriptors (N-jets or local histograms of gradient directions) at scale-adapted interest points obtained from scale-space extrema of the normalized Laplacian operator (see also the scale-invariant feature transform[34]) or the determinant of the Hessian (see also SURF);[35] see also the Scholarpedia article on the scale-invariant feature transform[36] for a more general outlook on object recognition approaches based on receptive field responses[19][37][38][39] in terms of Gaussian derivative operators or approximations thereof.

When properly constructed, the ratio of the sample rates in space and scale is held constant, so that the impulse response is identical in all levels of the pyramid.
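
A minimal sketch of such a construction (smooth-then-subsample with a fixed amount of smoothing per octave; the parameter values are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(image, levels, sigma=1.0):
    """Each level: smooth with a fixed sigma, then subsample by 2.
    Because sigma and the subsampling factor are fixed, the ratio of
    sample rate to scale is the same at every level."""
    pyr = [image.astype(float)]
    for _ in range(levels - 1):
        smoothed = gaussian_filter(pyr[-1], sigma)
        pyr.append(smoothed[::2, ::2])
    return pyr

img = np.zeros((64, 64))
pyr = gaussian_pyramid(img, levels=3)
```

Each level thus halves the resolution while doubling the effective scale of the retained structures.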

A large number of evolution equations have been formulated in this way, motivated by different specific requirements (see the abovementioned book references for further information).

Hence, unexpected artifacts may sometimes occur, and one should be careful not to use the term "scale-space" for just any type of one-parameter family of images.

[4] One motivation for this extension originates from the common need for computing image descriptors for real-world objects that are viewed under a perspective camera model.

[4][31][18][19][50] In addition to variabilities over scale, which original scale-space theory was designed to handle, this generalized scale-space theory[19] also comprises other types of variabilities caused by geometric transformations in the image formation process, including variations in viewing direction approximated by local affine transformations, and relative motions between objects in the world and the observer, approximated by local Galilean transformations.

[18][51][52][50][53][54][55][56][57] Regarding biological hearing, there are receptive field profiles in the inferior colliculus and the primary auditory cortex that can be well modelled by spectro-temporal receptive fields, expressed as Gaussian derivatives over logarithmic frequencies combined with windowed Fourier transforms over time, with the window functions being temporal scale-space kernels.

For discrete data, this kernel can often be numerically well approximated by a small set of first-order recursive filters coupled in cascade, see [71] for further details.
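
The flavour of such an approximation can be sketched with a cascade of generic first-order recursive smoothing filters of the form y[n] = y[n−1] + (1/(1+μ))(x[n] − y[n−1]); the time constants below are illustrative, not the coefficients of the cited method:

```python
import numpy as np

def cascade_smooth(signal, mus):
    """Time-causal smoothing by first-order recursive filters in cascade.
    Each stage has unit DC gain and time constant mu, so the composed
    impulse response is a normalized, non-negative, time-causal kernel."""
    y = np.asarray(signal, dtype=float)
    for mu in mus:
        a = 1.0 / (1.0 + mu)
        out = np.empty_like(y)
        state = 0.0
        for n, xn in enumerate(y):
            state += a * (xn - state)  # y[n] = y[n-1] + a*(x[n] - y[n-1])
            out[n] = state
        y = out
    return y

# Impulse response of a three-stage cascade (illustrative time constants).
impulse = np.zeros(256)
impulse[0] = 1.0
kernel = cascade_smooth(impulse, mus=[1.0, 2.0, 4.0])
```

The composed kernel integrates to one, is non-negative, and peaks after a short delay, the qualitative properties required of a time-causal temporal smoothing kernel.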

For an earlier approach to handling temporal scales in a time-causal way, by performing Gaussian smoothing over a logarithmically transformed temporal axis, which however does not have any known memory-efficient time-recursive implementation as the time-causal limit kernel has, see.[72]

When implementing scale-space smoothing in practice, a number of different approaches can be taken: continuous or discrete Gaussian smoothing, implementation in the Fourier domain, pyramids based on binomial filters that approximate the Gaussian, or recursive filters.