In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time.
The goal of this segmentation is to distinguish meaningful rigid motion from the background and to extract it for analysis.
Depending on the segmentation criterion used in the algorithm, these techniques can be broadly classified into the following categories: image difference, statistical methods, wavelets, layering, optical flow, and factorization.
Applications of rigid motion segmentation have grown in recent years with the rise of surveillance and video editing.
Rigid motion in three dimensions (3-D), when captured by a two-dimensional (2-D) camera, corresponds to changes in pixel values across subsequent frames of the video sequence.
Depending upon the type of visual features that are extracted, motion segmentation algorithms can be broadly divided into two categories. The first operates directly on dense, image-based information such as pixel intensities; the second computes a sparse set of features corresponding to actual physical points on the objects.
As mentioned earlier, there is no single way to distinguish motion segmentation techniques, but depending on the segmentation criterion used in the algorithm, they can be broadly classified as follows.[2] The image-difference approach is a very useful technique for detecting changes in images, owing to its simplicity and its ability to deal with occlusion and multiple motions.
Thresholding the difference between consecutive frames yields the contour of the moving object; using this contour, the algorithm extracts the spatial and temporal information required to define the motion in the scene.
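A minimal sketch of this frame-differencing idea, assuming OpenCV and NumPy (the synthetic frames, the moving disc, and the 25-level threshold are illustrative stand-ins, not part of the cited methods):

```python
import cv2
import numpy as np

# Two synthetic grayscale frames: a bright disc that moves between frames
# (stand-ins for consecutive frames of a video).
prev_gray = np.zeros((240, 320), dtype=np.uint8)
cv2.circle(prev_gray, (100, 120), 20, 255, -1)
gray = np.zeros_like(prev_gray)
cv2.circle(gray, (110, 120), 20, 255, -1)

# Absolute difference between consecutive frames highlights changed pixels.
diff = cv2.absdiff(gray, prev_gray)

# Thresholding the difference image yields a binary motion mask whose
# contours outline the moving object.
_, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
```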
The most commonly used statistical frameworks are maximum a posteriori probability (MAP),[7] particle filtering (PF),[8] and expectation maximization (EM).[9]
MAP uses Bayes' rule, classifying each pixel into one of a set of predefined classes.
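As an illustration, a minimal MAP pixel classifier under the assumption of one-dimensional Gaussian class likelihoods (the class means, spreads, and priors below are invented for the example):

```python
import numpy as np

def map_classify(pixels, means, sigmas, priors):
    """Assign each pixel intensity to the class maximizing the posterior
    P(class | pixel), proportional to P(pixel | class) * P(class), with a
    one-dimensional Gaussian likelihood per class."""
    x = np.asarray(pixels, dtype=float)[:, None]   # shape (N, 1)
    mu = np.asarray(means, dtype=float)[None, :]   # shape (1, K)
    sg = np.asarray(sigmas, dtype=float)[None, :]
    pr = np.asarray(priors, dtype=float)[None, :]
    # Work in log space for numerical stability.
    log_post = -0.5 * ((x - mu) / sg) ** 2 - np.log(sg) + np.log(pr)
    return np.argmax(log_post, axis=1)             # class index per pixel

# Two invented classes: dark "background" and bright "moving object".
labels = map_classify([12, 200, 90], means=[20, 180],
                      sigmas=[15, 30], priors=[0.7, 0.3])
```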
Optical flow (OF) helps in determining the relative pixel velocity of points within an image sequence.
Initially, the main drawbacks of OF were its lack of robustness to noise and its high computational cost, but recent key-point matching techniques and hardware implementations have diminished these limitations.
To increase its robustness to occlusion and temporal stopping, OF is generally used with other statistical or image difference techniques.
Though this method provides good results,[12] it is limited by the assumption that objects move only in front of the camera.
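A minimal sketch of optical-flow-based segmentation, assuming OpenCV's Farneback implementation (the synthetic frames, the parameter values, and the 1 px/frame threshold are illustrative choices, not those of the cited methods):

```python
import cv2
import numpy as np

# Two synthetic grayscale frames: a bright square that moves 5 px right
# (stand-ins for consecutive video frames).
prev_gray = np.zeros((240, 320), dtype=np.uint8)
prev_gray[100:140, 100:140] = 255
gray = np.zeros_like(prev_gray)
gray[100:140, 105:145] = 255

# Dense optical flow between the two frames (Farneback's algorithm).
flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)

# Per-pixel velocity magnitude; thresholding gives a crude segmentation
# of regions whose motion differs from the static background.
mag = np.linalg.norm(flow, axis=2)
moving = mag > 1.0  # threshold in pixels/frame
```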
Wavelet-based techniques are often implemented in combination with other approaches, such as optical flow, and are applied at multiple scales to reduce the effect of noise.[13]
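A minimal sketch of such multiscale noise suppression, assuming the PyWavelets library (the wavelet, decomposition depth, and threshold are illustrative choices):

```python
import numpy as np
import pywt  # PyWavelets

def denoise_difference_image(diff, wavelet="db2", level=3, thresh=10.0):
    """Suppress noise in a frame-difference image by soft-thresholding
    its wavelet detail coefficients at several scales."""
    coeffs = pywt.wavedec2(np.asarray(diff, dtype=float), wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]
    details = [tuple(pywt.threshold(d, thresh, mode="soft") for d in lvl)
               for lvl in details]
    return pywt.waverec2([approx] + details, wavelet)

clean = denoise_difference_image(np.random.rand(240, 320) * 30)  # toy input
```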
As humans also perceive scenes in layers, layer-based segmentation is a natural solution to occlusion problems, but it is very complex and requires manual tuning.
This technique factorizes the trajectory matrix W, obtained by tracking different features over the sequence, into two matrices, motion and structure, using singular value decomposition (SVD).
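A minimal sketch of this factorization in the spirit of the Tomasi-Kanade method, assuming an orthographic camera so the registered trajectory matrix has rank at most three (the square-root split of the singular values is one common convention, not the only one):

```python
import numpy as np

def factorize_trajectories(W):
    """Factorize a trajectory matrix W (2F x P: F frames, P tracked points)
    into motion (2F x 3) and structure (3 x P) via SVD."""
    # Register W by subtracting each row's mean, which removes translation;
    # for an orthographic camera the result has rank at most 3.
    W_reg = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
    r = 3
    motion = U[:, :r] * np.sqrt(s[:r])            # per-frame camera motion
    structure = np.sqrt(s[:r])[:, None] * Vt[:r]  # 3-D point coordinates
    return motion, structure

W = np.random.rand(8, 6)                       # 4 frames, 6 points (toy data)
motion, structure = factorize_trajectories(W)  # shapes (8, 3) and (3, 6)
```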
Motion segmentation algorithms can be further classified depending upon the number of views required, namely two-view and multi-view approaches.
Two-view approaches consider two perspective camera views of a rigid body and find feature correspondences between them.[18]
Multi-view approaches, in contrast, operate on feature trajectories accumulated over many frames; a number of such approaches have been proposed, including Principal Angles Configuration (PAC)[19] and Sparse Subspace Clustering (SSC)[20] methods.
These algorithms are faster and more accurate than two-view approaches but require a greater number of frames to maintain their accuracy.[22]
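A minimal sketch of the SSC idea, assuming scikit-learn's Lasso and spectral clustering as stand-ins for the optimization in [20] (the regularization weight is an illustrative choice):

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import Lasso

def ssc(Y, n_motions, alpha=0.01):
    """Sparse Subspace Clustering sketch: write each trajectory (column of Y)
    as a sparse combination of the others, then spectrally cluster the
    resulting affinity matrix into n_motions groups."""
    P = Y.shape[1]
    C = np.zeros((P, P))
    for i in range(P):
        others = np.delete(Y, i, axis=1)  # all trajectories but the i-th
        coef = Lasso(alpha=alpha, fit_intercept=False,
                     max_iter=10000).fit(others, Y[:, i]).coef_
        C[np.arange(P) != i, i] = coef    # self-representation, c_ii = 0
    affinity = np.abs(C) + np.abs(C).T    # symmetrize
    return SpectralClustering(n_clusters=n_motions,
                              affinity="precomputed").fit_predict(affinity)

labels = ssc(np.random.rand(8, 30), n_motions=2)  # one label per trajectory
```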
Motion segmentation remains an active field of research, as many open issues offer scope for improvement.
Strong feature-detection algorithms exist, but they still produce false positives, which can lead to unexpected correspondences.
The presence of image noise and outliers further affects the accuracy of structure from motion (SfM) estimation.
Several other problems also remain open; robust algorithms have been proposed to deal with outliers and to carry out the estimation with greater accuracy.
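A minimal sketch of one such robust step, assuming OpenCV's RANSAC-based fundamental-matrix estimation as a stand-in (the synthetic matches and the 1-pixel tolerance are illustrative):

```python
import cv2
import numpy as np

# Synthetic matched feature coordinates in two views: a rigid shift plus
# Gaussian pixel noise (stand-ins for real detector correspondences).
rng = np.random.default_rng(0)
pts0 = (rng.random((100, 2)) * 320).astype(np.float32)
pts1 = (pts0 + np.float32([5.0, 0.0])
        + rng.normal(0, 0.5, (100, 2)).astype(np.float32))

# RANSAC estimates the fundamental matrix while flagging correspondences
# that are inconsistent with a single rigid motion.
F, inlier_mask = cv2.findFundamentalMat(pts0, pts1, cv2.FM_RANSAC,
                                        ransacReprojThreshold=1.0,
                                        confidence=0.99)
inliers0 = pts0[inlier_mask.ravel() == 1]  # correspondences kept as inliers
```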