Document mosaicing is a process that stitches multiple, overlapping snapshot images of a document together to produce one large, high resolution composite.
The document is slid by hand under a stationary, over-the-desk camera until every part of it has been captured in the camera's field of view.[1]
Document mosaicing can be divided into four main processes.
In the first process, the system coarsely tracks the motion of the document as it slides under the camera.
A small patch is extracted from the center of the first frame to serve as a correlation template.
In the next frame, the correlation is computed over a search area four times the size of the patch.
The peak in the correlation function indicates the motion of the paper.
The snapshots are stored in an ordered list so that overlapping images can be paired in later processes.
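The tracking step can be sketched as a small template-matching routine. This is a minimal illustration, not the system's actual implementation: the function names, the default patch size and the use of zero-mean normalized correlation are assumptions, and images are taken to be plain lists of rows of grey levels.

```python
import math

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0

def crop(img, top, left, size):
    """Flatten the size x size square whose upper-left corner is (top, left)."""
    return [v for row in img[top:top + size] for v in row[left:left + size]]

def track_motion(prev, curr, patch=4):
    """Return the (dy, dx) shift of the central patch between two frames.

    The template is searched at offsets of up to patch // 2 pixels in each
    direction, so the search window has twice the side length of the patch
    (four times its area).
    """
    h, w = len(prev), len(prev[0])
    top, left = (h - patch) // 2, (w - patch) // 2
    template = crop(prev, top, left, patch)
    reach = patch // 2
    best_score, best_shift = -2.0, (0, 0)
    for dy in range(-reach, reach + 1):
        for dx in range(-reach, reach + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + patch > h or l + patch > w:
                continue  # candidate window falls outside the frame
            score = ncc(template, crop(curr, t, l, patch))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift  # location of the correlation peak
```

Calling `track_motion(frame1, frame2)` on two consecutive frames returns the offset at which the correlation peaks, which serves as the coarse motion estimate.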
Feature detection is the process of finding the image features from which the transformation that aligns one image with another is computed.
Skew angle estimation and the finding of columns, lines and words are examples of feature detection operations.
To estimate the skew angle, a small patch of text in the image is selected at random and rotated within a range of ±20° until the variance of the pixel intensities summed along the raster lines of the patch is maximised.[4]
To ensure that the skew angle is accurate, the document mosaic system repeats the calculation on many image patches and derives the final estimate as the average of the individual angles, weighted by the variance of the pixel intensities of each patch.
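The skew search can be sketched as follows. This is a simplified illustration under stated assumptions: instead of resampling a rotated patch, each pixel is projected onto the row it would occupy after rotation, which gives the same row sums up to rounding. The function names, the 1° step and the sign convention for the angle are hypothetical choices, not taken from the cited work.

```python
import math

def row_sum_variance(patch, angle_deg):
    """Variance of the row sums obtained when rows are tilted by angle_deg."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    radius = math.ceil(math.hypot(cx, cy)) + 1
    sums = [0.0] * (2 * radius + 1)  # fixed number of rows for every angle
    s = math.sin(math.radians(angle_deg))
    c = math.cos(math.radians(angle_deg))
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            if value:
                # Row index of this pixel once the patch is de-rotated.
                r = round(-(x - cx) * s + (y - cy) * c)
                sums[r + radius] += value
    mean = sum(sums) / len(sums)
    return sum((t - mean) ** 2 for t in sums) / len(sums)

def patch_skew(patch, max_angle=20.0, step=1.0):
    """Return (angle, variance) for the angle that maximises the variance."""
    count = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(count + 1)]
    best = max(candidates, key=lambda a: row_sum_variance(patch, a))
    return best, row_sum_variance(patch, best)

def estimate_skew(patches, max_angle=20.0, step=1.0):
    """Average the per-patch angles, weighted by each patch's peak variance."""
    results = [patch_skew(p, max_angle, step) for p in patches]
    total = sum(v for _, v in results)
    if total == 0:
        return 0.0  # blank patches carry no skew information
    return sum(a * v for a, v in results) / total
```

Weighting by variance means that patches with a sharp, well-defined row profile (clean text lines) contribute more to the final estimate than patches with little structure.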
In the columns, lines and words finding operation, the de-skewed document is segmented into a hierarchy of columns, lines and words.
This segmentation is important because the document mosaic is created by matching the lower right corners of words in pairs of overlapping images.
Moreover, the segmentation operation reliably organizes the list of images into a hierarchy of rows and columns.
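The segmentation operation involves a considerable amount of summing over the binary gradient, de-skewed images, which is done by constructing a matrix of partial sums[6], in which each element is the sum of all image pixels lying above and to the left of, and including, the corresponding position.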
The matrix of partial sums is calculated in one pass through the binary gradient, de-skewed image.
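A matrix of partial sums (also known as a summed-area table) can be built in one pass and then used to sum any rectangle with four lookups. The sketch below is illustrative only: the function names are hypothetical, and the extra leading row and column of zeros is a convenience that the cited formulation may not use.

```python
def partial_sums(img):
    """Build the matrix of partial sums of a 2-D image in a single pass.

    p[i][j] holds the sum of img[0..i-1][0..j-1]; row 0 and column 0 are
    zero padding so that rectangle queries need no special cases.
    """
    h, w = len(img), len(img[0])
    p = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_total = 0
        for j in range(w):
            row_total += img[i][j]
            p[i + 1][j + 1] = p[i][j + 1] + row_total
    return p

def box_sum(p, top, left, bottom, right):
    """Sum of img[top..bottom-1][left..right-1] using four table lookups."""
    return p[bottom][right] - p[top][right] - p[bottom][left] + p[top][left]

# Example on a small binary image.
image = [[1, 0, 1],
         [0, 1, 1],
         [1, 1, 0]]
table = partial_sums(image)
print(box_sum(table, 0, 0, 3, 3))  # whole image: 6
print(box_sum(table, 1, 1, 3, 3))  # lower-right 2 x 2 block: 3
```

Because each rectangle sum costs a constant number of operations, the repeated summing needed to find columns, lines and words does not have to revisit individual pixels.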
The two images are now organized in a hierarchy of linked lists of columns, lines and words. At the bottom of the structure, the length of each word is recorded so that, when establishing correspondence between the two images, the search can be restricted to the corresponding structures that contain groups of words with matching lengths.
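One way to picture this hierarchy and the length-based search is sketched below. It is a hypothetical illustration: Python lists stand in for the linked lists, the class and function names are invented for the example, word lengths are assumed to be in pixels, and a fixed run of three words with a small tolerance is an arbitrary choice.

```python
from dataclasses import dataclass, field

@dataclass
class Word:
    length: int    # word length (assumed to be in pixels)
    corner: tuple  # (x, y) of the lower right corner

@dataclass
class Line:
    words: list = field(default_factory=list)

@dataclass
class Column:
    lines: list = field(default_factory=list)

def length_runs(column, n):
    """Yield (lengths, words) for every run of n consecutive words in a line."""
    for line in column.lines:
        for i in range(len(line.words) - n + 1):
            run = line.words[i:i + n]
            yield tuple(w.length for w in run), run

def candidate_matches(col_a, col_b, n=3, tol=1):
    """Pair up runs of n words whose lengths agree to within tol."""
    runs_b = list(length_runs(col_b, n))
    matches = []
    for lengths_a, run_a in length_runs(col_a, n):
        for lengths_b, run_b in runs_b:
            if all(abs(x - y) <= tol for x, y in zip(lengths_a, lengths_b)):
                matches.append((run_a, run_b))
    return matches
```

Each returned pair gives two groups of words that may be the same text seen in both images; the lower right corners of the paired words are then candidate correspondence points.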
Once a seed match has been found, the next process is to build the match list, which generates the correspondence points of the two images.
Assuming a pinhole camera model, the transformation between pixels (u, v) of image 1 and pixels (u′, v′) of image 2 is described by a plane-to-plane projectivity.
The parameters of the projectivity are found from four pairs of matching points.
The projectivity is fine-tuned using correlation at the corners of the overlapping portion to obtain four correspondences to sub-pixel accuracy.
A typical result of the process is shown in Figure 5.
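Recovering a plane-to-plane projectivity from four point pairs amounts to solving an 8 x 8 linear system. The sketch below is one standard way to do this, not necessarily the method used by the system described here: it assumes the bottom-right entry of the 3 x 3 matrix is normalized to 1, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def projectivity_from_points(src, dst):
    """Return the 3 x 3 matrix mapping four src points onto four dst points."""
    a, b = [], []
    for (u, v), (x, y) in zip(src, dst):
        # x * (h31*u + h32*v + 1) = h11*u + h12*v + h13, and likewise for y.
        a.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        b.append(x)
        a.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        b.append(y)
    h = solve(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_projectivity(h, u, v):
    """Map pixel (u, v) through the projectivity h."""
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return ((h[0][0] * u + h[0][1] * v + h[0][2]) / w,
            (h[1][0] * u + h[1][1] * v + h[1][2]) / w)
```

The four points must be in general position (no three on one line); otherwise the system is singular and `solve` raises an error.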
Finally, the whole page composition is built up by mapping all the images into the coordinate system of an "anchor" image, which is normally the one nearest the page center.
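Mapping every image into the anchor's coordinate system can be done by chaining the pairwise projectivities outward from the anchor. The following is a minimal sketch under assumptions: pairwise transformations are given as 3 x 3 matrices keyed by image index pairs, the function names are hypothetical, and a breadth-first traversal is just one reasonable way to order the chaining.

```python
from collections import deque

def matmul3(a, b):
    """Product of two 3 x 3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def invert3(m):
    """Inverse of a 3 x 3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular projectivity")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

def map_to_anchor(pairwise, anchor):
    """Compose pairwise projectivities into image-to-anchor projectivities.

    pairwise[(i, j)] maps coordinates of image i into image j.
    Returns {image index: 3 x 3 matrix into the anchor's coordinates}.
    """
    links = {}
    for (i, j), h in pairwise.items():
        links.setdefault(j, []).append((i, h))           # i -> j
        links.setdefault(i, []).append((j, invert3(h)))  # j -> i
    to_anchor = {anchor: [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
    queue = deque([anchor])
    while queue:
        j = queue.popleft()
        for i, h_ij in links.get(j, []):
            if i not in to_anchor:
                # First map image i into image j, then image j into the anchor.
                to_anchor[i] = matmul3(to_anchor[j], h_ij)
                queue.append(i)
    return to_anchor
```

Images that share no chain of overlaps with the anchor are simply absent from the result. Choosing an anchor near the page center keeps the chains short, which limits the accumulation of registration error.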
The raw document mosaic is shown in Figure 6.
This problem can be solved by building hierarchical sub-mosaics.
Document mosaicing can be applied in various areas, such as: