Book scanning

To convert the raw images, optical character recognition (OCR)[1] is used to turn book pages into a digital text format such as ASCII, which reduces the file size and allows the text to be reformatted, searched, or processed by other applications.

A non-destructive method is to hold the book in a V-shaped holder and photograph it, rather than lay it flat and scan it.

After scanning, software adjusts the document images by aligning, cropping, and retouching them, then converts them to text and a final e-book format.
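The cropping step can be illustrated with a minimal sketch. It is not the method of any particular scanning package: it assumes a grayscale page stored as a list of rows of 0-255 values, and the function names and the ink threshold of 128 are hypothetical choices made for the example.

```python
def content_bbox(page, threshold=128):
    """Return (top, left, bottom, right) of the area containing ink.

    Pixels darker than `threshold` count as ink (an assumed cutoff).
    `bottom` and `right` are exclusive. Returns None for a blank page.
    """
    # Rows that contain at least one ink pixel.
    rows = [y for y, row in enumerate(page) if any(p < threshold for p in row)]
    if not rows:
        return None
    # Columns that contain at least one ink pixel.
    width = len(page[0])
    cols = [x for x in range(width) if any(row[x] < threshold for row in page)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(page, bbox):
    """Cut the page down to the given bounding box."""
    top, left, bottom, right = bbox
    return [row[left:right] for row in page[top:bottom]]


# A 4x4 "page" with a white margin around a 2x2 inked area.
sample = [
    [255, 255, 255, 255],
    [255,   0,  40, 255],
    [255, 255,  10, 255],
    [255, 255, 255, 255],
]
cropped = crop(sample, content_bbox(sample))
print(cropped)
```

Real pipelines usually deskew before cropping, since a rotated page inflates the bounding box; that step is omitted here for brevity.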

Scanning at 118 dots/centimeter (300 dpi) is adequate for conversion to digital text output, but for archival reproduction of rare, elaborate or illustrated books, much higher resolution is used.
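As a rough illustration of these figures, the sketch below converts dots per inch to dots per centimeter and estimates the pixel dimensions and uncompressed size of one scanned page. The 6 × 9 inch page, the 24-bit color depth, and the 600 dpi comparison value are assumptions chosen for the example, not values from the sources.

```python
CM_PER_INCH = 2.54


def dpi_to_dots_per_cm(dpi):
    """Convert dots per inch to dots per centimeter."""
    return dpi / CM_PER_INCH


def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel=24):
    """Size of the raw image before any compression is applied."""
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: 6 x 9 inches; 600 dpi is an illustrative "higher" value.
for dpi in (300, 600):
    w, h = page_pixels(6, 9, dpi)
    size_mb = uncompressed_bytes(w, h) / 1_000_000
    print(f"{dpi} dpi = {dpi_to_dots_per_cm(dpi):.2f} dots/cm, "
          f"{w} x {h} px, {size_mb:.2f} MB uncompressed")
```

Doubling the resolution quadruples the pixel count, which is one reason very high resolutions tend to be reserved for material where reproduction quality matters.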

In 2010 the total number of works appearing as books in human history was estimated to be around 130 million.

Alternatively, for reasons of convenience, safety, and improved technology, many organizations choose to scan in-house, using either overhead scanners, which are time-consuming, or digital camera-based scanning machines, which are substantially faster; the latter method is employed by the Internet Archive as well as Google.

In the twentieth century, the Hill Museum and Manuscript Library photographed books in Ethiopia that were subsequently destroyed amidst political violence in 1975.

This technique has been used successfully on tens of thousands of pages of original archival paper scanned for the Riazanov Library digital archive project, drawn from newspapers, magazines, and pamphlets 50 to 100 or more years old and often printed on fragile, brittle paper.

Although unbinding destroys the monetary value for some collectors (and for most sellers of this sort of material), in many cases it greatly assists preservation of the pages, making them more accessible to researchers[1] and less likely to be damaged when subsequently examined.

A disadvantage is that unbound stacks of pages are "fluffed up", and therefore more exposed to oxygen in the air, which may in some cases speed deterioration.

[1] Hand unbinding preserves text that runs into the gutters of bindings and, most critically, allows easier and more complete high-quality scans of two-page-wide material, such as center cartoons, graphic art, and photos in magazines.

[2] A large sharpened steel blade which moves straight down cuts the entire length of each sheet in one operation.

A large stack of paper applies torsional forces on the hinge, pulling the blade away from the cutting edge on the table.

Additionally, removing the binding of an entire hardcover book causes excessive wear due to cutting through the cover's stiff backing material.

The entire wood and book package is fed through the table saw using the rip fence as a guide.

Once the paper is liberated from the spine, it can be scanned one sheet at a time using a flatbed scanner or automatic document feeder (ADF).

An ADF which uses a series of rollers and channels to flip sheets over may jam or misfeed when fed coated paper.

Software-driven machines and robots have been developed to scan books without unbinding them, both to preserve the contents of the document and to create a digital image archive of its current state.

Indus International, Inc., based in West Salem, Wisconsin, produces scanners that were bought by some US entities for services such as interlibrary loan.

[25] Most high-end commercial robotic scanners use air and suction technology, while some use newer approaches such as bionic fingers for turning pages.

[1][2] With reports of machines being able to scan up to 2,900 pages per hour,[26] robotic book scanners are specifically designed for large-scale digitization projects.
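To put the reported throughput in perspective, a short calculation follows. Only the 2,900 pages per hour figure comes from the text; the 300-page book and the 8-hour shift are assumptions for illustration, and the estimate ignores time spent loading books and checking output.

```python
def minutes_per_book(pages, pages_per_hour):
    """Scanning time for one book at a constant page rate."""
    return pages / pages_per_hour * 60


def books_per_shift(pages, pages_per_hour, shift_hours):
    """Whole books completed in one shift at a constant page rate."""
    return int(pages_per_hour * shift_hours // pages)


RATE = 2900        # pages per hour, the upper figure reported above
BOOK_PAGES = 300   # assumed book length
SHIFT_HOURS = 8    # assumed shift length

print(f"{minutes_per_book(BOOK_PAGES, RATE):.1f} minutes per book")
print(f"{books_per_shift(BOOK_PAGES, RATE, SHIFT_HOURS)} books per shift")
```

Under these assumptions one machine would finish a book in a little over six minutes, though sustained rates in practice are likely to be lower than a reported peak.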

[1] Google's patent 7508978 shows an infrared camera technology which allows detection and automatic adjustment of the three-dimensional shape of the page.

[Images: Internet Archive Scribe book scanner (2011); CZUR M3000 book scanner with V-shaped cradle; sketch of a typical manual book scanner; turning the pages between scans; DIY non-destructive book scanner with book-downward design, allowing gravity to flatten pages; non-destructive book scanner with Curve Flattening Technology; video of the robotic book scanner DL mini; ScanRobot automated scanner with 60° opening angle]