Optical music recognition

Once captured digitally, the music can be saved in commonly used file formats, e.g. MIDI (for playback) and MusicXML (for page layout).

[6][7] Early research in OMR was conducted by Ichiro Fujinaga, Nicholas Carter, Kia Ng, David Bainbridge, and Tim Bell.

In a library, an OMR system could make music scores searchable,[8] and it would allow musicologists to conduct quantitative studies at scale.

This means that while the alphabet consists of well-defined primitives (e.g., stems, noteheads, or flags), it is their configuration – how they are placed and arranged on the staff – that determines the semantics and how the primitives should be interpreted.
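As a small illustration of configuration determining meaning, the sketch below reads the same notehead at the same vertical position as a different pitch depending on the clef. The function name `pitch_at`, the table `CLEF_BOTTOM_LINE`, and the convention that staff position 0 is the bottom line (counting up one step per line or space) are assumptions made for this example and do not come from any particular OMR system. Accidentals and key signatures are ignored.

```python
STEPS = "CDEFGAB"

# Pitch of the bottom staff line for each clef, as (letter, octave).
CLEF_BOTTOM_LINE = {
    "treble": ("E", 4),
    "alto": ("F", 3),
    "bass": ("G", 2),
}

def pitch_at(staff_position: int, clef: str) -> str:
    """Return the natural pitch name at a staff position.

    staff_position counts diatonic steps from the bottom line (0),
    so 1 is the first space, 2 the second line, and negative values
    lie below the staff.
    """
    letter, octave = CLEF_BOTTOM_LINE[clef]
    # Number of diatonic steps above C0.
    index = octave * 7 + STEPS.index(letter) + staff_position
    return f"{STEPS[index % 7]}{index // 7}"

# The same notehead on the second line, read under two clefs.
print(pitch_at(2, "treble"))
print(pitch_at(2, "bass"))
```

The notehead primitive is identical in both calls; only its relationship to the clef changes the result.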

Finally, music notation involves ubiquitous two-dimensional spatial relationships, whereas text can be read as a one-dimensional stream of information, once the baseline is established.

The process of recognizing music scores is typically broken down into smaller steps that are handled with specialized pattern recognition algorithms.

A common problem with this approach is that errors and artifacts introduced in one stage propagate through the system and can heavily affect overall performance.
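A rough way to see why propagated errors matter is to multiply per-stage accuracies. The sketch below assumes that every stage must succeed for a symbol to be recognized correctly and that stage errors are independent, which is a simplification. The stage names and accuracy values are hypothetical and chosen only for illustration; they are not measurements of any real system.

```python
from math import prod

def pipeline_accuracy(stage_accuracies):
    """Overall accuracy if every stage must succeed and errors are independent."""
    return prod(stage_accuracies)

# Hypothetical stages and accuracies, for illustration only.
stages = {
    "staff processing": 0.99,
    "symbol detection": 0.95,
    "notation assembly": 0.95,
    "encoding": 0.98,
}

overall = pipeline_accuracy(stages.values())
print(f"overall accuracy: {overall:.3f}")
```

Under these assumptions the overall figure is lower than that of the weakest single stage, even though each stage looks accurate in isolation.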

Donald Byrd and Jakob Simonsen argue that OMR is difficult because modern music notation is extremely complex.

Typical applications for OMR systems include the creation of an audible version of the music score (referred to as replayability).

If the music scores are recognized with the goal of human readability (referred to as reprintability), the structured encoding has to be recovered, which includes precise information on the layout and engraving.

[26] Due to excellent results and modern techniques that made the staff removal stage obsolete, this competition was discontinued.

However, the freely available CVC-MUSCIMA dataset that was developed for this challenge is still highly relevant for OMR research as it contains 1000 high-quality images of handwritten music scores, transcribed by 50 different musicians.

[38] French company Newzik took a different approach in the development of its OMR technology Maestria,[39] by using random score generation.

Some of these products claim extremely high recognition rates of up to 100% accuracy,[49][50] but fail to disclose how those numbers were obtained, making it nearly impossible to verify them or to compare different OMR systems.

First published digital scan of music scores by David Prerau in 1971
Relation of optical music recognition to other fields of research
Excerpt of Nocturne Op. 15, no. 2, by Frédéric Chopin – challenges encountered in optical music recognition
Optical Music Recognition Architecture by Bainbridge and Bell (2001)
The general framework for optical music recognition proposed by Ana Rebelo et al. in 2012