The French composer Olivier Messiaen transcribed birdsong in the wild, and incorporated it into many of his compositions, for example his Catalogue d'oiseaux for solo piano.
Transcription of this nature involves scale degree recognition and harmonic analysis, both of which require relative or perfect pitch on the part of the transcriber.
The most widely known example of this is Ravel's arrangement for orchestra of Mussorgsky's piano piece Pictures at an Exhibition.
These are sometimes called "piano reductions", because the multiplicity of orchestral parts (in an orchestral piece there may be as many as two dozen separate instrumental parts playing simultaneously) has to be reduced to what a single pianist can manage to play, or occasionally two pianists on one or two pianos, as in the various arrangements of George Gershwin's Rhapsody in Blue.
Piano reductions are frequently made of orchestral accompaniments to choral works, for the purposes of rehearsal or of performance with keyboard alone.
This has the same effect as playing a tape or vinyl record at a slower speed: the pitch is lowered, meaning the music can sound as though it is in a different key.
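The arithmetic behind this effect is simple: playback speed scales every frequency by the same factor, and each doubling of frequency corresponds to an octave of twelve equal-tempered semitones. A minimal sketch (the function name is our own, and equal temperament is assumed):

```python
import math

def semitone_shift(speed_ratio: float) -> float:
    """Pitch shift in semitones when audio plays back at `speed_ratio`
    times its original speed (e.g. 0.5 = half speed)."""
    # Frequencies scale linearly with playback speed; 12 semitones per octave.
    return 12.0 * math.log2(speed_ratio)

print(semitone_shift(0.5))       # -12.0: half speed sounds an octave lower
print(semitone_shift(33.3 / 45))  # a 45 rpm record played at 33 1/3 rpm: about -5.2
```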
The term "automatic music transcription" was first used by audio researchers James A. Moorer, Martin Piszczalski, and Bernard Galler in 1977.
To date, no software application can completely fulfill James Moorer's definition of automatic music transcription.
Digital signal processing is the branch of engineering that provides software engineers with the tools and algorithms needed to analyze a digital recording in terms of pitch (note detection of melodic instruments) and the energy content of un-pitched sounds (detection of percussion instruments).
A Fourier Transform is the mathematical procedure that is used to create the spectrogram from the sound file’s digital data.
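To illustrate how a spectrogram is produced in practice, the following sketch uses the short-time Fourier transform from SciPy; the filename and window parameters are arbitrary assumptions:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

# Load a digital recording (assumed to be a mono PCM WAV file).
sr, samples = wavfile.read("recording.wav")

# Short-time Fourier analysis: split the signal into overlapping windows
# and take an FFT of each, giving energy as a function of time and frequency.
freqs, times, Sxx = spectrogram(samples.astype(float), fs=sr,
                                nperseg=2048, noverlap=1536)

# Sxx[i, j] is the spectral power near freqs[i] Hz at time times[j] seconds.
print(Sxx.shape)
```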
Pitch detection on a monophonic recording was a relatively simple task, and its technology enabled the invention of guitar tuners in the 1970s.
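As one illustration of why the monophonic case is tractable, here is a minimal autocorrelation-based sketch, one of several classic approaches; the function and parameter names are our own, not any particular tuner's algorithm:

```python
import numpy as np

def detect_pitch(frame: np.ndarray, sr: int,
                 fmin: float = 70.0, fmax: float = 1000.0) -> float:
    """Estimate the fundamental frequency of a monophonic frame by
    autocorrelation: a periodic signal correlates strongly with itself
    when shifted by one period."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(corr[lo:hi])   # strongest periodicity in range
    return sr / lag

sr = 44100
t = np.arange(2048) / sr
print(detect_pitch(np.sin(2 * np.pi * 196.0 * t), sr))  # ~196 Hz (a G3)
```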
However, pitch detection on polyphonic music is a much more difficult task, because the spectrogram now appears as a vague cloud of overlapping comb patterns, caused by each note's multiple harmonics.
Another method of pitch detection was invented by Martin Piszczalski in conjunction with Bernard Galler in the 1970s[2] and has since been widely followed.[4] The process attempts to roughly mimic the biology of the human inner ear by finding only a few of the loudest harmonics at a given instant.
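A loose sketch of that idea follows. This is an illustration inspired by the "few loudest harmonics" principle, not a reconstruction of the published Piszczalski-Galler algorithm; all names and thresholds are our own:

```python
import numpy as np

def pitch_from_loudest_harmonics(frame: np.ndarray, sr: int, n_peaks: int = 5) -> float:
    """Keep only the few loudest spectral components, then choose the
    fundamental frequency that best explains them as integer harmonics."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    spectrum[0] = 0.0                                   # ignore the DC bin
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    peaks = freqs[np.argsort(spectrum)[-n_peaks:]]      # loudest bins only

    # Candidate fundamentals: each loud peak could be harmonic 1..4 of f0.
    candidates = [p / k for p in peaks for k in range(1, 5)]

    def harmonicity(f0: float) -> float:
        # Score: how close each loud peak lies to some multiple of f0.
        return -sum(min(abs(p - k * f0) for k in range(1, 9)) for p in peaks)

    return max(candidates, key=harmonicity)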
To date, complete note detection of polyphonic recordings remains an open problem for audio engineers, although they continue to make progress by inventing algorithms that can partially detect some of the notes of a polyphonic recording, such as a melody or bass line.
The beat is often a predictable basic unit in time for the musical piece, and may only vary slightly during the performance.
Songs are frequently measured in beats per minute (BPM) to determine the tempo of the music, whether fast or slow.
Despite the intuitive nature of "foot tapping", a skill most humans possess, developing an algorithm to detect those beats is difficult.
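To make the problem concrete, here is a crude tempo-estimation sketch that autocorrelates an energy-based onset envelope, a rough stand-in for foot tapping. It is one simple approach among many, and the frame size and BPM range are arbitrary assumptions:

```python
import numpy as np

def estimate_bpm(samples: np.ndarray, sr: int, frame: int = 1024) -> float:
    """Estimate tempo from the periodicity of energy bursts (onsets)."""
    samples = np.asarray(samples, dtype=float)
    # 1. Energy per frame; half-wave-rectified differences mark onsets.
    n = len(samples) // frame
    energy = np.array([np.sum(samples[i*frame:(i+1)*frame] ** 2) for i in range(n)])
    onset = np.maximum(np.diff(energy), 0.0)
    # 2. Autocorrelate and search lags corresponding to 60-180 BPM.
    corr = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
    fps = sr / frame                        # onset frames per second
    lo, hi = int(fps * 60 / 180), int(fps * 60 / 60)
    lag = lo + np.argmax(corr[lo:hi])       # strongest beat period
    return 60.0 * fps / lag
```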
The fast Fourier transform algorithm computes the frequency content of a signal, and is useful in processing musical excerpts.
Note classification and offset detection are based on constant Q transform (CQT) and support vector machines (SVMs).
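A minimal sketch of how CQT features might feed an SVM classifier follows, using librosa and scikit-learn. The filenames, the per-frame labels, and the frame-wise formulation are illustrative assumptions, not the specific system referred to above:

```python
import numpy as np
import librosa
from sklearn.svm import SVC

# Hypothetical training data: an audio clip with per-frame note labels.
y, sr = librosa.load("train_clip.wav")              # filename is an assumption
cqt = np.abs(librosa.cqt(y, sr=sr, n_bins=84, bins_per_octave=12))

# One feature vector per analysis frame: 84 log-magnitude CQT bins.
X = librosa.amplitude_to_db(cqt).T                  # shape: (frames, 84)
labels = np.load("frame_labels.npy")                # hypothetical labels, one per frame

clf = SVC(kernel="rbf")                             # frame-wise note classifier
clf.fit(X, labels)
predicted = clf.predict(X)                          # per-frame note predictions
```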
This in turn leads to a “pitch contour”, namely a continuously time-varying line that corresponds to what humans refer to as melody.
In terms of actual computer processing, the principal steps are to:
1. digitize the performed, analog music;
2. perform successive short-term fast Fourier transforms (FFTs) to obtain the time-varying spectra;
3. identify the peaks in each spectrum;
4. analyze the spectral peaks to obtain pitch candidates;
5. connect the strongest individual pitch candidates into the most likely time-varying pitch contour;
6. map this physical data onto the closest music-notation terms.
A rough sketch of steps 2 through 5 appears below.
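The sketch below illustrates steps 2 through 5 under heavily simplified assumptions: the lowest strong spectral peak in each frame is taken as the pitch candidate, a crude proxy for full harmonic analysis, and step 6 (mapping to notation) is omitted:

```python
import numpy as np
from scipy.signal import stft, find_peaks

def pitch_contour(samples: np.ndarray, sr: int):
    """Short-term FFTs, peak picking, and linking one candidate per
    frame into a time-varying pitch contour."""
    freqs, times, Z = stft(samples, fs=sr, nperseg=2048)       # step 2
    contour = []
    for j in range(Z.shape[1]):
        mag = np.abs(Z[:, j])
        peaks, _ = find_peaks(mag, height=mag.max() * 0.1)     # step 3
        if len(peaks) == 0:
            contour.append(0.0)             # silent frame: no candidate
            continue
        # Steps 4-5, simplified: lowest strong peak as the pitch candidate.
        contour.append(freqs[peaks[0]])
    return times, np.array(contour)
```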
While time-domain methods have been proposed, they can break down for real-world musical instruments played in typical reverberant rooms.