Head-related transfer function

A head-related transfer function (HRTF) is a response that characterizes how an ear receives a sound from a point in space.

As sound strikes the listener, the size and shape of the head, ears, and ear canal, the density of the head, and the size and shape of the nasal and oral cavities all transform the sound and affect how it is perceived, boosting some frequencies and attenuating others.

It is a transfer function describing how a sound from a specific point arrives at the ear (generally at the outer end of the auditory canal).
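
As a concrete illustration, here is a minimal Python sketch (variable and function names are assumptions, not from any particular library) of applying a measured HRTF pair, expressed in the time domain as head-related impulse responses (HRIRs), to a mono signal by convolution:

```python
import numpy as np
from scipy.signal import fftconvolve

def binauralize(mono, hrir_left, hrir_right):
    """Render a mono signal at the virtual position for which the HRIR pair
    was measured. Inputs are 1-D arrays at the same sample rate, and the
    two HRIRs are assumed to have equal length."""
    left = fftconvolve(mono, hrir_left)    # sound as it reaches the left ear
    right = fftconvolve(mono, hrir_right)  # sound as it reaches the right ear
    return np.stack([left, right])         # (2, samples) binaural output
```

Played over headphones, the two channels carry the direction-dependent filtering that the listener's anatomy would have applied to a real source at that point.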

Some consumer home entertainment products designed to reproduce surround sound from stereo (two-speaker) headphones use HRTFs.

Some forms of HRTF processing have also been included in computer software to simulate surround sound playback from loudspeakers.

Humans have just two ears, but can locate sounds in three dimensions – in range (distance), in direction above and below (elevation), in front and to the rear, as well as to either side (azimuth).

This ability to localize sound sources may have developed in humans and their ancestors as an evolutionary necessity: the eyes can see only a fraction of the world around a viewer, and vision is hampered in darkness, while the ability to localize a sound source works in all directions, to varying accuracy,[1] regardless of the surrounding light.

In the AES69-2015 standard,[4] the Audio Engineering Society (AES) defined the SOFA (Spatially Oriented Format for Acoustics) file format for storing spatially oriented acoustic data such as HRTFs.
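
Because AES69 builds SOFA on the NetCDF-4 container format, a generic NetCDF reader can inspect such files. A minimal Python sketch, assuming a file that follows the common SimpleFreeFieldHRIR convention (the file name is a placeholder):

```python
from netCDF4 import Dataset

sofa = Dataset("subject_hrtf.sofa", "r")
fs = float(sofa.variables["Data.SamplingRate"][0])   # sampling rate in Hz
irs = sofa.variables["Data.IR"][:]             # (measurements, receivers, samples)
positions = sofa.variables["SourcePosition"][:]  # (measurements, 3): az, el, dist

print(f"{irs.shape[0]} directions, {irs.shape[1]} ears, "
      f"{irs.shape[2]}-tap impulse responses at {fs:.0f} Hz")
sofa.close()
```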

Even when measured for a "dummy head" of idealized geometry, HRTFs are complicated functions of frequency and of the three spatial variables.
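
Because the HRTF varies with direction, rendering a source in practice often starts by picking the measurement closest to the desired position. A sketch, assuming a `positions` array like the one read above (azimuth and elevation in degrees):

```python
import numpy as np

def nearest_measurement(positions, az, el):
    """positions: (M, 3) array of (azimuth, elevation, distance) per measurement.
    Returns the index of the measured direction closest to (az, el) on the sphere."""
    paz, pel = np.radians(positions[:, 0]), np.radians(positions[:, 1])
    az, el = np.radians(az), np.radians(el)
    # Cosine of the great-circle (angular) distance between requested
    # and measured directions; the maximum cosine is the smallest angle.
    cos_angle = (np.sin(pel) * np.sin(el)
                 + np.cos(pel) * np.cos(el) * np.cos(paz - az))
    return int(np.argmax(cos_angle))
```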

In order to maximize the signal-to-noise ratio (SNR) in a measured HRTF, it is important that the generated impulse be of high amplitude.[8]
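
A further generic measurement practice (not specific to any cited setup) is to repeat the excitation and average the time-aligned captures; coherent averaging of N repeats improves SNR by roughly 10·log10(N) dB, as in this sketch:

```python
import numpy as np

def average_captures(takes):
    """takes: (N, samples) array of time-aligned repeated captures.
    The signal adds coherently while uncorrelated noise averages out,
    raising SNR by about 10*log10(N) dB."""
    return np.asarray(takes).mean(axis=0)
```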

By assessing the variation between a person's ears, we can limit our perspective to the degrees of freedom of the head and its relation to the spatial domain.

For the purpose of calibration, we are concerned only with the direction of sounds relative to our ears, that is, with a specific degree of freedom.

Let $x_1(t)$ be the signal driving a loudspeaker and $y_1(t)$ the resulting signal at the listener's eardrum, and let $x_2(t)$ and $y_2(t)$ be the corresponding signals for a headphone. In the frequency domain,

$$Y_1 = X_1 L F M, \qquad Y_2 = X_2 H M,$$

where $L$, $F$, $H$, and $M$ denote the transfer functions of the loudspeaker, the HRTF, the headphone, and the measurement microphone, respectively. Setting $Y_1 = Y_2$ and solving for $X_2$ yields

$$X_2 = X_1 \frac{LF}{H}.$$

By observation, the desired transfer function is

$$T = \frac{LF}{H}.$$

Therefore, theoretically, if $x_1(t)$ is passed through this filter and the resulting $x_2(t)$ is played on the headphones, it should produce the same signal at the eardrum.
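
A minimal numerical sketch of this compensation, assuming the loudspeaker-path response ($LF$) and headphone-path response ($H$) have already been measured as impulse responses; the function names and the regularization constant are illustrative:

```python
import numpy as np

def headphone_filter(lf_ir, h_ir, n_fft=4096, eps=1e-6):
    """Build the filter T = LF/H from the derivation above.

    lf_ir: measured impulse response of the loudspeaker-to-eardrum path (L*F)
    h_ir:  measured impulse response of the headphone-to-eardrum path (H)
    eps:   regularization to avoid dividing by near-zero frequency bins
    """
    LF = np.fft.rfft(lf_ir, n_fft)
    H = np.fft.rfft(h_ir, n_fft)
    T = LF * np.conj(H) / (np.abs(H) ** 2 + eps)  # regularized LF / H
    return np.fft.irfft(T, n_fft)

def render(x1, t_ir):
    # x2(t) = (x1 * T)(t); played over the headphone, x2 should reproduce
    # the loudspeaker's signal at the eardrum.
    return np.convolve(x1, t_ir)
```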

This process is repeated for many places in the virtual environment to create an array of head-related transfer functions for each position to be recreated, while ensuring that the spatial sampling satisfies the Nyquist criterion.
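
A small sketch of building such an array of measurement directions, assuming a simple equiangular grid (the 10-degree resolution is illustrative; real rigs often use denser, non-uniform grids chosen to meet the spatial sampling requirement):

```python
import itertools
import numpy as np

azimuths = np.arange(0, 360, 10)      # degrees around the head
elevations = np.arange(-40, 91, 10)   # degrees below/above ear level

directions = list(itertools.product(azimuths, elevations))
print(f"{len(directions)} HRTF measurement positions")
```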

For example, from a training set of N subjects, one would consider each subject's HRTF phase and derive a single ITD (interaural time difference) scaling factor as the average delay of the group.

This computed scaling factor can estimate the time delay as a function of direction and elevation for any given individual.
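
A rough sketch of this idea: estimate a per-direction ITD from each subject's left/right HRIRs by cross-correlation, then average across the training set to obtain one delay factor per direction. Array names and shapes are assumptions for illustration:

```python
import numpy as np

def itd_samples(left_ir, right_ir):
    # Estimate the interaural time difference as the lag that maximizes
    # the cross-correlation between the two ears' impulse responses.
    corr = np.correlate(left_ir, right_ir, mode="full")
    return np.argmax(corr) - (len(right_ir) - 1)

def mean_itd(subject_hrirs, fs):
    """subject_hrirs: hypothetical array of shape (subjects, directions, 2, samples).
    Returns the group-average delay (in seconds) for each direction."""
    itds = np.array([[itd_samples(d[0], d[1]) for d in subj]
                     for subj in subject_hrirs]) / fs
    return itds.mean(axis=0)  # average over subjects, one value per direction
```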

Two programs are known to do so, both open-source: Mesh2HRTF,[12] which runs a physical simulation on a full 3D mesh of the head, and EAC, which uses a neural network trained on existing HRTFs and works from photos and other rough measurements.[14]

Some vendors, such as Apple and Sony, offer a variety of HRTFs that can be selected according to the user's ear shape.[20]

Linux is currently unable to directly process any of the proprietary spatial audio formats (surround sound plus dynamic objects).

Recent PipeWire versions are also able to provide dynamic spatial rendering using HRTFs;[22] however, integration with applications is still in progress.

Audio sample: HRTF filtering effect.