In many applications of acoustics and audio signal processing it is necessary to know what humans actually hear. Sound, which consists of air pressure waves, can be measured accurately with sophisticated equipment. However, understanding how these waves are received and interpreted by the brain is not trivial. Sound is a continuous analog signal which can theoretically carry an infinite amount of information (there being an infinite number of frequencies, each with both a magnitude and a phase).
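The magnitude-and-phase decomposition mentioned above can be made concrete with a small sketch. The function name and the test signal are illustrative choices, not anything from the text: a naive discrete Fourier transform of a sampled sine, returning the magnitude and phase of each frequency bin.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform: for each frequency bin k,
    return the (magnitude, phase) of the complex coefficient."""
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        out.append((abs(s), cmath.phase(s)))
    return out

# Illustrative signal: a 1 kHz sine sampled at 8 kHz, 8 samples.
# All the energy lands in bin 1 (and its mirror, bin 7).
fs, f = 8000, 1000
x = [math.sin(2 * math.pi * f * n / fs) for n in range(8)]
bins = dft(x)
```

For this signal, bin 1 has magnitude N/2 = 4 and phase -π/2 (a sine is a cosine delayed by a quarter cycle), illustrating that each frequency component carries both quantities.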
Recognizing which features are important to perception enables scientists and engineers to concentrate on the audible aspects of a system and ignore the rest. It is important to note that the question of what humans hear is not only a physiological question about the ear but very much also a psychological one.
Limits of perception
The human ear can usually hear sounds in the range of about 20 Hz to 20 kHz. With age, the range shrinks, especially at the upper limit. Frequencies below this range cannot be heard, but sufficiently loud low-frequency sounds can be felt as vibration on the skin.
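A minimal sketch of this range as a predicate (the function name and the exact limits are nominal values from the text, not precise thresholds, since the true limits vary between individuals and with age):

```python
def is_audible(freq_hz, lower=20.0, upper=20000.0):
    """Rough check against the nominal adult hearing range.
    The 20 Hz / 20 kHz limits are conventional approximations."""
    return lower <= freq_hz <= upper

# An aging listener might be modeled by lowering the upper limit:
audible_to_older_listener = is_audible(18000.0, upper=14000.0)
```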
The frequency resolution of the ear is, in the middle of the audible range, about 2 Hz: changes in pitch larger than 2 Hz can be perceived. However, even smaller pitch differences can be perceived through other means. For example, when two tones close in frequency sound simultaneously, their interference is heard as beating, a slow periodic fluctuation in loudness at the difference frequency.
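The beating effect follows from the identity sin(2πf₁t) + sin(2πf₂t) = 2 cos(π(f₁−f₂)t) sin(π(f₁+f₂)t): the loudness envelope fluctuates at the difference frequency. A small sketch (the function name is an illustrative choice):

```python
def beat_frequency(f1, f2):
    """Two simultaneous tones at f1 and f2 Hz produce loudness
    fluctuations ('beats') at the difference frequency |f1 - f2| Hz."""
    return abs(f1 - f2)

# Two tones 1 Hz apart: the pitch difference is below the ~2 Hz
# resolution limit, but the 1 Hz beating is clearly audible.
beat = beat_frequency(440.0, 441.0)
```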
The intensity range of audible sounds is enormous. The lower limit of audibility is defined as 0 dB, but the upper limit is not as clearly defined; it is rather the level at which the ear is physically harmed (see also hearing disability), and it depends on the duration of exposure. The ear can tolerate short bursts of 120 dB sound without harm, yet prolonged exposure to 80 dB sounds will damage it.
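The decibel scale above is logarithmic in sound pressure, conventionally referenced to 20 µPa (the nominal threshold of hearing, i.e. 0 dB SPL). A sketch of the conversion, assuming that standard reference:

```python
import math

P_REF = 20e-6  # reference pressure: 20 micropascals, defined as 0 dB SPL

def spl_db(pressure_pa):
    """Convert an RMS sound pressure in pascals to dB SPL."""
    return 20.0 * math.log10(pressure_pa / P_REF)

# The enormous range in the text: 0 dB to 120 dB spans a factor
# of one million in sound pressure (20 µPa to 20 Pa).
quiet = spl_db(20e-6)   # 0 dB, threshold of hearing
harmful = spl_db(0.2)   # 80 dB, harmful over long exposures
loud = spl_db(20.0)     # 120 dB, harmful even briefly
```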
In some situations an otherwise clearly audible sound can be masked by another sound. For example, conversation at a bus stop can become completely impossible while a loud bus drives past. This phenomenon is called intensity masking: a louder sound masks a weaker one so that the weaker sound is inaudible in its presence.
In fact, masking depends on two further parameters: the frequency separation and the temporal separation of the two sounds. A sound close in frequency to the louder sound is more easily masked than one far from it in frequency. This effect is called pitch masking.
Similarly, a weak sound emitted soon after the end of a louder sound is masked by it. In fact, even a weak sound just before a louder sound can be masked. These two effects are called forward and backward temporal masking, respectively.
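The three masking parameters just described (intensity, frequency separation, time separation) can be combined into a toy model. This is purely illustrative: the function name, the linear fall-off shape, and every numeric parameter below are invented for the sketch, not measured psychoacoustic data.

```python
def masked_threshold_db(masker_level_db, delta_f_hz, delta_t_ms,
                        spread_db_per_hz=0.05, decay_db_per_ms=0.5):
    """Toy model: the level (dB) a probe tone must exceed to be heard
    near a masker. All parameter values are invented placeholders.
    Threshold falls with frequency separation (pitch masking) and with
    time after the masker ends (forward temporal masking)."""
    threshold = masker_level_db
    threshold -= spread_db_per_hz * abs(delta_f_hz)    # farther in frequency -> less masking
    threshold -= decay_db_per_ms * max(delta_t_ms, 0)  # later in time -> less masking
    return max(threshold, 0.0)  # never below the absolute threshold of hearing

# A probe at the masker's own frequency and time must exceed the
# masker's level; moving away in frequency or time relaxes this.
at_masker = masked_threshold_db(80.0, 0.0, 0.0)
far_in_freq = masked_threshold_db(80.0, 100.0, 0.0)
later_in_time = masked_threshold_db(80.0, 0.0, 40.0)
```

Real masking curves are strongly asymmetric in both frequency and time (backward masking is much shorter than forward masking), which this linear sketch deliberately ignores.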
Yet to be done:
- Bark scale, Equivalent rectangular bandwidth (ERB), Mel scale and other scales
- Loudness, that is, perceived volume
- Perception of non-existent sounds, such as the missing fundamental frequency. Compare to the telephone, which transmits roughly 300 Hz to 3400 Hz.
- 3D-sound perception
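One of the perceptual scales listed above, the Mel scale, can already be sketched here. The conversion below uses O'Shaughnessy's commonly cited formula (one of several variants in use; the function names are illustrative):

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels, using the common
    formula mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The scale is anchored so that 1000 Hz is close to 1000 mel;
# above that, equal pitch steps correspond to ever-larger Hz steps,
# reflecting the ear's coarser resolution at high frequencies.
anchor = hz_to_mel(1000.0)
```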