w3c/silver

Audio contrast for clarity

10g1k opened this issue · 3 comments

10g1k commented

Good day ladies and gentlemen.

Much work has been done on visual contrast. I suggest it may be worth adding contrast guidelines for audio content. Some folks (myself among them) have difficulty separating sound sources. For example, in a group of people, if only Person A is talking, the speech is clearly discernible; but if Person B starts talking as well, the sounds merely combine into an unintelligible cacophony. Undoubtedly this applies to everyone to some extent, but I believe it applies to a greater degree to some. It is particularly evident among those on the autism spectrum and those with hearing-related difficulties.

It may therefore be useful to consider a sound contrast guideline similar to the colour contrast guidelines: perhaps functionality that mutes non-focused sounds (completely or to a user-specified level) when a particular sound source receives focus. Focus for sound sources would work the same way as normal web interaction focus.
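As a rough illustration of the focus idea, here is a minimal sketch of how per-source gains might be computed when one source has focus. The names `focus_gains`, `db_to_gain`, and `attenuation_db` are hypothetical and not part of any existing API; a real implementation would apply the resulting multipliers to whatever audio nodes the page uses.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def focus_gains(source_ids, focused_id, attenuation_db=None):
    """Return a linear gain for each source (hypothetical helper).

    The focused source stays at unity gain. Every other source is
    attenuated by `attenuation_db` decibels, or muted completely when
    `attenuation_db` is None.
    """
    if attenuation_db is None:
        other_gain = 0.0  # full mute for non-focused sources
    else:
        other_gain = db_to_gain(-abs(attenuation_db))
    return {
        sid: (1.0 if sid == focused_id else other_gain)
        for sid in source_ids
    }


# Example: the narration has focus; the user chose 20 dB of attenuation,
# so the other sources are scaled to roughly 0.1 of their amplitude.
gains = focus_gains(["narration", "music", "ambience"], "narration", attenuation_db=20)
```

Expressing the user setting in decibels rather than as a raw multiplier is a design choice here: it matches how level differences are usually discussed and makes "mute completely" a distinct, explicit case.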

10g1k commented

Of course, this could be extended to things such as movement and animations: those not in focus would be reduced.

Guideline issues

Aural contrasts are very complicated. It is not as simple as stating that one sound must be some number of decibels above some other sound. The frequencies and dynamic characteristics are also significant factors.

Sound is the perception that requires the most user personalization and configuration, as is evidenced by the multitude of controls on a typical stereo.

Individual user needs vary significantly, even with "normal" aural function, because different users process audio differently, with or without impairments that affect frequency sensitivity and monaural/binaural function.

Also, unlike visual contrast, which can be temporally static, sound is always temporal in nature, and the complicated dynamics of a sonic event make an arbitrary "testable" guideline elusive.

GENERALLY:

  • A broadband noise (e.g. white or pink noise) at a given dB SPL will completely mask noises that are at least 40 dB lower in SPL.

    • This means that someone who is easily distracted by outside noises may benefit from listening on headphones to a calm broadband noise such as ocean waves, which can significantly reduce the perception of other noises in the area.
  • Two voices at equal volume but mixed directly on top of one another may be individually discernible by some individuals but completely indiscernible by others, depending on their neurology related to aural filtering and their cognitive capabilities.

    • A related factor is whether the two voices are mixed together to a monaural sum, or presented discretely in the left and right channels and isolated through the use of headphones.
  • Some impairments, such as "tone deafness", mean a person may not be able to distinguish frequencies that are close together, or may not be able to determine whether one frequency is higher or lower than another.

    • Therefore, more than just frequency should be used to characterize different sonic events. (This is analogous to the principle that "more than just hue" should be used to differentiate visual data.)
  • As for mixing levels: a 3 dB difference is a "noticeable" difference when comparing two sounds of otherwise similar characteristics, and a 10 dB difference is a "very significant difference". (The threshold of just noticeable difference is typically 1 dB.)
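To put the decibel figures above in more concrete terms, the following sketch converts a level difference in dB to the corresponding power and amplitude ratios using the standard definitions. The function names are chosen for illustration only.

```python
def power_ratio(db: float) -> float:
    """Power (intensity) ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)


def amplitude_ratio(db: float) -> float:
    """Amplitude (sound pressure or signal voltage) ratio for a difference in dB."""
    return 10 ** (db / 20)


for db in (1, 3, 10):
    print(f"{db:>2} dB: power x{power_ratio(db):.2f}, amplitude x{amplitude_ratio(db):.2f}")

# Output:
#  1 dB: power x1.26, amplitude x1.12
#  3 dB: power x2.00, amplitude x1.41
# 10 dB: power x10.00, amplitude x3.16
```

So a 3 dB step is roughly a doubling of power, and a 10 dB step is a tenfold change in power, which is consistent with the latter being described as very significant.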

Regarding alerts

  • A sound with a very abrupt start or impulse, e.g. an impact, a drum beat, or the breaking of a twig, will stand out much more than a sound of equal volume that is otherwise "smooth" or lacking an impulse.

  • Independent sounds at specific frequencies that are less than half an octave apart (i.e. closer than a tritone, which is just under a perfect fifth) will tend to be processed as "music" or may sound "musical".

  • To stand out, i.e. to contrast against other sounds, an alert sound or any sound that needs to be "noticed" should contrast through a combination of impulse, specific frequency, and intensity (volume).
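The frequency-spacing point above can be checked numerically. This is a minimal sketch, assuming equal-tempered semitones (12 per octave, so half an octave is 6 semitones); `semitones_apart` and `within_half_octave` are hypothetical helper names.

```python
import math


def semitones_apart(f1_hz: float, f2_hz: float) -> float:
    """Distance between two frequencies in equal-tempered semitones."""
    if f1_hz <= 0 or f2_hz <= 0:
        raise ValueError("frequencies must be positive")
    return abs(12 * math.log2(f2_hz / f1_hz))


def within_half_octave(f1_hz: float, f2_hz: float) -> bool:
    """True when the two frequencies are less than half an octave apart."""
    return semitones_apart(f1_hz, f2_hz) < 6.0


# 440 Hz and 523.25 Hz are about 3 semitones apart: inside half an octave.
close = within_half_octave(440.0, 523.25)   # True
# 440 Hz and 880 Hz are a full octave (12 semitones) apart.
far = within_half_octave(440.0, 880.0)      # False
```

A check like this only covers the frequency dimension; per the list above, impulse and intensity would still need to be considered separately when judging whether an alert stands out.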

Regarding speech

  • Dialog should generally be mixed as monaural, i.e. equal volume in left and right, so that it is "centered".

    • This is usually preferred over "panning" dialog to match screen movement, which can be distracting.
    • Off-screen dialog may benefit from panning to indicate the relative location off screen, but even then a "hard" pan (all the way left or right) should be avoided.
    • In a 5+ channel sound system, dialog should typically be mixed to the center channel, and music mixed in stereo to the left and right channels. Doing so reduces interference between music and dialog.
  • Music beds for dialog:

    • Music used as a "bed" should be instrumental, with no vocals, as vocals in the music will compete with the dialog.
    • Music beds should typically be at least 10 dB lower in average volume than the speech element.
      • Thus, if the dialog is mixed at an average VU of -5 dB, the music should average about -15 dB or lower.
      • Different music characteristics will have different needs; for instance, music that is very "busy" in the same frequencies as the speech will need to be even lower.
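The music-bed rule of thumb above can be sketched as a simple level check. This is an illustration only: it assumes both levels are averages expressed in dB on the same scale, uses plain RMS over float samples as the "average" (an assumption, since a VU meter has its own ballistics), and the names `rms_db` and `bed_gain_db` are hypothetical.

```python
import math


def rms_db(samples) -> float:
    """Average (RMS) level of float samples in -1..1, in dB relative to full scale."""
    if not samples:
        raise ValueError("samples must not be empty")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def bed_gain_db(dialog_db: float, music_db: float, margin_db: float = 10.0) -> float:
    """Gain change in dB (zero or negative) that places the music bed
    at least `margin_db` below the dialog."""
    target_db = dialog_db - margin_db
    return min(0.0, target_db - music_db)


# Dialog averages -5 dB and the music averages -8 dB: the target for the
# music is -15 dB, so it needs to come down by 7 dB.
adjust = bed_gain_db(-5.0, -8.0)     # -7.0
# Music already at -20 dB needs no change.
no_change = bed_gain_db(-5.0, -20.0)  # 0.0
```

For music that is "busy" in the speech frequencies, a larger `margin_db` could be passed in; the 10 dB default simply mirrors the guideline stated above.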

TL;DR

The above is by no means exhaustive, but should indicate some of the important considerations relating to accessible audio.

In the interest of exploring these ideas further, I've set up a repo for the Aural Accessibility Research Project
