Visual-to-auditory conversion methods for sensory substitution: Sound spatialization only versus cross-modal correspondence

Visual-to-auditory sensory substitution devices are assistive devices for the blind that convert visual images into soundscapes by mapping visual features to acoustic cues. These systems transform the spatial information of an acquired video stream into artificially spatialized sounds by reproducing the natural acoustic cues that humans rely on to localize real sound sources. However, these methods have drawbacks, especially for encoding elevation. For this reason, the audiovisual cross-modal correspondence between pitch and visual elevation is often exploited by increasing sound frequency with increasing elevation. The current study aimed to establish the potential benefits of using this cross-modal correspondence for visual-to-auditory conversion. In a virtual environment, we investigated the ability to perceive the location of an object whose elevation was conveyed either by a spatialized white noise or by pitch modulation of a spatialized tone. Blindfolded participants had to point to a virtual target using the soundscapes, before and after an audio-motor familiarization with the encoding. Participants localized the target quite accurately even before being familiarized with the conversion methods. They localized the elevation of the target more accurately when the conversion method was based on pitch modulation than when it relied on spatialization only. These results suggest a facilitation effect of the cross-modal correspondence between pitch and visual elevation that can benefit the development of sensory substitution devices relying on pitch modulation.
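As an illustration of the pitch-based encoding described above, the following minimal Python sketch maps a target's elevation to the frequency of a tone, so that higher targets produce higher pitches. It is not the conversion method used in the study: the elevation range, the frequency range, the logarithmic spacing, and the function names `elevation_to_frequency` and `tone_samples` are all assumptions chosen for illustration, and azimuth spatialization is left out.

```python
import math

# Hypothetical parameters chosen for illustration; not taken from the study.
ELEV_MIN_DEG = -40.0   # lowest encoded elevation
ELEV_MAX_DEG = 40.0    # highest encoded elevation
FREQ_MIN_HZ = 250.0    # tone frequency at the lowest elevation
FREQ_MAX_HZ = 4000.0   # tone frequency at the highest elevation


def elevation_to_frequency(elevation_deg: float) -> float:
    """Map elevation (degrees) to a tone frequency (Hz).

    The mapping is monotonically increasing and log-spaced, so equal
    steps in elevation correspond to equal musical intervals.
    """
    # Clamp out-of-range elevations to the encoded range.
    clamped = max(ELEV_MIN_DEG, min(ELEV_MAX_DEG, elevation_deg))
    # Normalize to [0, 1].
    t = (clamped - ELEV_MIN_DEG) / (ELEV_MAX_DEG - ELEV_MIN_DEG)
    # Interpolate geometrically between the two frequency bounds.
    return FREQ_MIN_HZ * (FREQ_MAX_HZ / FREQ_MIN_HZ) ** t


def tone_samples(elevation_deg: float,
                 duration_s: float = 0.1,
                 sample_rate: int = 44100) -> list[float]:
    """Return mono sine-wave samples whose pitch encodes the elevation."""
    freq = elevation_to_frequency(elevation_deg)
    n = round(duration_s * sample_rate)
    return [math.sin(2.0 * math.pi * freq * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    for elev in (-40.0, 0.0, 40.0):
        print(f"{elev:+.0f} deg -> {elevation_to_frequency(elev):.1f} Hz")
```

In a complete device, a tone generated this way would additionally be spatialized; the sketch isolates only the pitch-to-elevation correspondence.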
