Synchrony capture filterbank: Auditory-inspired signal processing for tracking individual frequency components in speech
Date of Original Version
A processing scheme for speech signals is proposed that emulates synchrony capture in the auditory nerve. Stimulus-locked spike timing is important for representing stimulus periodicity, low-frequency spectrum, and spatial location. In synchrony capture, dominant single-frequency components in each frequency region impress their time structures on the temporal firing patterns of auditory nerve fibers with nearby characteristic frequencies (CFs). At low frequencies, for voiced sounds, synchrony capture divides the nerve into discrete CF territories associated with individual harmonics. An adaptive synchrony capture filterbank (SCFB) is proposed, consisting of a fixed array of traditional, passive linear (gammatone) filters cascaded with a bank of adaptively tunable bandpass filter triplets. Differences in triplet output envelopes steer triplet center frequencies via voltage-controlled oscillators (VCOs). The SCFB exhibits some cochlea-like responses, such as two-tone suppression and distortion products, and possesses many desirable properties for processing speech, music, and natural sounds. Strong signal components dominate relatively greater numbers of filter channels, thereby yielding robust encodings of relative component intensities. The VCOs precisely lock onto the harmonics most important for formant tracking, pitch perception, and sound separation. © 2013 Acoustical Society of America.
Journal of the Acoustical Society of America
Kumaresan, Ramdas, Vijay K. Peddinti, and Peter Cariani. "Synchrony capture filterbank: Auditory-inspired signal processing for tracking individual frequency components in speech." Journal of the Acoustical Society of America 133, 6 (2013): 4290-4310. doi:10.1121/1.4802653.
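The triplet steering idea in the abstract (the difference between the outer filters' output envelopes moves the triplet's center frequency) can be illustrated with a small sketch. This is not the authors' SCFB implementation. It omits the gammatone front end, replaces the VCO with a direct update of a center-frequency variable, and uses one-pole complex resonators as stand-ins for the bandpass filters. The function name `track_tone` and the parameters `delta`, `pole`, and `mu` are hypothetical choices made for illustration only.

```python
import cmath
import math

import numpy as np


def track_tone(x, fs, fc_init, delta=50.0, pole=0.98, mu=0.5):
    """Steer one filter triplet toward the dominant tone in x.

    Assumptions (not taken from the paper): each filter is a one-pole
    complex resonator, the triplet is spaced by `delta` Hz, and the
    center frequency is nudged by `mu` Hz per sample per unit of
    normalized envelope difference.
    Returns (center-frequency track in Hz, complex center-filter output).
    """
    fc = float(fc_init)
    y_lo = y_mid = y_hi = 0j
    track = np.empty(len(x))
    out = np.empty(len(x), dtype=complex)
    for n, sample in enumerate(x):
        drive = (1.0 - pole) * float(sample)
        # Update the three resonators at fc - delta, fc, and fc + delta.
        y_lo = pole * cmath.exp(2j * math.pi * (fc - delta) / fs) * y_lo + drive
        y_mid = pole * cmath.exp(2j * math.pi * fc / fs) * y_mid + drive
        y_hi = pole * cmath.exp(2j * math.pi * (fc + delta) / fs) * y_hi + drive
        # Envelope difference of the outer filters acts as the error signal:
        # positive when the tone lies above fc, negative when below.
        env_lo, env_hi = abs(y_lo), abs(y_hi)
        err = (env_hi - env_lo) / (env_hi + env_lo + 1e-12)
        fc += mu * err  # stand-in for the VCO retuning the triplet
        track[n] = fc
        out[n] = y_mid
    return track, out


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    tone = np.cos(2 * np.pi * 1000.0 * t)  # 1 kHz test tone, 1 s long
    track, _ = track_tone(tone, fs, fc_init=900.0)
    print(f"final center frequency: {track[-1]:.1f} Hz")
```

Because the two outer resonators have magnitude responses that are symmetric about the center frequency, the error signal vanishes when the tone sits at the center, so the loop settles there. A fuller model along the lines the abstract describes would run many such triplets, each fed by its own fixed gammatone channel.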