Modulation Filtering

This page informally presents additional examples from work on modulation filtering for time-frequency estimation of audio signals.


Extraction of Signal Components from an Additive Mixture

Given the time-frequency coefficients of an additive mixture consisting of a stationary sinusoid, an FM sinusoid and a transient click (left), we can compute the mixture’s modulation spectrum (right) and filter out the parts that correspond to the time-frequency orientation of the different components.
[Figures: Gspec (mixture spectrogram, left), GMod (modulation spectrum, right)]
If these filtered modulations are transformed back into the time-frequency domain, they yield shrinkage masks for the individual components.
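
As a rough illustration of this step, the sketch below treats the modulation spectrum as the 2-D DFT of a magnitude spectrogram and keeps only the modulation bins belonging to one orientation. The function name `orientation_mask`, the `width` parameter, and the normalization of the result to the range [0, 1] are assumptions made for this example and are not taken from the original work. Only horizontal and vertical orientations are covered; an oblique component such as the FM sinusoid would need a differently shaped selection.

```python
import numpy as np

def orientation_mask(mag, keep="horizontal", width=0):
    """Hypothetical helper: shrinkage mask for one time-frequency orientation.

    mag   : non-negative magnitude spectrogram, shape (n_freq, n_time)
    keep  : "horizontal" (constant over time, e.g. a stationary sinusoid)
            or "vertical" (constant over frequency, e.g. a click)
    width : number of modulation bins kept on either side of zero
    """
    n_freq, n_time = mag.shape
    # Modulation spectrum, taken here as the 2-D DFT of the spectrogram.
    mod = np.fft.fft2(mag)
    # Integer modulation bin indices (0, 1, ..., -2, -1) along each axis.
    k_freq = np.fft.fftfreq(n_freq, d=1.0 / n_freq)
    k_time = np.fft.fftfreq(n_time, d=1.0 / n_time)
    if keep == "horizontal":
        # Horizontal ridges carry almost no temporal modulation.
        select = (np.abs(k_time) <= width)[None, :]
    elif keep == "vertical":
        # Vertical ridges carry almost no spectral modulation.
        select = (np.abs(k_freq) <= width)[:, None]
    else:
        raise ValueError("keep must be 'horizontal' or 'vertical'")
    # The selection is symmetric around zero, so the inverse is real
    # up to rounding error.
    filtered = np.fft.ifft2(mod * select).real
    # Assumed normalization: clip negatives and scale to [0, 1].
    filtered = np.maximum(filtered, 0.0)
    peak = filtered.max()
    return filtered / peak if peak > 0 else filtered

# Toy spectrogram: one horizontal ridge (row 5) and one vertical ridge (column 7).
toy = np.zeros((16, 16))
toy[5, :] = 1.0
toy[:, 7] = 1.0
mask_sinusoid = orientation_mask(toy, keep="horizontal")
mask_click = orientation_mask(toy, keep="vertical")
```

In this toy case `mask_sinusoid` is close to one along the horizontal ridge and small elsewhere, and `mask_click` behaves the same way for the vertical ridge.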

Using these masks as weights in conjunction with the persistent empirical Wiener estimator, we obtain the following estimated time-frequency coefficients:

[Figures: XEst, YEst, ZEst]
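
The following sketch shows one plausible way a mask could act as a weight in a persistent empirical Wiener shrinkage. It uses the form in which this operator is commonly written: each coefficient is multiplied by the gain max(0, 1 - lambda^2 / E), where E is the energy in a small neighborhood of the coefficient. How the weights enter in the original work is not stated on this page, so applying the mask to the squared magnitudes before averaging is an assumption, as are the function name `pew_shrink`, the purely temporal neighborhood, and its length `n_time`.

```python
import numpy as np

def pew_shrink(coeffs, weights, lam, n_time=3):
    """Hypothetical sketch of a mask-weighted persistent empirical Wiener shrinkage.

    coeffs  : complex time-frequency coefficients, shape (n_freq, n_time_frames)
    weights : shrinkage mask in [0, 1], same shape as coeffs
    lam     : threshold (lambda)
    n_time  : neighborhood length along the time axis (assumed, odd)
    """
    # Assumption: the mask scales the energy before the neighborhood average.
    energy = weights * np.abs(coeffs) ** 2
    kernel = np.ones(n_time) / n_time
    # Moving average along time, row by row (zero-padded at the edges).
    local = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, energy
    )
    # Gain (1 - lambda^2 / local energy), clipped at zero.
    gain = np.maximum(1.0 - lam ** 2 / np.maximum(local, 1e-12), 0.0)
    return gain * coeffs

# Toy example: constant coefficients, mask selecting only the first row.
coeffs = 2.0 * np.ones((4, 8), dtype=complex)
weights = np.zeros((4, 8))
weights[0, :] = 1.0
estimate = pew_shrink(coeffs, weights, lam=1.0)
```

With this choice, coefficients in regions where the mask is zero are removed entirely, while coefficients in selected regions are attenuated according to their local energy relative to lambda.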


Extraction of a Vibrato from Orchestral Background
A similar (and still very rough) approach can be used to extract a singing voice with vibrato from an orchestral background. The first two plots show the mixture’s spectrogram and modulation spectrum.
The second row shows (from left to right) the modulation filter, the resulting shrinkage mask, the estimated coefficients, the analysis of the reconstruction, and the analysis of the voice extracted by hand in Audiosculpt.
Here are the corresponding sound examples: Mixture // Extracted Vibrato // Residual
At the beginning of the extraction, some residue from the strings is still audible and visible. This could be removed by further adjusting the lambda thresholds. Note that in the residual (signal minus extraction) the voice is mainly audible at the beginning, when it is not yet modulating in frequency. As soon as the vibrato begins, the contribution of the voice to the extraction becomes significant (and drops drastically in the residual).


