Open Access Research Article

A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

Mohammad H Radfar1*, Richard M Dansereau2 and Abolghasem Sayadiyan1

Author Affiliations

1 Department of Electrical Engineering, Amirkabir University, Tehran 15875-4413, Iran

2 Department of Systems and Computer Engineering, Carleton University, Ottawa, ON K1S 5B6, Canada

EURASIP Journal on Audio, Speech, and Music Processing 2007, 2007:084186, doi:10.1155/2007/84186

The electronic version of this article is the complete one and can be found online at: http://asmp.eurasipjournals.com/content/2007/1/084186


Received: 3 March 2006
Revisions received: 13 September 2006
Accepted: 27 September 2006
Published: 16 November 2006

© 2007 Mohammad H. Radfar et al.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate these components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signals' vocal-tract-related filters. Then, the mean vectors of the PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters, along with the extracted fundamental frequencies, are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models through a new grouping technique based on underdetermined blind source separation. We compare our model with both an underdetermined blind source separation method and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and percentage of crosstalk suppression.
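The maximum likelihood step described in the abstract — choosing, for each frame, the pair of vocal-tract-related filter mean vectors that best explains the mixture's log spectral vector — can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes the common log-max approximation (the mixture's log spectrum is approximately the elementwise maximum of the sources' log spectra), Gaussian source PDFs with a shared diagonal variance `var`, and two per-speaker codebooks of mean vectors; all function and variable names are hypothetical.

```python
import numpy as np
from math import erf

def norm_pdf(y, mu, var):
    # Gaussian density, evaluated elementwise over frequency bins.
    return np.exp(-(y - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def norm_cdf(y, mu, var):
    # Gaussian CDF via the error function, elementwise.
    z = (y - mu) / np.sqrt(2.0 * var)
    return 0.5 * (1.0 + np.vectorize(erf)(z))

def logmax_loglikelihood(y, mu_a, mu_b, var=1.0):
    """Log-likelihood of the observed log-spectral vector y under the
    log-max model y ~ max(x_a, x_b), with x_a ~ N(mu_a, var) and
    x_b ~ N(mu_b, var).  The density of the max of two independent
    Gaussians is p(y) = pdf_a(y)*cdf_b(y) + pdf_b(y)*cdf_a(y) per bin."""
    p = (norm_pdf(y, mu_a, var) * norm_cdf(y, mu_b, var)
         + norm_pdf(y, mu_b, var) * norm_cdf(y, mu_a, var))
    return float(np.sum(np.log(p + 1e-300)))  # guard against log(0)

def ml_pair(y, codebook_a, codebook_b, var=1.0):
    """Exhaustive ML search: return the codeword pair (i, j) whose
    mean vectors maximize the likelihood of the mixed observation y."""
    best_pair, best_ll = None, -np.inf
    for i, mu_a in enumerate(codebook_a):
        for j, mu_b in enumerate(codebook_b):
            ll = logmax_loglikelihood(y, mu_a, mu_b, var)
            if ll > best_ll:
                best_pair, best_ll = (i, j), ll
    return best_pair, best_ll
```

In this sketch the selected mean vectors would stand in for the estimated vocal-tract-related filters of the two speakers; the exhaustive search is quadratic in codebook size, which is why practical systems of this kind often prune the pair search.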
