Paper details
Number 2 - June 2014
Volume 24 - 2014
Automatic speech signal segmentation based on the innovation adaptive filter
Ryszard Makowski, Robert Hossa
Abstract
Speech segmentation is an essential stage in designing automatic speech recognition systems, and several algorithms for it have been
proposed in the literature. It is a difficult problem, as speech is highly variable. The aim of the authors’
studies was to design an algorithm that could be employed at the stage of automatic speech recognition. This would make
it possible to avoid some problems related to speech signal parametrization. Posing the problem in such a way requires the
algorithm to be capable of working in real time. The only such algorithm was proposed by Tyagi et al. (2006), and it is
a modified version of Brandt’s algorithm. The article presents a new algorithm for unsupervised automatic speech signal
segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying
exclusively on second-order statistics of a speech signal. The starting point for the proposed method is time-varying Schur
coefficients of an innovation adaptive filter. The Schur algorithm is known to be fast, precise, stable and capable of rapidly
tracking changes in second-order signal statistics. A transition from one phoneme to another in the speech signal always
indicates a change in signal statistics caused by changes in the vocal tract. To account for the properties of human hearing,
detection of inter-phoneme boundaries is performed based on statistics defined on the mel spectrum determined from the
reflection coefficients. The paper presents the structure of the algorithm, defines its properties, lists parameter values,
describes detection efficiency results, and compares them with those of another algorithm. The obtained segmentation
results are satisfactory.
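The abstract names Schur reflection coefficients as the starting point of the method. The following is a minimal sketch, assuming a frame-based (block) Schur recursion applied to a biased autocorrelation estimate. It is not the authors' time-recursive innovation adaptive filter; frame-by-frame analysis stands in for the sample-by-sample time variation. The function names, frame length, hop, and model order are hypothetical illustration choices.

```python
import random


def autocorrelation(frame, max_lag):
    """Biased autocorrelation estimate r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def schur_reflection(r, order):
    """Schur (Le Roux-Gueguen) recursion: autocorrelation -> reflection coefficients.

    Sign convention (an assumption): A(z) = 1 + a_1 z^-1 + ... + a_p z^-p,
    so that k_1 = -r[1] / r[0].
    """
    if r[0] == 0:
        return [0.0] * order          # silent frame: no spectral information
    ef = list(r[:order + 1])          # forward generator row
    eb = list(r[:order + 1])          # backward generator row
    ks = []
    for m in range(1, order + 1):
        k = -ef[m] / eb[m - 1]
        ks.append(k)
        new_ef, new_eb = ef[:], eb[:]
        for i in range(m, order + 1):
            new_ef[i] = ef[i] + k * eb[i - 1]
            new_eb[i] = eb[i - 1] + k * ef[i]
        ef, eb = new_ef, new_eb
    return ks


def frame_reflection_coefficients(signal, frame_len, hop, order):
    """Reflection coefficients for consecutive frames (block-wise stand-in for time variation)."""
    out = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        out.append(schur_reflection(autocorrelation(frame, order), order))
    return out


if __name__ == "__main__":
    # Synthetic AR(1) signal x[n] = 0.9 x[n-1] + e[n]; expect k_1 near -0.9, higher k near 0.
    random.seed(0)
    x, prev = [], 0.0
    for _ in range(4000):
        prev = 0.9 * prev + random.gauss(0.0, 1.0)
        x.append(prev)
    for ks in frame_reflection_coefficients(x, frame_len=400, hop=400, order=4)[:3]:
        print([round(k, 3) for k in ks])
```

The Schur recursion produces the same reflection coefficients as the Levinson-Durbin recursion but works directly on the autocorrelation "generator" rows. This is one reason it suits fixed-point and parallel implementations. For a biased autocorrelation of a non-silent frame, every coefficient satisfies |k| < 1.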
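The abstract does not specify which statistic is computed on the mel spectrum. The sketch below is therefore an assumption, not the authors' method, and works in three steps:

- It converts reflection coefficients to an all-pole model with the step-up recursion.
- It samples that model's log spectrum at mel-spaced frequencies, using the common formula 2595 log10(1 + f/700) instead of a mel filterbank.
- It flags a boundary when the RMS difference between the spectra of consecutive frames exceeds a fixed threshold.

The threshold, the band count, the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and all helper names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def step_up(ks):
    """Reflection coefficients -> predictor polynomial [1, a_1, ..., a_p]."""
    a = []
    for m, k in enumerate(ks, start=1):
        # a_i^(m) = a_i^(m-1) + k_m * a_(m-i)^(m-1), and a_m^(m) = k_m
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
    return [1.0] + a


def mel_log_spectrum(ks, fs, n_bands=20):
    """All-pole log spectrum 10*log10(1/|A(e^jw)|^2) at mel-spaced band centres (unit gain)."""
    a = step_up(ks)
    top = hz_to_mel(fs / 2.0)
    spectrum = []
    for b in range(n_bands):
        f = mel_to_hz(top * (b + 0.5) / n_bands)   # band centre in Hz
        w = 2.0 * math.pi * f / fs
        A = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        spectrum.append(-10.0 * math.log10(abs(A) ** 2))
    return spectrum


def detect_boundaries(frames_ks, fs, threshold_db):
    """Flag frame t when its mel log spectrum differs from frame t-1 by more than threshold_db (RMS)."""
    spectra = [mel_log_spectrum(ks, fs) for ks in frames_ks]
    boundaries = []
    for t in range(1, len(spectra)):
        diff = [x - y for x, y in zip(spectra[t], spectra[t - 1])]
        rms = math.sqrt(sum(d * d for d in diff) / len(diff))
        if rms > threshold_db:
            boundaries.append(t)
    return boundaries


if __name__ == "__main__":
    # Hypothetical second-order reflection coefficients with an abrupt spectral change at frame 3.
    frames = [[-0.9, 0.5]] * 3 + [[0.6, -0.2]] * 3
    print(detect_boundaries(frames, fs=16000, threshold_db=3.0))  # -> [3]
```

A fixed threshold is the simplest choice. The keywords list "detection threshold determination", which suggests the authors derive the threshold rather than set it by hand, so the constant here is only a placeholder.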
Keywords
automatic speech segmentation, inter-phoneme boundaries, Schur adaptive filtering, detection threshold determination