Embedded Knowledge-based Speech Detectors for Real-Time Recognition Tasks

Salvatore Vitabile, Filippo Sorbello, Salvatore Andolina, Sabato Marco Siniscalchi, Antonio Gentile, Francesca Gennaro

Research output: Other

4 Citations (Scopus)

Abstract

Speech recognition has become common in many application domains, from dictation systems for professional practices to vocal user interfaces for people with disabilities or hands-free system control. However, so far the performance of automatic speech recognition (ASR) systems is comparable to human speech recognition (HSR) only under very strict working conditions, and is in general much lower. Incorporating acoustic-phonetic knowledge into ASR design has proven a viable approach to raising ASR accuracy. Manner-of-articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as detectors for manner-of-articulation attributes, starting from representations of speech signal frames. In this paper, the full system implementation is described. The system has a first stage for MFCC extraction, followed by a second stage implementing a sinusoidal-based multi-layer perceptron for speech event classification. Implementation details on a Celoxica RC203 board are given.
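
The second stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "sinusoidal-based" means a sine activation in the hidden layer, and the feature count (13), hidden size (8), softmax output, and random untrained weights are all hypothetical choices made only for illustration. The input frame is a random stand-in for MFCC features, since the extraction stage is not reproduced here.

```python
import math
import random

# Manner-of-articulation attributes listed in the abstract.
ATTRIBUTES = ["vowel", "stop", "fricative", "approximant", "nasal", "silence"]


def make_layer(n_in, n_out, rng):
    """Build random weights and biases (hypothetical; a real detector is trained)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases


def affine(weights, biases, x):
    """Compute W x + b for one input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Turn raw scores into probabilities (an assumed output layer)."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify_frame(features, hidden, output):
    """Forward pass: affine -> sine activation -> affine -> softmax."""
    h = [math.sin(a) for a in affine(hidden[0], hidden[1], features)]
    scores = softmax(affine(output[0], output[1], h))
    best = max(range(len(scores)), key=scores.__getitem__)
    return ATTRIBUTES[best], scores


rng = random.Random(0)
N_FEATURES = 13  # assumed number of MFCC coefficients per frame
N_HIDDEN = 8     # assumed hidden-layer size

hidden = make_layer(N_FEATURES, N_HIDDEN, rng)
output = make_layer(N_HIDDEN, len(ATTRIBUTES), rng)

# Random stand-in for one frame of MFCC features.
frame = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
label, scores = classify_frame(frame, hidden, output)
print(label, [round(s, 3) for s in scores])
```

Because the weights are random, the printed label carries no meaning; the sketch only shows the shape of the computation, one score per attribute for each frame.
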
Original language: English
Pages: 353-360
Number of pages: 8
Publication status: Published - 2006

Fingerprint

Speech recognition
Detectors
Speech analysis
Multilayer neural networks
User interfaces
Acoustics
Neural networks
Control systems

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Embedded Knowledge-based Speech Detectors for Real-Time Recognition Tasks. / Vitabile, Salvatore; Sorbello, Filippo; Andolina, Salvatore; Siniscalchi, Sabato Marco; Gentile, Antonio; Gennaro, Francesca.

2006. 353-360.

@conference{6f7fe28d27af4bebbfb22ed4beaed0a2,
title = "Embedded Knowledge-based Speech Detectors for Real-Time Recognition Tasks",
abstract = "Speech recognition has become common in many application domains, from dictation systems for professional practices to vocal user interfaces for people with disabilities or hands-free system control. However, so far the performance of automatic speech recognition (ASR) systems is comparable to human speech recognition (HSR) only under very strict working conditions, and is in general much lower. Incorporating acoustic-phonetic knowledge into ASR design has proven a viable approach to raising ASR accuracy. Manner-of-articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as detectors for manner-of-articulation attributes, starting from representations of speech signal frames. In this paper, the full system implementation is described. The system has a first stage for MFCC extraction, followed by a second stage implementing a sinusoidal-based multi-layer perceptron for speech event classification. Implementation details on a Celoxica RC203 board are given.",
author = "Salvatore Vitabile and Filippo Sorbello and Salvatore Andolina and Siniscalchi, {Sabato Marco} and Antonio Gentile and Francesca Gennaro",
year = "2006",
language = "English",
pages = "353--360",

}

TY - CONF

T1 - Embedded Knowledge-based Speech Detectors for Real-Time Recognition Tasks

AU - Vitabile, Salvatore

AU - Sorbello, Filippo

AU - Andolina, Salvatore

AU - Siniscalchi, Sabato Marco

AU - Gentile, Antonio

AU - Gennaro, Francesca

PY - 2006

Y1 - 2006

AB - Speech recognition has become common in many application domains, from dictation systems for professional practices to vocal user interfaces for people with disabilities or hands-free system control. However, so far the performance of automatic speech recognition (ASR) systems is comparable to human speech recognition (HSR) only under very strict working conditions, and is in general much lower. Incorporating acoustic-phonetic knowledge into ASR design has proven a viable approach to raising ASR accuracy. Manner-of-articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as detectors for manner-of-articulation attributes, starting from representations of speech signal frames. In this paper, the full system implementation is described. The system has a first stage for MFCC extraction, followed by a second stage implementing a sinusoidal-based multi-layer perceptron for speech event classification. Implementation details on a Celoxica RC203 board are given.

UR - http://hdl.handle.net/10447/15503

M3 - Other

SP - 353

EP - 360

ER -