PERFORMANCE EVALUATION OF SPHINX AND HTK SPEECH RECOGNIZERS FOR SPOKEN ARABIC LANGUAGE

Al-Anzi, Fawaz; AbuZeina, Dia

URI: http://localhost:8080/xmlui/handle/123456789/8226

Date: 2019-06-03

Abstract:

Automatic speech recognition (ASR) has recently attracted researchers' attention as a route to more convenient human-computer interaction. Although ASR technology has been implemented successfully in different languages, its use in Arabic natural language processing (NLP) applications remains limited and is constrained to small vocabularies such as digits, control commands, or a limited set of words. Particular attention has therefore been paid to promoting research in this field to automate man-machine communication. We examine the performance of two popular ASR engines, Carnegie Mellon University (CMU) Sphinx and the Hidden Markov Model Toolkit (HTK), on an identical Arabic speech collection. Running the same ASR task on different recognizers helps researchers judge which engine best fits a particular target application, and it supports further research in this field. In this paper, we present an experimental evaluation of both the Sphinx and HTK recognizers using a new “in-house” Arabic continuous speech corpus totaling 15.93 hours (12.74 hours for training and 3.19 hours for testing), with a vocabulary of 30,986 words. The experiments used two text formats: Arabic characters for CMU Sphinx (PocketSphinx decoder) and Roman characters for HTK (HVite decoder), because HTK expects Roman characters. The experimental comparison shows that Sphinx outperforms HTK and does so in a shorter time. In addition, this study describes the intermediate steps followed in training the acoustic and language models.
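
The abstract notes that the HTK transcripts had to be written in Roman characters. It does not say which romanization scheme was used, so the following is only a minimal sketch that assumes a Buckwalter-style one-to-one mapping; the table name `BUCKWALTER` and the function `romanize` are hypothetical and not taken from the paper. Some Buckwalter symbols (for example `$` or `*`) may need further substitution before they are usable as HTK labels.

```python
# Assumption: a Buckwalter-style transliteration table. The abstract does not
# state which romanization the authors actually used for the HTK transcripts.
BUCKWALTER = {
    "ء": "'", "آ": "|", "أ": ">", "ؤ": "&", "إ": "<", "ئ": "}",
    "ا": "A", "ب": "b", "ة": "p", "ت": "t", "ث": "v", "ج": "j",
    "ح": "H", "خ": "x", "د": "d", "ذ": "*", "ر": "r", "ز": "z",
    "س": "s", "ش": "$", "ص": "S", "ض": "D", "ط": "T", "ظ": "Z",
    "ع": "E", "غ": "g", "ف": "f", "ق": "q", "ك": "k", "ل": "l",
    "م": "m", "ن": "n", "ه": "h", "و": "w", "ى": "Y", "ي": "y",
}


def romanize(text: str) -> str:
    """Replace each Arabic letter with its Roman equivalent.

    Characters that are not in the table (spaces, digits, Latin text)
    are passed through unchanged.
    """
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # Convert one transcript word from Arabic script to Roman characters.
    print(romanize("كتاب"))
```

A one-to-one character mapping keeps the conversion reversible, so the same transcript can be kept in Arabic script for Sphinx and regenerated in Roman characters for HTK.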
