Abstract:
Automatic speech recognition (ASR) has recently become a focus of research
aimed at more convenient human-computer interaction. Despite
the successful implementation of ASR technology in many languages, its use
in Arabic natural language processing (NLP) applications remains limited and is
typically constrained to a small vocabulary, such as digits, control commands, or a
limited set of words. Therefore, particular attention has been paid to promoting research in this field
to automate man-machine communication. We examine the performance of two
popular ASR engines on the same Arabic speech corpus. The engines are
Carnegie Mellon University (CMU) Sphinx and the Hidden Markov Model Toolkit
(HTK). Performing the same ASR task with different recognizers helps researchers
determine which engine best fits a particular target application
and supports further research in this field. In this paper, an experimental evaluation is
presented for both the Sphinx and HTK recognizers using a new “in-house” Arabic continuous
speech corpus that contains a total of 15.93 hours of speech (12.74 hours for training and 3.19
hours for testing). The vocabulary contains 30,986 words. In these experiments, we used
two text formats: Arabic characters for CMU Sphinx (PocketSphinx decoder) and Roman
characters for HTK (HVite decoder), because HTK expects Roman characters. The
experimental comparison shows that Sphinx outperforms HTK and does so in less time.
In addition, this study describes the intermediate steps followed for training
the acoustic and language models.