Abstract:
Part-of-speech (PoS) tagging is a core component of many natural language processing (NLP) applications. PoS taggers
serve as a preprocessing step in various NLP tasks, such as syntactic parsing, information extraction, machine
translation, and speech synthesis. In this paper, we examine the performance of a Modern Standard Arabic (MSA) based tagger
on classical (i.e., traditional or historical) Arabic. Specifically, we employ the Arabic model of the Stanford tagger and evaluate
its tagging of the imperative verbs in the Holy Quran. The Stanford tagger uses 29 tags; however, this work experimentally evaluates
only one, VB (imperative verb). The test set contains 741 imperative verbs, which appear in 1,848 positions in the Holy
Quran. Although the Arabic model of the Stanford tagger has a previously reported accuracy of 96.26% for all tags and 80.14%
for unknown words, our experimental results show an accuracy of only 7.28% for the imperative verbs. This result motivates
further research into why tagging is so inaccurate for classical Arabic. The performance decline may
indicate the need to distinguish between classical Arabic and MSA training data in NLP tasks.
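
The single-tag evaluation described above can be illustrated with a minimal sketch. It assumes that accuracy is the fraction of gold imperative-verb positions to which the tagger assigns VB. The function name and the toy predictions are hypothetical and are not the data or code used in this work.

```python
def imperative_tag_accuracy(predicted_tags, target_tag="VB"):
    """Return the share of gold imperative-verb positions tagged as target_tag.

    predicted_tags holds the tagger's output for positions that are all
    imperative verbs in the gold annotation, so any tag other than
    target_tag counts as an error.
    """
    if not predicted_tags:
        raise ValueError("no positions to evaluate")
    correct = sum(1 for tag in predicted_tags if tag == target_tag)
    return correct / len(predicted_tags)


# Toy predictions (hypothetical, for illustration only): eight gold
# imperative-verb positions, two of which the tagger labelled VB.
toy_predictions = ["VB", "NN", "VBD", "VBP", "NN", "VB", "NNP", "VBP"]
toy_accuracy = imperative_tag_accuracy(toy_predictions)
print(f"{toy_accuracy:.2%}")  # prints 25.00%
```

Because every evaluated position is a gold imperative verb, this measure is equivalent to the recall of the VB tag; it says nothing about words wrongly tagged as VB elsewhere in the text.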