Arabic Part of Speech Tagging by Using the Stanford System: Prepositions as a Case Study

AbuZeina, Dia; Al-Tamimi, Taqieddin

dc.contributor.author	AbuZeina, Dia
dc.contributor.author	Al-Tamimi, Taqieddin
dc.date.accessioned	2021-05-09T08:08:14Z
dc.date.accessioned	2022-05-22T08:54:44Z
dc.date.available	2021-05-09T08:08:14Z
dc.date.available	2022-05-22T08:54:44Z
dc.date.issued	2021-05-01
dc.identifier.uri	http://localhost:8080/xmlui/handle/123456789/8317
dc.description.abstract	This paper discusses part of speech (PoS) tagging for Arabic prepositions. Arabic has a number of predefined sets of particles, such as particles of Nasb, particles of Jazm, and particles of Jarr (also called prepositions). Each set has a particular role in the context in which it appears. In general, PoS tagging is the process of assigning a tag (e.g. noun, verb, particle) to each word based on its context. PoS tagging is a useful component of many natural language processing (NLP) toolkits. For instance, it is used in syntactic parsing to validate the grammar of the sentence in question. It also helps recover the intended meaning of a text through textual analysis for further processing in search engines. Many other language processing applications use PoS tagging, such as machine translation, speech synthesis, speech recognition, and diacritization. Hence, the performance of many NLP applications depends on the accuracy of the outputs of the tagging system they use. This study therefore examines the Stanford tagger to explore its tag set on the text under examination and its performance in tagging Arabic prepositions. This study also discusses the weaknesses of the Stanford tagger: it does not handle the merging case, in which a preposition joins an adjacent word to form a single word. Another concern is that the Stanford tagger assigns the same tag to particles that differ in linguistic function, such as Jarr and Jazm particles. Through our inductive study of prepositions in terms of linguistic functions such as Jazm and Istifham (interrogation), we noted no differences in how prepositions like “to” (إلى) and “in” (في) are tagged. Other prepositions, such as “until” (حتى) and “except” (عدا), are also difficult to distinguish unless they are contextualized. This shows that the tagging system is inaccurate and that keeping up with developments in tagging systems is vital, which underlines the significance of our research.
In this work, we used the Holy Quran as a data set to evaluate the performance of the Stanford System in tagging the prepositions it contains. This work encourages more research on tagging other Arabic prepositions, in order to explore the compatibility between the tagging symbols employed in the Stanford System and the prepositions used in the Arabic language in general.	en_US
dc.language.iso	ar	en_US
dc.publisher	Elsevier	en_US
dc.subject	natural language processing (NLP), Arabic, part of speech, tagging, prepositions, syntactic category, Holy Quran data set	en_US
dc.title	Arabic Part of Speech Tagging by Using the Stanford System: Prepositions as a Case Study	en_US
dc.type	Article	en_US