DSpace Repository

Arabic Part of Speech Tagging by Using the Stanford System: Prepositions as a Case Study

dc.contributor.author AbuZeina, Dia
dc.contributor.author Al-Tamimi, Taqieddin
dc.date.accessioned 2021-05-09T08:08:14Z
dc.date.accessioned 2022-05-22T08:54:44Z
dc.date.available 2021-05-09T08:08:14Z
dc.date.available 2022-05-22T08:54:44Z
dc.date.issued 2021-05-01
dc.identifier.uri http://localhost:8080/xmlui/handle/123456789/8317
dc.description.abstract This paper discusses part-of-speech (PoS) tagging for Arabic prepositions. Arabic has several predefined sets of particles, such as the particles of Nasb, the particles of Jazm, and the particles of Jarr (also called prepositions). Each set plays a particular role in the context in which it appears. In general, PoS tagging is the process of assigning a tag to each word (e.g. noun, verb, particle) based on its context. PoS tagging is a useful component of many natural language processing (NLP) toolkits. For instance, it is used in syntactic parsing to validate the grammar of the sentence in question. It also helps recover the intended meaning through textual analysis for further processing in search engines. Many other language processing applications rely on PoS tagging, such as machine translation, speech synthesis, speech recognition, and diacritization. Hence, the quality of many NLP applications depends on the accuracy of the tagging system they use. This study therefore examines the Stanford tagger to explore the tag set it produces for the text under examination and its performance in tagging Arabic prepositions. The study also discusses weaknesses of the Stanford tagger: it does not handle the merging case, in which a preposition joins with an adjacent word to form a single word. Another concern is that it assigns a single tag to particles that differ in linguistic function, such as Jarr and Jazm. Through our inductive study of prepositions in terms of linguistic functions such as Jazm and Istifham (interrogation), we did not note differences in the tagging of prepositions like “to” (إلى) and “in” (في). Other prepositions are also difficult to distinguish unless they are contextualized; these include “until” (حتى) and “except” (عدا). This shows that the tagging system is inaccurate and that keeping up with tagging-related systems is vital, hence the significance of our research.
In this work, we used the Holy Quran to evaluate the performance of the Stanford System in tagging prepositions. This work encourages more research on tagging other Arabic prepositions to explore the compatibility of the tagging symbols employed in the Stanford System with the prepositions used in the Arabic language in general. en_US
dc.language.iso ar en_US
dc.publisher Elsevier en_US
dc.subject natural language processing (NLP), Arabic, part of speech, tagging, prepositions, syntactic category, Holy Quran data set en_US
dc.title Arabic Part of Speech Tagging by Using the Stanford System: Prepositions as a Case Study en_US
dc.type Article en_US

