Beyond vector space model for hierarchical Arabic text classification: A Markov chain approach

dc.contributor.author: Al-Anzi, Fawaz
dc.contributor.author: AbuZeina, Dia
dc.date.accessioned: 2021-05-09T08:07:51Z
dc.date.accessioned: 2022-05-22T08:54:19Z
dc.date.available: 2021-05-09T08:07:51Z
dc.date.available: 2022-05-22T08:54:19Z
dc.date.issued: 2017-10-11
dc.description.abstract: The vector space model (VSM) is a textual representation method that is widely used in document classification. However, its high-dimensional representation remains costly in terms of space. One way to alleviate the space problem is to use dimensionality reduction techniques; however, such techniques have deficiencies, such as the loss of some important information. In this paper, we propose a novel text classification method that uses neither VSM nor dimensionality reduction techniques. The proposed method is space-efficient and uses a first-order Markov model for hierarchical Arabic text classification. For each category and sub-category, a Markov chain model is built from sequences of neighboring characters. The resulting models are then used to score documents for classification. For evaluation, we used a hierarchical Arabic text collection that contains 11,191 documents belonging to eight topics distributed over three levels. The experimental results show that the Markov chain based method significantly outperforms the baseline system, which employs latent semantic indexing (LSI): the proposed method improves the F1-measure by 3.47%. The novelty of this work lies in the idea of decomposing words into sequences of characters, which was found to be a promising approach in terms of both space and accuracy. To the best of our knowledge, this is the first study of hierarchical Arabic text classification on such a relatively large data collection. [en_US]
dc.identifier.uri: http://localhost:8080/xmlui/handle/123456789/8258
dc.language.iso: en_US [en_US]
dc.publisher: Elsevier [en_US]
dc.subject: Arabic text; Classification; Vector space model; Markov chain; Hierarchy [en_US]
dc.title: Beyond vector space model for hierarchical Arabic text classification: A Markov chain approach [en_US]
dc.type: Article [en_US]
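
The abstract describes per-category first-order Markov chains over neighboring characters that are then used to score documents down a category hierarchy. The following is a minimal sketch of that idea, assuming details the record does not give: transitions are counted only between neighboring characters inside whitespace-separated words, probabilities use add-one smoothing with one vocabulary size shared by all models, and a document is routed greedily top-down by keeping the highest-scoring child at each level. The names `char_pairs`, `CharMarkovModel`, `build_models`, and `classify`, the toy category tree, and the short Latin strings standing in for Arabic documents are all hypothetical; this is an illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def char_pairs(text):
    """Yield neighboring-character pairs within each whitespace-separated word (assumed tokenization)."""
    for word in text.split():
        yield from zip(word, word[1:])


class CharMarkovModel:
    """First-order character Markov chain with add-one smoothing (smoothing choice is an assumption)."""

    def __init__(self, vocab_size):
        # One vocabulary size shared by every model keeps the scores comparable.
        self.vocab_size = vocab_size
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[prev][next]
        self.totals = defaultdict(int)                        # outgoing transitions per prev char

    def train(self, texts):
        for text in texts:
            for prev, nxt in char_pairs(text):
                self.counts[prev][nxt] += 1
                self.totals[prev] += 1
        return self

    def log_prob(self, text):
        """Sum of log P(next | prev) over every character transition in the text."""
        score = 0.0
        for prev, nxt in char_pairs(text):
            count = self.counts.get(prev, {}).get(nxt, 0)  # .get avoids inserting new keys
            score += math.log((count + 1) / (self.totals.get(prev, 0) + self.vocab_size))
        return score


def build_models(tree, leaf_docs, vocab_size):
    """Train one chain per category and sub-category on all documents beneath it (hypothetical helper)."""
    models = {}

    def collect(node):
        docs = list(leaf_docs.get(node, []))
        for child in tree.get(node, []):
            docs += collect(child)
        if node != "root":
            models[node] = CharMarkovModel(vocab_size).train(docs)
        return docs

    collect("root")
    return models


def classify(text, tree, models):
    """Greedy top-down descent: at each level keep the child whose chain scores the text highest."""
    path, node = [], "root"
    while tree.get(node):
        node = max(tree[node], key=lambda child: models[child].log_prob(text))
        path.append(node)
    return path


# Toy hierarchy (hypothetical); Latin strings stand in for Arabic documents.
tree = {"root": ["sports", "economy"], "sports": ["football", "tennis"]}
leaf_docs = {"football": ["goal", "kick"], "tennis": ["net", "set"], "economy": ["buy", "bury"]}
# +1 reserves probability mass for characters never seen in training.
vocab_size = len({ch for docs in leaf_docs.values() for doc in docs for ch in doc}) + 1
models = build_models(tree, leaf_docs, vocab_size)

print(classify("goal", tree, models))  # ['sports', 'football']
```

Sharing `vocab_size` across models matters in this sketch: if each model smoothed with its own, smaller alphabet, a category trained on few distinct characters would give unseen transitions a larger probability and could win on text unlike anything it was trained on.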

Files

Original bundle

Name: 4_Paper4_Beyond vector space model for hierarchical Arabic text classification - A Markov chain approach.pdf
Size: 2.09 MB
Format: Adobe Portable Document Format
