UTILIZING STANDARD DEVIATION IN TEXT CLASSIFICATION WEIGHTING SCHEMES

dc.contributor.author: Al-Anzi, Fawaz
dc.contributor.author: AbuZeina, Dia
dc.date.accessioned: 2021-05-09T08:08:53Z
dc.date.accessioned: 2022-05-22T08:55:32Z
dc.date.available: 2021-05-09T08:08:53Z
dc.date.available: 2022-05-22T08:55:32Z
dc.date.issued: 2017-08-04
dc.description.abstract: The term frequency-inverse document frequency (TF-IDF) weighting scheme is widely used in text classification to weight the features of the vector space model (VSM). It aims to enhance the discriminating capability of words by weighting up less frequently used words and, at the same time, weighting down high-frequency words (i.e., common words such as prepositions). This paper presents an enhanced variant of the well-known TF-IDF method. TF-IDF is a statistical measure that computes the weight of each word from the word's frequency in both the document and the entire data collection. In this work, we propose considering the word's standard deviation as an additional factor when computing its weight, because common words tend to have larger standard deviations than uncommon words; in other words, the more often a word appears in documents, the greater its standard deviation. To evaluate the proposed TF-IDF-based model, we conducted experiments on Arabic text classification using a training text collection containing 1,750 documents in five categories (250 documents for testing). The experimental results show that the proposed approach outperforms the standard TF-IDF term-weighting scheme.
Keywords: Arabic, Text, Classification, TF-IDF, Singular value decomposition. [en_US]
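The abstract describes the weighting only in words. The following minimal Python sketch computes standard TF-IDF (raw term count times natural-log IDF, one of several common variants) alongside each term's population standard deviation of counts across all documents. The record does not give the paper's exact formula, so the `combined` weights use a hypothetical rule, TF-IDF divided by (1 + standard deviation), chosen only because it weights down high-variance terms as the abstract suggests. The function name `tfidf_with_std` and the toy documents are likewise illustrative assumptions, not the authors' code or data.

```python
import math
import statistics
from collections import Counter


def tfidf_with_std(docs):
    """docs: list of token lists.

    Returns (tfidf, combined, std):
      tfidf    - per-document dicts of raw-count TF times ln(N / df) IDF
      combined - per-document dicts using a HYPOTHETICAL std-based adjustment
      std      - population standard deviation of each term's count across docs
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    vocab = sorted(set().union(*counts))

    # Document frequency: number of documents that contain each term.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    # Standard deviation of each term's raw count over ALL documents,
    # including zeros for documents where the term is absent.
    std = {t: statistics.pstdev([c[t] for c in counts]) for t in vocab}

    tfidf = [{t: c[t] * idf[t] for t in c} for c in counts]

    # Hypothetical combination (not taken from the paper): divide by
    # (1 + std) so that terms with high count variance are weighted down.
    combined = [{t: w / (1.0 + std[t]) for t, w in d.items()} for d in tfidf]
    return tfidf, combined, std


if __name__ == "__main__":
    # Toy, illustrative token lists (not from the paper's Arabic corpus).
    docs = [["the", "the", "the", "goal"], ["the", "market"], ["the", "goal"]]
    tfidf, combined, std = tfidf_with_std(docs)
    print(std)
    print(combined)
```

In this toy input the most frequent token, "the", has the largest standard deviation, matching the abstract's observation; because it also occurs in every document, its IDF (and hence its TF-IDF weight) is already zero, so the standard-deviation factor matters mainly for terms that IDF alone does not suppress.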
dc.identifier.uri: http://localhost:8080/xmlui/handle/123456789/8360
dc.language.iso: en_US [en_US]
dc.subject: Arabic, Text, Classification, TF-IDF, Singular value decomposition [en_US]
dc.title: UTILIZING STANDARD DEVIATION IN TEXT CLASSIFICATION WEIGHTING SCHEMES [en_US]
dc.type: Article [en_US]

Files

Original bundle

Name: 8_Paper8_UTILIZING STANDARD DEVIATION IN TEXT CLASSIFICATION.pdf
Size: 421.52 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Plain Text