Abstract:
Colloquial Arabic is now widespread, which makes it necessary to describe how a multi-dialectal Arabic corpus is collected and classified. Today, colloquial Arabic appears mostly in country-based dialects such as Egyptian, Iraqi, Levantine, and Tunisian. This paper discusses a new method for analyzing conversations in an educational chat room using a corpus for Palestinian Arabic and the Stanford Tagger tool. The method represents the keywords in a semantic-net-like structure in order to obtain the main subjects of the conversation, and it identifies the main subject of the chat with high accuracy. Combining the Arabic corpus, the Stanford Tagger, and the percentage of keywords improves accuracy further. The study also examines the effect of the distribution of pivot words, based on the occurrences and betweenness values of the pivots throughout the text. In addition, it examines some characteristics of texts written in colloquial Arabic dialect and analyzes free expressive Arabic statements. The results show that the core subject of a chat can be determined by combining the occurrences of a word with its distribution through the conversation.
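
The idea of combining a word's occurrences with its distribution through the conversation can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `rank_pivot_words`, the whitespace tokenization, the `n_segments` parameter, and the scoring formula (occurrence share multiplied by the fraction of conversation segments in which the word appears) are all assumptions made for illustration. A real pipeline would presumably take its keywords from the tagger output rather than from a hand-supplied set.

```python
from collections import Counter


def rank_pivot_words(messages, keywords, n_segments=4):
    """Rank keywords by occurrence share times spread across the chat.

    Hypothetical scoring, assumed for illustration only:
      share  = occurrences of the word / occurrences of all keywords
      spread = segments containing the word / n_segments
      score  = share * spread
    """
    counts = Counter()
    segments_seen = {}
    n = len(messages)
    for i, message in enumerate(messages):
        # Map the message index to one of n_segments consecutive segments.
        segment = min(i * n_segments // n, n_segments - 1)
        for token in message.split():  # naive whitespace tokenization
            if token in keywords:
                counts[token] += 1
                segments_seen.setdefault(token, set()).add(segment)

    total = sum(counts.values())
    scores = {}
    for word, count in counts.items():
        share = count / total
        spread = len(segments_seen[word]) / n_segments
        scores[word] = share * spread
    # Highest score first: frequent words that recur throughout the chat.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Under this scoring, a word that is repeated many times in a single message ranks below a word with the same number of occurrences spread over the whole conversation, which is the intuition behind using distribution alongside raw counts.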