dc.description.abstract |
Sentiment analysis can be performed using lexicons and machine learning methods to identify
the sentiment and opinions expressed in a text. With the large amount of news
being generated nowadays through various news websites, it is possible to apply text mining
techniques to extract the general sentiment of particular news items. The emphasis
in this case is on using a Sentiment Analysis application to extract sentiment from news
headlines. In this thesis, we propose a customized sentiment evaluation model that
measures the daily tension level (using negative and positive scores) in Middle East
news headlines in the Arabic media. The data are collected from Arabic media websites such as
Aljazeera, and the required pre-processing steps, such as removing stop words and
punctuation marks, are then applied to obtain a clean dataset as input for the regression
model. In this thesis, we have used news headlines with their dates, collected over
many years from five different sites. We have also devised a method for collecting the
news headlines automatically, with their dates, categories and descriptions, from the RSS feeds of
news websites for use in future work. In addition, the data were processed and revised
using several Python tools. Moreover, we have used the Google Cloud Translation
API in an innovative way to translate headlines automatically. Then, we devised a method for
headline labeling that gives each headline a score: the Decibel formula is used as a quality measure
(sentiment score) for every headline, based on two main lexicons, namely WordNet and
SentiWordNet. We have trained a multiple linear regression model based on two
inputs for every day, the sum of positive scores and the sum of negative scores on that day,
so that the model predicts the sentiment score for that particular day. We have evaluated
the model with several measures, obtaining an Explained Variance Score of 0.937, an
R2 Score of 0.94 and an MSE of 0.04 using a full year of training data. Finally, we have connected the
model to a database and a Flask server to achieve real-time measurement. |
en_US |