Abstract:
Many machine learning classifiers work well only when the training data is
balanced, that is, when the number of positive examples is similar to the number
of negative examples. When the training data is imbalanced, the performance of
these classifiers degrades.
This thesis presents an approach to the two-class classification of imbalanced
datasets that combines a weighted support vector machine (WSVM) with the particle
swarm optimization (PSO) algorithm. The study also implemented Random Forest (RF),
the standard support vector machine (SVM), and WSVM in order to compare their
results with those of our enhanced weighted SVM (EWSVM). Five benchmark datasets
with different imbalance ratios were used to evaluate the approach, and
classification performance was measured with four metrics: area under the curve
(AUC), sensitivity, specificity, and accuracy.
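The combination of a weighted SVM with PSO can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's EWSVM: it assumes a linear SVM trained by subgradient descent on a class-weighted hinge loss, lets PSO search over the regularization constant `C` and a hypothetical minority-class weight `w_pos`, and uses validation AUC as the PSO fitness. The helper names, the toy data, and the PSO parameters (inertia 0.7, acceleration coefficients 1.5) are illustrative choices only.

```python
import random

# A minimal sketch, NOT the thesis's EWSVM. Assumptions: a linear kernel,
# subgradient-descent training, and PSO tuning of (C, w_pos) by validation AUC.
# All function names and the toy data are hypothetical.


def decision(w, b, x):
    """Linear SVM decision value w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_weighted_svm(X, y, C, w_pos, epochs=100, lr=0.01):
    """Minimise 0.5*||w||^2 + (C/n) * sum_i s_i * max(0, 1 - y_i * (w.x_i + b)),
    where s_i = w_pos for the minority class (y_i = +1) and 1.0 for y_i = -1."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = w[:], 0.0  # start from the gradient of 0.5*||w||^2
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # hinge term is active
                cost = C * (w_pos if yi == 1 else 1.0) / n
                for j in range(d):
                    gw[j] -= cost * yi * xi[j]
                gb -= cost * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def auc(scores, y):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == -1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pso(fitness, bounds, n_particles=8, iters=15, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximises `fitness` inside the box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to the box
            val = fitness(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def make_data(rng, n_neg, n_pos):
    """Hypothetical toy data: majority class near (0, 0), minority class near (2, 2)."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_pos)]
    return X, [-1] * n_neg + [1] * n_pos


if __name__ == "__main__":
    rng = random.Random(42)
    X_tr, y_tr = make_data(rng, 50, 5)  # imbalance ratio 10:1
    X_va, y_va = make_data(rng, 50, 5)

    def fitness(params):
        C, w_pos = params
        w, b = train_weighted_svm(X_tr, y_tr, C, w_pos)
        return auc([decision(w, b, x) for x in X_va], y_va)

    (C, w_pos), best_auc = pso(fitness, bounds=[(0.1, 10.0), (1.0, 20.0)])
    print(f"C={C:.2f}  w_pos={w_pos:.2f}  validation AUC={best_auc:.3f}")
```

Weighting errors on the minority class more heavily pushes the decision boundary toward the majority class; letting PSO choose that weight, rather than fixing it by hand, is the design choice this sketch illustrates.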
Compared with the other implemented algorithms (Random Forest, SVM, and weighted
SVM), the proposed approach improved classification performance in terms of AUC
on the Ecoli4 and Poker datasets.
Our results were also compared with those of other studies in the literature
that use SVM, weighted SVM, SVM with PSO, and ensemble AdaBoost. On the Pima
dataset, our approach achieved an AUC of 0.71, better than the 0.6768 of an SVM
classifier without weights and the 0.704 obtained with AdaBoost. On the Ecoli4
dataset, it achieved 0.95, close to the 0.96 obtained with AdaBoost. It also
gave good results on the other datasets in terms of sensitivity, specificity,
and accuracy.
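Sensitivity, specificity, and accuracy are threshold-based measures computed from the confusion matrix. The minimal sketch that follows assumes binary labels with `1` as the minority (positive) class and uses hypothetical helper names. It also shows why accuracy alone can mislead on imbalanced data: on a hypothetical 95:5 test set, a classifier that always predicts the majority class reaches 0.95 accuracy but 0.0 sensitivity.

```python
# Hypothetical helpers; `positive` marks the minority class (assumed to be 1).


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP) for a binary problem."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def threshold_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the majority class
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical 95:5 test set and a degenerate "always majority" classifier.
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100
    print(threshold_metrics(y_true, y_pred))  # (0.0, 1.0, 0.95)
```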
Description:
CD, 46 pages, 31050, 4/2019, Informatics