Abstract:
Feature selection is a key success factor for classification problems in high-dimensional datasets. It aims to select discriminative subsets of features in order to improve classification performance and reduce learning time.
In this thesis we introduce an approach for handling the classification problem in high-dimensional datasets using a scatter search algorithm with a wrapper model. We implemented both a sequential and a parallel version of the scatter search algorithm.
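As an illustration only, the idea of scatter search with a wrapper model can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: the nearest-centroid classifier, the accuracy measure, the bit-flip improvement step, the random combination rule, the toy data and all function and parameter names (`wrapper_fitness`, `scatter_search`, `ref_size`, `pool_size`) are hypothetical choices made for the example.

```python
import random
from itertools import combinations


def wrapper_fitness(mask, X_train, y_train, X_val, y_val):
    """Wrapper evaluation: validation accuracy of a classifier trained on the
    features selected by mask (assumed classifier: nearest centroid)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def predict(x):
        return min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))

    return sum(predict(x) == y for x, y in zip(X_val, y_val)) / len(y_val)


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def improve(mask, score):
    """Improvement method (assumed): one pass of bit-flip hill climbing."""
    best = mask
    for j in range(len(mask)):
        trial = best[:j] + (not best[j],) + best[j + 1:]
        if score(trial) > score(best):
            best = trial
    return best


def build_ref_set(pool, score, ref_size):
    """Reference set: half chosen by quality, the rest by diversity."""
    ranked = sorted(set(pool), key=score, reverse=True)
    n_quality = max(1, ref_size // 2)
    ref, rest = ranked[:n_quality], ranked[n_quality:]
    while len(ref) < ref_size and rest:
        far = max(rest, key=lambda m: min(hamming(m, r) for r in ref))
        ref.append(far)
        rest.remove(far)
    return ref


def scatter_search(fitness, n_features, ref_size=4, pool_size=12,
                   iterations=5, seed=0):
    rng = random.Random(seed)
    cache = {}

    def score(mask):  # each subset is evaluated by the wrapper only once
        if mask not in cache:
            cache[mask] = fitness(mask)
        return cache[mask]

    # Diversification: random feature masks, each locally improved.
    pool = [tuple(rng.random() < 0.5 for _ in range(n_features))
            for _ in range(pool_size)]
    ref_set = build_ref_set([improve(m, score) for m in pool], score, ref_size)
    for _ in range(iterations):
        children = []
        for a, b in combinations(ref_set, 2):
            # Combination (assumed rule): keep agreeing bits, randomize the rest.
            child = tuple(x if x == y else rng.random() < 0.5
                          for x, y in zip(a, b))
            children.append(improve(child, score))
        updated = build_ref_set(ref_set + children, score, ref_size)
        if set(updated) == set(ref_set):
            break  # reference set no longer changes
        ref_set = updated
    best = max(ref_set, key=score)
    return best, score(best)


# Toy data (made up for the example): three features, two classes.
X_train = [[0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [1.0, 5.2, 1.1], [0.9, 0.8, 8.7]]
y_train = [0, 0, 1, 1]
X_val = [[0.05, 4.0, 8.0], [0.95, 1.2, 1.5]]
y_val = [0, 1]

fitness = lambda mask: wrapper_fitness(mask, X_train, y_train, X_val, y_val)
best_mask, best_score = scatter_search(fitness, n_features=3)
```

In this sketch the pairwise combinations within one iteration do not depend on each other, which is the kind of step a parallel version could distribute across workers; how the thesis actually parallelizes the search is not stated here.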
The classification performance of the two versions is similar for the Mushroom, Madelon, Gisette and Spambase datasets. For the Arcene dataset, the parallel version of the scatter search improves the classification performance from 0.93 to 0.94 compared with the sequential version.
Five benchmark datasets are used to evaluate our approach, all of them two-class classification problems: Mushroom, Spambase, Arcene, Madelon and Gisette. Three of them (Arcene, Madelon and Gisette) are feature selection challenge datasets.
The results indicate that the proposed approach is efficient for feature selection in high-dimensional datasets, since the scatter search algorithm reduces the execution time and improves the classification performance.
A comparative study is conducted against other work in the literature that uses evolutionary algorithms to handle the classification problem in high-dimensional datasets. The proposed method reduced the execution time for all the datasets used in our experiments and improved the classification results, which range from 0.92 to 1.0 across the five datasets.
Description:
CD, no. of pages 60, 30130, informatics 2/2017