The impact of pre-clustering on classification of heterogeneous protein data

dc.contributor.authorAltartouri, Haneen
dc.contributor.authorTamimi, Hashem
dc.contributor.authorAshhab, Yaqoub
dc.date.accessioned2021-12-13T11:19:13Z
dc.date.accessioned2022-05-22T08:55:50Z
dc.date.available2021-12-13T11:19:13Z
dc.date.available2022-05-22T08:55:50Z
dc.date.issued2021-09-14
dc.description.abstractThe aim of this paper is to evaluate the improvement in the classification of protein sequence data obtained by introducing clustering as a preprocessing step. Cluster analysis was introduced to discover any sub-clusters that might have different patterns within the same protein class. A classification learning algorithm is then applied to each cluster to enhance the classification accuracy. Two standard benchmark datasets were used to examine the proposed approach: caspase 3 human substrates, which include cleaved and non-cleaved peptides, and the membrane proteins inner and α-helical proteins. Different descriptors based on the physicochemical properties of amino acids were extracted from the protein sequence data, and two encoding methods were used to represent the protein sequences using the descriptors. The results show that applying a clustering process prior to classification gives higher prediction accuracy than using classification alone. In addition, the timing results show that the proposed approach significantly reduces the training time of the classification process while maintaining prediction accuracy.en_US
dc.identifier.urihttp://localhost:8080/xmlui/handle/123456789/8406
dc.language.isoenen_US
dc.publisherNetwork Modeling Analysis in Health Informatics and Bioinformatics - Springeren_US
dc.subjectProtein sequence dataen_US
dc.subjectClassificationen_US
dc.subjectClusteringen_US
dc.subjectPhysico-chemical propertiesen_US
dc.titleThe impact of pre-clustering on classification of heterogeneous protein dataen_US
dc.title.alternativeThe impact of pre-clustering on classification of heterogeneous protein dataen_US
dc.typeArticleen_US
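
The cluster-then-classify idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering or classification algorithms used in the paper, so this sketch assumes a plain k-means step and a nearest-class-centroid classifier as stand-ins; the function names, the toy 2-D points (standing in for descriptor-encoded sequences), and the choice of k are all hypothetical.

```python
# Sketch only: k-means and nearest-centroid are assumed stand-ins,
# not the algorithms reported in the paper.

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def nearest(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], x))

def kmeans(X, k, iters=10):
    """Basic k-means; the first k points seed the centers (a simplification)."""
    centers = [list(x) for x in X[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            groups[nearest(centers, x)].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers

def class_centroids(X, y):
    """One centroid per class label: a tiny nearest-centroid classifier."""
    by_label = {}
    for x, label in zip(X, y):
        by_label.setdefault(label, []).append(x)
    return {label: centroid(pts) for label, pts in by_label.items()}

def fit(X, y, k):
    """Cluster first, then train one classifier per cluster."""
    centers = kmeans(X, k)
    assigned = [nearest(centers, x) for x in X]
    models = []
    for c in range(k):
        Xc = [x for x, a in zip(X, assigned) if a == c]
        yc = [label for label, a in zip(y, assigned) if a == c]
        models.append(class_centroids(Xc, yc))
    # A global model serves as a fallback for empty clusters.
    return centers, models, class_centroids(X, y)

def predict(fitted, x):
    """Route x to its nearest cluster, then classify within that cluster."""
    centers, models, fallback = fitted
    model = models[nearest(centers, x)] or fallback
    return min(model, key=lambda label: sq_dist(model[label], x))

# Toy data: each class is split into two sub-groups with opposite patterns.
X = [[0, 0], [10, 0], [1, 0], [11, 0], [0, 3], [1, 3], [10, 3], [11, 3]]
y = [0, 1, 0, 1, 1, 1, 0, 0]
fitted = fit(X, y, k=2)
predictions = [predict(fitted, x) for x in X]
```

In this toy data both classes share the same overall centroid, (5.5, 1.5), so a single nearest-centroid classifier trained on all points cannot separate them, while the per-cluster classifiers can. This is only meant to show why sub-clusters with different patterns inside one class can motivate a pre-clustering step; it makes no claim about the paper's reported results.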

Files

Original bundle

Name:
manuscript.pdf
Size:
585.5 KB
Format:
Adobe Portable Document Format
