Using clustering to enhance protein sequence classification

dc.contributor.advisor: Khader, Sameer
dc.contributor.author: Altartouri, Haneen
dc.date.accessioned: 2022-04-07T10:26:37Z
dc.date.accessioned: 2022-05-11T05:33:03Z
dc.date.available: 2022-04-07T10:26:37Z
dc.date.available: 2022-05-11T05:33:03Z
dc.date.issued: 5/1/2013
dc.description: no. of pages 109, 26547, Informatics 2/2013, in the store
dc.description.abstract: We introduce a new approach for improving the prediction of biological attributes from protein sequences by combining classification algorithms with clustering analysis. Before applying classification, we use clustering analysis to find clusters of similar proteins. A classification algorithm is then applied to each cluster. The proposed approach is suitable for large datasets when high classification accuracy and fast convergence are required. Different descriptors based on the physicochemical properties of amino acids are used; some are native properties and others are derived properties. Two encoding methods are used to represent the protein sequences using the descriptors. These descriptors and encoding methods are analyzed to enhance the performance of the proposed approach. Three standard benchmark datasets, Caspase, Major Histocompatibility Complex class II (MHC-II), and membrane proteins, are used to examine the proposed approach. Many experiments with different parameters are performed and the results are cross-validated. The results show that applying clustering prior to classification gives higher prediction accuracy than classification without clustering, especially on the membrane proteins dataset and the Caspase dataset. In addition, the result of time performance, especially when using the MHC-II ... [en_US]
dc.identifier.uri: http://test.ppu.edu/handle/123456789/3081
dc.language.iso: en [en_US]
dc.publisher: Palestine Polytechnic University (جامعة بوليتكنك فلسطين) - Informatics [en_US]
dc.subject: enhance protein sequence [en_US]
dc.title: Using clustering to enhance protein sequence classification [en_US]
dc.type: Other [en_US]
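
The cluster-then-classify idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the record does not say which clustering or classification algorithms were used, so k-means and a nearest-class-centroid classifier are swapped in here as assumptions, and the two-dimensional vectors stand in for encoded protein sequences. All function names (kmeans, fit, predict) and the toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def nearest(centroids, p):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed clustering step); returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids

def class_centroids(X, y):
    """Nearest-centroid 'classifier': one mean vector per class label."""
    by_label = {}
    for x, label in zip(X, y):
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def fit(X, y, k):
    """Cluster first, then train one classifier per cluster."""
    centroids = kmeans(X, k)
    assign = [nearest(centroids, x) for x in X]
    models = []
    for c in range(k):
        Xc = [x for x, a in zip(X, assign) if a == c]
        yc = [label for label, a in zip(y, assign) if a == c]
        models.append(class_centroids(Xc, yc))
    fallback = class_centroids(X, y)  # used only if a cluster ended up empty
    return centroids, models, fallback

def predict(centroids, models, fallback, x):
    """Route x to its cluster, then classify with that cluster's model."""
    model = models[nearest(centroids, x)] or fallback
    return min(model, key=lambda label: sq_dist(x, model[label]))

# Toy stand-in for encoded sequences: two groups whose label pattern differs.
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.2, 0.9],
     [10.0, 10.0], [10.2, 10.1], [11.0, 11.0], [11.2, 10.9]]
y = ["a", "a", "b", "b", "b", "b", "a", "a"]

centroids, models, fallback = fit(X, y, k=2)
predictions = [predict(centroids, models, fallback, q)
               for q in ([0.1, 0.0], [1.1, 1.0], [10.1, 10.0], [11.1, 11.0])]
print(predictions)
```

In this toy data the two classes have the same overall mean, so a single nearest-centroid classifier trained on all points cannot separate them, while the per-cluster models can. That only illustrates why clustering first may help; it says nothing about the accuracy figures reported in the thesis.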

Files

Original bundle

Name: Using Clustering to Enhance Protein.pdf
Size: 59.01 MB
Format: Adobe Portable Document Format
