DSpace Repository

Indexing for Large DNA Database Sequences

dc.contributor.advisor SAHEB, MAHMOUD
dc.contributor.author Wohosh, Samer
dc.contributor.author SAHEB, MAHMOUD
dc.date.accessioned 2018-02-14T10:03:02Z
dc.date.accessioned 2022-05-22T08:28:46Z
dc.date.available 2018-02-14T10:03:02Z
dc.date.available 2022-05-22T08:28:46Z
dc.date.issued 2011
dc.identifier.citation www.cscjournals.org/csc/manuscript/Journals/IJBB/volume5/Issue4/IJBB-125.pdf en_US
dc.identifier.issn 1985-2347
dc.identifier.uri http://localhost:8080/xmlui/handle/123456789/7968
dc.description.abstract Bioinformatics data consists of a huge amount of information because of the large number of sequences, their very great lengths, and the daily new additions. This data needs to be accessed efficiently for many purposes. What distinguishes one DNA data item from another is its DNA sequence. A DNA sequence is a combination of four characters, A, C, G, and T, and sequences have different lengths. Using a suitable representation of DNA sequences, and a suitable index structure that holds this representation in main memory, leads to efficient processing: the DNA sequences are accessed through the index, which reduces the number of disk I/O accesses. I/O operations are still needed at the end to avoid false hits, but pruning reduces the number of candidate DNA sequences that need to be checked, so there is no need to search the whole database. We need a suitable index for searching DNA sequences efficiently, with suitable index size and search time. The selection of the fields on which the index is built has a large effect on index size and search time. Our experiments use the n-gram wavelet transformation with single-field and multi-field index structures in a relational DBMS environment. The results show the need to consider index size and search time carefully when using indexing. Increasing the window size decreases the number of I/O references. Both single-field and multi-field indexing are strongly affected by the window size. Increasing the window size leads to better search time with a special index type when using single-field indexing, while the search time is good and almost the same across most index types when using multi-field indexing. The storage space needed for the RDMS index types is almost the same as or greater than that of the actual data. en_US
dc.language.iso en en_US
dc.publisher IJBB en_US
dc.relation.ispartofseries V5, n4;
dc.subject Large Database en_US
dc.subject DNA Sequence en_US
dc.subject Index Structure en_US
dc.subject Sequence Transformation en_US
dc.subject Wavelet Transformation en_US
dc.subject RDMS Indexing en_US
dc.subject Research Subject Categories::TECHNOLOGY en_US
dc.title Indexing for Large DNA Database Sequences en_US
dc.type Article en_US
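
The abstract describes transforming DNA sequences into a compact representation, indexing it, and pruning candidates before a final check that removes false hits. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes overlapping 2-gram counts as the sequence representation, an orthonormal Haar wavelet transform, and an in-memory dictionary as the index. The function names (`ngram_counts`, `haar`, `key`, `candidates`) and the parameters `n`, `k`, and `radius` are hypothetical and do not come from the article.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"

def ngram_counts(seq, n=2):
    """Count vector of overlapping n-grams over A, C, G, T (length 4**n)."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=n))}
    counts = [0.0] * len(index)
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in index:  # skip n-grams containing other symbols, e.g. N
            counts[index[gram]] += 1.0
    return counts

def haar(vec):
    """Orthonormal Haar transform; len(vec) must be a power of two.

    Coarse coefficients come first, so a prefix is a low-resolution summary.
    """
    out = list(vec)
    length = len(out)
    while length > 1:
        half = length // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / sqrt(2) for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / sqrt(2) for i in range(half)]
        out[:length] = avg + diff
        length = half
    return out

def key(seq, n=2, k=4):
    """Index key (assumed design): the first k Haar coefficients."""
    return haar(ngram_counts(seq, n))[:k]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def candidates(index, query, radius, n=2, k=4):
    """Prune: keep only sequence ids whose key lies within radius of the query key."""
    qk = key(query, n, k)
    return [sid for sid, sk in index.items() if dist(sk, qk) <= radius]

# Build a tiny in-memory index and query it.
db = {"s1": "ACGTACGT", "s2": "TTTTTTTT"}
idx = {sid: key(seq) for sid, seq in db.items()}
print(candidates(idx, "ACGTACGT", radius=0.5))
```

Because the transform is orthonormal, the distance between two truncated keys never exceeds the distance between the full count vectors. Pruning on the keys therefore cannot discard a sequence whose count vector is within `radius` of the query's; it can only admit false hits, which the final check against the stored sequences removes.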

