Indexing for Large DNA Database Sequences

Wohosh, Samer; SAHEB, MAHMOUD

dc.contributor.advisor	SAHEB, MAHMOUD
dc.contributor.author	Wohosh, Samer
dc.contributor.author	SAHEB, MAHMOUD
dc.date.accessioned	2018-02-14T10:03:02Z
dc.date.accessioned	2022-05-22T08:28:46Z
dc.date.available	2018-02-14T10:03:02Z
dc.date.available	2022-05-22T08:28:46Z
dc.date.issued	2011
dc.identifier.citation	www.cscjournals.org/csc/manuscript/Journals/IJBB/volume5/Issue4/IJBB-125.pdf	en_US
dc.identifier.issn	1985-2347
dc.identifier.uri	http://localhost:8080/xmlui/handle/123456789/7968
dc.description.abstract	Bioinformatics data is very large because of the number of sequences, their great lengths, and daily additions. This data must be accessed efficiently for many purposes. What distinguishes one DNA data item from another is its DNA sequence, a string of varying length over the four characters A, C, G, and T. Using a suitable representation of DNA sequences, together with a suitable index structure that holds this representation in main memory, allows sequences to be accessed efficiently through the index and reduces the number of disk I/O accesses. I/O operations are still needed at the end to eliminate false hits, but pruning reduces the number of candidate DNA sequences that must be checked, so the whole database need not be searched. Searching DNA sequences efficiently requires an index with acceptable size and search time. The choice of the fields on which the index is built has a large effect on both. Our experiments use the n-gram wavelet transformation with single-field and multi-field index structures in a relational DBMS environment. The results show that index size and search time must be considered carefully when using indexing. Increasing the window size decreases the number of I/O references. Both single-field and multiple-field indexing are strongly affected by the window size. With single-field indexing, increasing the window size leads to better search time with a special index type, while with multiple-field indexing the search time is fairly good and nearly the same for most index types. The storage space needed by the RDBMS index types is about the same as or greater than that of the actual data.	en_US
dc.language.iso	en	en_US
dc.publisher	IJBB	en_US
dc.relation.ispartofseries	V5, n4;
dc.subject	Large Database	en_US
dc.subject	DNA Sequence	en_US
dc.subject	Index Structure	en_US
dc.subject	Sequence Transformation	en_US
dc.subject	Wavelet Transformation	en_US
dc.subject	RDMS Indexing	en_US
dc.subject	Research Subject Categories::TECHNOLOGY	en_US
dc.title	Indexing for Large DNA Database Sequences	en_US
dc.type	Article	en_US