Genome Database Indexing Using A Modified Wavelet Transformation And Btree

Wohoush, Samer


dc.contributor.advisor	Tahboub, Kareem
dc.contributor.author	Wohoush, Samer
dc.date.accessioned	2022-04-11T06:10:09Z
dc.date.accessioned	2022-05-11T05:33:13Z
dc.date.available	2022-04-11T06:10:09Z
dc.date.available	2022-05-11T05:33:13Z
dc.date.issued	6/1/2011
dc.identifier.uri	http://test.ppu.edu/handle/123456789/3089
dc.description	No. of pages: 100; 25719, 26546; informatics 3/2011, 4/2011
dc.description.abstract	The main problem in searching genome DNA sequences is the large size of the sequences and their very long and highly variable lengths. Various methods have been used to speed up sequence searching, such as database indexing instead of direct access to sequence files. Our main idea is to provide an access method for genome DNA sequences that is efficient in both time and space for searching and comparison, taking into account the size of both the data and the index. A genome database search system should provide useful facilities, compact data representation and compression, and accurate output; it should also be practical to use and minimize the number of I/O operations. I/O operations are needed mainly at the final step, to eliminate false positives (sequences that appear to be related to the query but are not). Pruning reduces the number of candidate sequences that must be checked by database I/O access, so the whole database need not be searched. In this thesis, we propose an approach for building a complete index structure that supports searching large databases with efficient storage space and search time. We represent genome DNA sequences using an n-gram Haar wavelet transformation and convert the resulting coefficients to integers. An index structure built upon a modified BTree holds the integer representation after transformation. We also introduce enhancements that increase system efficiency by decreasing the index storage size. We call our structure the Modified Wavelet Transformation and BTree (M-WTBT). The M-WTBT structure allows a set of parameters to be tuned so that the index fits the available resources. We implemented the structure and evaluated it on a dataset used in several previous studies, to validate its features and demonstrate the advantages of the M-WTBT structure. 
The M-WTBT is also shown to be effective when compared with several previous studies. Keywords: Sequence transformation, sequence compression, large database indexing, Haar Wavelet Transformation, Genome DNA Sequence searching and indexing	en_US
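The abstract describes a pipeline of n-gram counting, Haar wavelet transformation, integer conversion of the coefficients, a BTree index, pruning, and a final I/O verification step. The sketch below is a hypothetical, simplified illustration of that general idea, not the M-WTBT structure itself. Its assumptions: sequences are compared by the Euclidean (L2) distance between their n-gram count vectors, the Haar transform is the orthonormal variant, integer conversion is plain rounding after multiplying by a `scale` factor, and a sorted key list searched with Python's `bisect` stands in for the modified BTree. All names (`ngram_vector`, `haar`, `quantize`, `WaveletIndex`, `keep`, `scale`) are illustrative.

```python
# Hypothetical sketch (not the thesis's M-WTBT): n-gram Haar wavelet index with
# integer coefficients, a sorted-key stand-in for a BTree, pruning, and verification.
import bisect
import itertools
import math

ALPHABET = "ACGT"


def ngram_vector(seq, n):
    """Count each of the 4**n possible n-grams of seq (a power-of-two length)."""
    index = {"".join(p): i for i, p in enumerate(itertools.product(ALPHABET, repeat=n))}
    counts = [0] * (4 ** n)
    for i in range(len(seq) - n + 1):
        j = index.get(seq[i:i + n])
        if j is not None:  # n-grams containing other symbols (e.g. 'N') are skipped
            counts[j] += 1
    return counts


def haar(vec):
    """Orthonormal Haar transform (len(vec) a power of two); preserves L2 distances."""
    out = [float(v) for v in vec]
    length = len(out)
    while length > 1:
        half = length // 2
        pairs = [(out[2 * i], out[2 * i + 1]) for i in range(half)]
        out[:length] = ([(a + b) / math.sqrt(2) for a, b in pairs]
                        + [(a - b) / math.sqrt(2) for a, b in pairs])
        length = half
    return out


def quantize(coeffs, scale):
    """Integer conversion: each result is within 0.5 of coeff * scale."""
    return [round(c * scale) for c in coeffs]


def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class WaveletIndex:
    """Sorted first coefficients + bisect act as a stand-in for the BTree."""

    def __init__(self, n=2, scale=4, keep=4):
        self.n, self.scale, self.keep = n, scale, keep  # tunable (assumed) parameters
        self.keys, self.ids = [], []  # sorted integer keys and their sequence ids
        self.prefix = {}              # seq_id -> first `keep` integer coefficients
        self.store = {}               # seq_id -> raw sequence (the "disk")

    def _encode(self, seq):
        return quantize(haar(ngram_vector(seq, self.n)), self.scale)[: self.keep]

    def add(self, seq_id, seq):
        code = self._encode(seq)
        pos = bisect.bisect_right(self.keys, code[0])
        self.keys.insert(pos, code[0])
        self.ids.insert(pos, seq_id)
        self.prefix[seq_id] = code
        self.store[seq_id] = seq

    def search(self, query, eps):
        """Ids whose n-gram count vectors lie within L2 distance eps of the query's."""
        q = self._encode(query)
        # Stage 1: range scan on the key; any true match has |key diff| <= scale*eps + 1.
        slack = self.scale * eps + 1
        lo = bisect.bisect_left(self.keys, q[0] - slack)
        hi = bisect.bisect_right(self.keys, q[0] + slack)
        # Stage 2: prune with the stored prefix; rounding adds at most sqrt(len(q)).
        bound = self.scale * eps + math.sqrt(len(q))
        cands = [i for i in self.ids[lo:hi] if l2(self.prefix[i], q) <= bound]
        # Stage 3: "I/O" verification on the raw sequences removes false positives.
        qv = ngram_vector(query, self.n)
        return sorted(i for i in cands
                      if l2(ngram_vector(self.store[i], self.n), qv) <= eps)
```

Because the orthonormal Haar transform preserves Euclidean distance, each coefficient difference is a lower bound on the full distance, so in this sketch stages 1 and 2 discard only non-matches and only stage 3 touches the raw sequences. Here the first coefficient is proportional to the number of valid n-grams, so stage 1 behaves like a length filter, while stage 2 uses finer detail coefficients. Raising `scale` makes the integer coefficients more precise but larger, and raising `keep` stores more coefficients per sequence in exchange for stronger pruning, one illustration of the kind of resource trade-off the abstract mentions.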
dc.language.iso	en	en_US
dc.publisher	Palestine Polytechnic University (جامعة بوليتكنك فلسطين) - informatics	en_US
dc.subject	Genome	en_US
dc.subject	Btree	en_US
dc.subject	Wavelet	en_US
dc.title	Genome Database Indexing Using A Modified Wavelet Transformation And Btree	en_US
dc.type	Other	en_US