Automatic identification of evolutionary and sequence relationships in large scale protein data using computational and graph-theoretical analyses

Doğan, Tunca

dc.contributor.advisor	Karaçalı, Bilge	en
dc.contributor.author	Doğan, Tunca
dc.date.accessioned	2023-11-16T12:13:06Z
dc.date.available	2023-11-16T12:13:06Z
dc.date.issued	2012	en
dc.description	Thesis (Doctoral)--İzmir Institute of Technology, Biotechnology and Bioengineering, İzmir, 2012	en
dc.description	Includes bibliographical references (leaves: 111-115)	en
dc.description	Text in English; Abstract: Turkish and English	en
dc.description	xii, 115 leaves	en
dc.description.abstract	In this study, computational methods were developed for the automatic identification of functional and evolutionary relationships between biomolecular sequences in large and diverse datasets. Three approaches were considered during the development and optimization of the methods. The first approach represented gene and protein sequences in high-dimensional vector spaces via non-linear embedding, which allowed statistical learning algorithms to be applied to the resulting embeddings in order to cluster and/or classify the sequences. The second approach revised the pairwise similarities between sequences following multiple sequence alignment in order to eliminate unreliable connections due to remote homology and/or poor alignment. This was achieved by thresholding the pairwise connectivity map over two parameters: the inferred evolutionary distances and the number of gapless positions in each pairwise alignment. The resulting connectivity map was disjoint and consisted of clusters of similar proteins. The third and final approach sought to associate amino acid sequences with each other over highly conserved, shared sequence segments, as shared sequence segments imply conserved functional or structural attributes. An automated method was developed to identify these segments in large and diverse collections of amino acid sequences, using a combination of sequence alignment, residue conservation scoring and graph-theoretical approaches. The method produces a table of associations between the input sequences and the identified conserved regions that can reveal both new members of known protein families and entirely new lines. The methods were applied to a dataset of 17793 human protein sequences in order to obtain a global functional relation map. On this map, functional and evolutionary properties of human proteins could be inferred from their relationships to proteins bearing functional annotations.
The results revealed that conserved regions corresponded strongly to annotated structural domains. This suggests that the method can also be useful for identifying novel domains in protein sequences.	en
dc.identifier.uri	http://standard-demo.gcris.com/handle/123456789/6262
dc.language.iso	en	en_US
dc.publisher	Izmir Institute of Technology	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject.lcsh	Bioinformatics	en
dc.subject.lcsh	Computational biology	en
dc.title	Automatic identification of evolutionary and sequence relationships in large scale protein data using computational and graph-theoretical analyses	en_US
dc.type	Doctoral Thesis	en_US
dspace.entity.type	Publication
gdc.description.department	Bioengineering	en_US
gdc.description.publicationcategory	Thesis	en_US
gdc.oaire.accepatencedate	2012-01-01
gdc.oaire.diamondjournal	false
gdc.oaire.impulse	0
gdc.oaire.influence	2.9837197E-9
gdc.oaire.influencealt	0
gdc.oaire.isgreen	true
gdc.oaire.keywords	Biyomühendislik
gdc.oaire.keywords	Bioengineering
gdc.oaire.popularity	8.197724E-10
gdc.oaire.popularityalt	0.0
gdc.oaire.publicfunded	false
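
The second approach in the abstract (thresholding a pairwise connectivity map over two parameters so that it falls apart into disjoint clusters) might be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: the function name `threshold_clusters`, the tuple layout of `pairs`, the threshold values, and the direction of each cutoff (keep a connection when the distance is small and the gapless count is large) are all assumptions made here for the example.

```python
from collections import defaultdict


def threshold_clusters(n, pairs, max_distance, min_gapless):
    """Group n sequences into clusters of reliably connected sequences.

    pairs: iterable of (i, j, distance, gapless) tuples, where i and j are
    sequence indices, distance is an inferred evolutionary distance, and
    gapless is the number of gapless positions in the pairwise alignment.
    Both cutoffs are hypothetical parameters chosen for illustration.
    """
    parent = list(range(n))  # union-find forest, one node per sequence

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, distance, gapless in pairs:
        # Assumed rule: a connection survives only if it passes both cutoffs.
        if distance <= max_distance and gapless >= min_gapless:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Connected components of the thresholded map are the clusters.
    groups = defaultdict(list)
    for idx in range(n):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Toy input: five sequences and four candidate connections.
example_pairs = [
    (0, 1, 0.2, 150),
    (1, 2, 0.3, 120),
    (2, 3, 0.9, 200),  # dropped: distance above the cutoff
    (3, 4, 0.1, 30),   # dropped: too few gapless positions
]
clusters = threshold_clusters(5, example_pairs, max_distance=0.5, min_gapless=50)
print(clusters)
```

With these toy values, sequences 0, 1 and 2 end up in one cluster, while sequences 3 and 4 remain singletons because their only connections fail one of the two cutoffs.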