Browsing by Author "Has, Canan"

Enhancement and validation of current human genome annotation via novel proteogenomics algorithms
(Izmir Institute of Technology, 2016-12) Has, Canan; Allmer, Jens
Proteogenomics involves the transfer of knowledge from proteomics to genomics and vice versa. For the transferred information to be trusted, it must be based on experimental results. Genomics is currently fueled by high-throughput techniques involving next-generation sequencing. Proteomics is based on mass spectrometry (MS), which is also a high-throughput approach. Both fields generate a wealth of data that needs to be correlated and annotated to generate knowledge. Publicly available mass spectrometric data for human blood plasma samples exist in data repositories such as PeptideAtlas and PRIDE. We acquired high-quality collections from these data and stored them in a custom database that we developed. First, we aimed to amend these data by running PGMiner, a proteogenomic pipeline developed in this study, against a custom sequence database that includes all predicted alternative open reading frames as well as the six-frame translation of the human genome and exosome. Then, we correlated the existing annotations with the available mass spectrometric measurements. The human genome, in tandem with currently available genome annotations from HAVANA and ENSEMBL, enabled us to validate and enhance current gene annotations.
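The six-frame translation mentioned in the abstract above can be illustrated with a minimal sketch. This is not PGMiner's code; the function names (`reverse_complement`, `translate`, `six_frame_translation`) are hypothetical, and the sketch assumes the standard genetic code, an uppercase DNA string, and that any codon containing a non-ACGT base should be written as `X`.

```python
# Minimal six-frame translation sketch (illustrative only, not PGMiner).
# Assumption: standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str, offset: int = 0) -> str:
    """Translate complete codons starting at offset; '*' marks a stop codon."""
    residues = []
    for i in range(offset, len(dna) - 2, 3):
        # Codons with ambiguous bases (e.g. N) are reported as 'X'.
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


def six_frame_translation(dna: str) -> dict:
    """Translate the three forward and three reverse-strand reading frames."""
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(reverse, offset)
    return frames
```

For example, `six_frame_translation("ATGGCCTAA")["+1"]` gives `"MA*"`. A search database built this way would normally split each frame at stop codons before matching spectra, a step omitted here for brevity.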
Evaluation of protein secondary structure prediction algorithms on a new advanced benchmark dataset
(Izmir Institute of Technology, 2011) Has, Canan; Allmer, Jens
Researchers have been studying secondary structure prediction since the 1970s. However, the accuracy of state-of-the-art methods reaches only approximately 80-85%. One reason is the limitations of the datasets used for training or testing the algorithms. Databases of experimentally determined proteins that also contain knowledge of their functionality, biochemical properties, and location annotation will directly show how the algorithms perform on certain groups of proteins. They also give users the opportunity to assess the quality of algorithms on those datasets and to decide which algorithm to use for which type of protein. The objective of this thesis is the development of a new and advanced protein benchmark database that contains functional and biochemical information for 64872 experimentally determined proteins in the S2C database, which is derived from the Protein Data Bank (PDB). With this database, seven available predictors are evaluated with respect to their performance on different datasets, defined by the functionality and subcellular localization of the proteins in the benchmark database. Comparing the results on the proposed benchmark datasets with those on an existing dataset, RS126, showed that grouping proteins by function and subcellular localization has a great impact on the measured accuracies of existing algorithms.
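A per-group evaluation of the kind described in the abstract above could be sketched as follows. The abstract does not name its accuracy measure, so this sketch assumes the common per-residue three-state score (Q3: the fraction of residues whose predicted state, helix `H`, strand `E`, or coil `C`, matches the observed one). The function names and the group labels in the usage note are hypothetical.

```python
from collections import defaultdict


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def q3_by_group(records):
    """Pool residues per group and return a Q3 score for each group.

    records is an iterable of (group, predicted, observed) tuples, where
    group is any label, e.g. a function or subcellular localization class.
    """
    matched = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, observed in records:
        if len(predicted) != len(observed):
            raise ValueError("predicted and observed strings must have equal length")
        matched[group] += sum(p == o for p, o in zip(predicted, observed))
        total[group] += len(observed)
    # Groups that contributed no residues are skipped to avoid dividing by zero.
    return {group: matched[group] / total[group] for group in total if total[group]}
```

Calling `q3_by_group` with records labeled, say, `"enzyme"` and `"membrane"` returns one score per label, so a predictor's scores can be compared across protein groups. Pooling residues within a group weights long proteins more heavily; averaging per-protein scores is an equally reasonable alternative.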