Master's Thesis

Comparative Analysis of Haptophyte Proteins using Similarity Networks integrated with Gene Expression Profiles and Annotations

The main objective of this study was to comparatively analyze the similarity relationships between protein sequences of four Haptophyte species, E. huxleyi, G. oceanica, I. galbana and Chrysochromulina, using similarity networks. Protein similarity networks from a total of 118,605 protein sequences were generated from pairwise BLASTP results under varied criteria. Selected similarity networks were then analyzed further by integrating gene expression and annotation data in order to identify clusters of proteins that show differential gene expression and may be functionally related to biological processes of interest, e.g. biomineralization. First, all proteins in the four species were clustered with the SiLiX program into 13,337 similarity networks, of which 3,798 consisted only of proteins from E. huxleyi and G. oceanica, the two calcifying species. Analysis of these E. huxleyi and G. oceanica only networks found that the majority of the network sequences tend to be up-regulated under the spike condition. Then similarity networks of specific protein families were generated and analyzed, for example the carbonic anhydrase family, which is likely involved in the biomineralization process. The known carbonic anhydrases in E. huxleyi and the similar sequences in the other species, a total of 149 similar protein sequences, were clustered into four groups (alpha, beta, gamma and delta), as expected. To investigate the finer structure within a network, UPGMA trees were constructed to divide it into sub-clusters according to a multiple sequence alignment of the network sequences. As a result, sub-clusters consisting only of E. huxleyi and G. oceanica proteins were extracted from the beta and delta networks, suggesting probable involvement of these sequences in the calcification process. Programs developed in this research can be applied to analyze any family or group of proteins. Likewise, similarity networks were generated and analyzed for the elongase family and for proteins related to lipid metabolism, which made it possible to infer functions of unknown protein sequences in the clusters.
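The network construction and species filtering described above can be illustrated with a minimal sketch. It is not the SiLiX program or the code developed in the thesis: it assumes that BLASTP hits are already parsed into (query, subject, percent identity) tuples, that a single hypothetical identity threshold decides whether two proteins are linked, and that each protein ID carries a hypothetical species prefix such as "Ehux_" or "Goce_". Networks are taken to be the connected components of the resulting graph.

```python
from collections import defaultdict


def build_networks(hits, min_identity=35.0):
    """Group proteins into similarity networks (connected components).

    hits: iterable of (query_id, subject_id, percent_identity) tuples.
    min_identity: hypothetical linking threshold, not a value from the thesis.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity in hits:
        # Register both proteins so unlinked ones become singleton networks.
        root_q, root_s = find(query), find(subject)
        if identity >= min_identity and root_q != root_s:
            parent[root_q] = root_s

    networks = defaultdict(set)
    for protein in list(parent):
        networks[find(protein)].add(protein)
    return list(networks.values())


def species_of(protein_id):
    """Read the species from the (assumed) prefix before the first underscore."""
    return protein_id.split("_", 1)[0]


def restricted_networks(networks, allowed_species):
    """Keep networks whose proteins all come from the allowed species."""
    allowed = set(allowed_species)
    return [n for n in networks if {species_of(p) for p in n} <= allowed]


if __name__ == "__main__":
    # Toy hits with made-up identifiers and identities.
    toy_hits = [
        ("Ehux_1", "Goce_1", 80.0),
        ("Goce_1", "Ehux_2", 60.0),
        ("Ehux_3", "Igal_1", 70.0),
        ("Chry_1", "Igal_2", 20.0),
    ]
    nets = build_networks(toy_hits)
    for net in restricted_networks(nets, {"Ehux", "Goce"}):
        print(sorted(net))
```

In this sketch a network passes the filter if it contains proteins from either or both of the allowed species; requiring both species to be present would be a one-line change to the subset test.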
