Proteomics Applications
The molecular function of a protein can be inferred from either its sequence or its structure. Sequence-based inference methods annotate the molecular function of a protein from its sequence homologues. Most genome-wide functional annotation is carried out with sequence alignment tools such as BLAST, or with motif- and profile-based search tools (e.g. PROSITE, PFAM). Protein domain patterns are becoming increasingly important in the analysis of macromolecular function, particularly when correlated with the function of the corresponding gene. Such studies typically compare a set of input sequences against databases of profiles derived from the analysis of specific sets of proteins.
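As an illustration of a motif-based search, the sketch below translates a pattern written in PROSITE-style syntax into a regular expression and scans a sequence with it. It is a minimal example under stated assumptions, not the way any of the tools above are implemented: the function name `prosite_to_regex` is hypothetical, the example pattern and sequence are made up for demonstration, and less common syntax (such as an anchor inside square brackets) is not handled.

```python
import re

# One pattern element: a core (residue, x, [..] or {..}) plus an optional
# repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+(?:,\d+)?)\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string.

    Hypothetical helper for illustration only.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        # '<' anchors the match at the N-terminus, '>' at the C-terminus.
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        core, count = _ELEMENT.fullmatch(element).groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        repeat = "{" + count + "}" if count else ""
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Made-up example: a zinc-finger-like pattern and a toy sequence.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
sequence = "MKCAACAAALAAAAAAAAHAAAHG"
match = re.search(regex, sequence)
print(regex, match.span() if match else None)
```

A database search in this style repeats the scan for every pattern against every input sequence, so the individual comparisons are independent of one another.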
Many studies have focused on the detection of remote homologues. In general, methods that use statistical models extracted from multiply aligned sequences perform better than pairwise sequence comparison methods. However, even these improved methods fail to recognize remote homologues with sequence identity below 25-30%, a group estimated to account for more than 25% of all sequenced proteins. Although sequence alignments reveal much about similarities in the large core secondary structures, these similarities are only weakly related to protein function and to the distribution of active sites.
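To make the identity threshold concrete, the following sketch computes the percent identity of two already-aligned sequences and flags pairs that fall below a cutoff. It is a simplified illustration: the function names and the 30% default are assumptions chosen to match the 25-30% range quoted above, and identity is computed over columns where neither sequence has a gap, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def below_identity_cutoff(aligned_a, aligned_b, cutoff=30.0):
    """True when the pair falls under the cutoff (assumed value, see lead-in)."""
    return percent_identity(aligned_a, aligned_b) < cutoff


# Toy alignment: 5 identical residues out of 6 ungapped columns.
print(percent_identity("ACDE-FG", "ACDQKFG"))
print(below_identity_cutoff("ACDEFGHIKL", "AWYVTSRQPN"))
```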
In this project we plan to port a number of these methods, covering similarity search (BLAST) and functional domain search applications. A method for correlating protein surfaces will also be developed specifically for the EGEE GRID platform. Starting from the 3D atomic coordinates of a protein, as retrieved from the Protein Data Bank (PDB), this method models the macromolecular surface. Because computing protein surfaces requires large amounts of data and computer time, the algorithm will be implemented on a GRID to improve its performance.
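The text does not specify how the surface is modelled. Purely as an illustration of going from PDB coordinates to a surface description, the sketch below reads ATOM records from PDB-format text and estimates how exposed each atom is using a simplified Shrake-Rupley-style test with a rolling probe. This is a swapped-in, well-known technique and not the project's algorithm. The function names, the single atomic radius of 1.8 Å applied to every atom, the 1.4 Å probe and the number of test points are all assumptions.

```python
import math


def parse_atoms(pdb_text):
    """Read x, y, z from ATOM records (fixed columns 31-38, 39-46, 47-54)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atoms.append((float(line[30:38]),
                          float(line[38:46]),
                          float(line[46:54])))
    return atoms


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(i * golden), r * math.sin(i * golden), z))
    return points


def exposed_fraction(atoms, radius=1.8, probe=1.4, n_points=96):
    """Fraction of each atom's probe-expanded sphere not covered by other atoms.

    Simplification (assumption): every atom gets the same radius.
    """
    reach = radius + probe
    unit = sphere_points(n_points)
    fractions = []
    for i, centre in enumerate(atoms):
        # Only atoms closer than 2 * reach can cover any test point.
        near = [a for j, a in enumerate(atoms)
                if j != i and math.dist(a, centre) < 2.0 * reach]
        free = 0
        for ux, uy, uz in unit:
            p = (centre[0] + reach * ux,
                 centre[1] + reach * uy,
                 centre[2] + reach * uz)
            if all(math.dist(p, a) >= reach for a in near):
                free += 1
        fractions.append(free / n_points)
    return fractions
```

Atoms with a fraction of zero are buried, and the rest contribute to the surface. The pairwise distance checks grow quickly with the number of atoms, while each structure can be processed independently of the others, which is consistent with distributing the work across GRID nodes.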