Database and Functional Genomics Applications
Effective access to biological databases is fundamental to building an efficient bioinformatics workflow. For this reason, the EGEE environment will be tested as a means of accessing and integrating data in a distributed way.
We aim to make it possible to manage biological databases using the EGEE Grid infrastructure. These databases will be complemented by other databases that are publicly available on the Internet, accessed through web services where possible and appropriate.
As an alternative to the usual comparison by sequence similarity, gene products can be clustered by their functionality using the knowledge contained in the Gene Ontology (GO) and UniProt. The GO terms and their associations to gene products are collected in a relational database, the GO database, and more than 1.1 million UniProt products are annotated with GO terms. Comparing one gene product against all 1.1 million annotated UniProt products is a very data-intensive procedure. The work will also address the need to improve this system so that it covers the whole range of annotated UniProt products, and will develop an approach that achieves a higher degree of parallelization. Such a parallelized and distributed process can then be adapted to other data-intensive tasks.
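The one-against-all comparison described above can be illustrated with a minimal Python sketch. It is not the project's implementation. It assumes that each gene product is represented by its set of GO terms and that functional similarity is measured by the Jaccard index, which is a simple stand-in for whatever measure is actually used. The product identifiers, the GO term sets, and the function names (`jaccard`, `chunks`, `score_chunk`, `compare_against_all`) are all hypothetical. A local thread pool stands in for the Grid: on a real infrastructure, each chunk would more plausibly be submitted as a separate job.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def jaccard(a, b):
    """Jaccard similarity of two GO-term sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def score_chunk(query_terms, chunk):
    """Score one chunk of (product_id, go_terms) pairs against the query."""
    return [(pid, jaccard(query_terms, terms)) for pid, terms in chunk]


def compare_against_all(query_terms, annotations, chunk_size=2, workers=4):
    """Compare one product against all annotated products, chunk by chunk.

    Each chunk is independent of the others, so the chunks can be
    processed in parallel; here a thread pool stands in for Grid jobs.
    """
    batches = chunks(annotations.items(), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: score_chunk(query_terms, c), batches)
        scored = [pair for part in results for pair in part]
    # Most similar first; ties are broken by product identifier.
    return sorted(scored, key=lambda p: (-p[1], p[0]))


# Toy annotations: identifiers and term sets are illustrative placeholders,
# not real UniProt annotations.
ANNOTATIONS = {
    "P1": {"GO:0000001", "GO:0000002"},
    "P2": {"GO:0000002", "GO:0000003"},
    "P3": {"GO:0000004"},
}

ranking = compare_against_all({"GO:0000001", "GO:0000002"}, ANNOTATIONS)
```

Because the score of each chunk depends only on the query and on that chunk, the work divides cleanly across workers, and only the small per-chunk result lists have to be merged at the end. The same split, score, and merge pattern could in principle be reused for other data-intensive tasks.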