Data Integration for Marine Ecological Genomics
Many aspects of life can only be explained in their ecological context. Ernst Haeckel recognized this more than a hundred years ago: "Ecology is the entire science of the relations of an organism to its environment to which we can count in a broader sense all conditions of existence". Molecular methods developed over the last 20 years have revealed the diversity of microbial communities, how they function, and their crucial role in ecosystems.
Metagenomics, defined as "the functional and sequence-based analysis of the collective microbial genomes contained in an environmental sample" (Riesenfeld et al. 2004), allows researchers to study the microbial world at the DNA level without cultivation. Basic questions such as "How does the environment influence the gene content?" and "How does the functional potential encoded therein influence the capacity of a microbial community to interact with the environment?" can now be addressed. Metagenomics can also be used to test the hypothesis that genes with no known function are conserved in certain microbial communities and may be important for their ecological adaptation and survival. However, answering basic ecological questions such as "Who is out there?" and "What are they doing?" requires relating sequence data to ecological data, and systematic data management is a crucial prerequisite for doing so and for achieving a holistic picture of the complex interactions in the microbial realm.
This thesis contributes to genomic data standardization and presents the software architecture and implementation of an integrated framework for ecological genomics. Its centerpiece is the integrated Microbial Ecological Genomics Database (MegDb). Tools have been developed that use the geo-referenced DNA sequence data stored in MegDb. These tools demonstrate that MegDb, which is based on standards such as the "Minimum Information about a Genome Sequence" (MIGS) recommendation of the Genomic Standards Consortium, is suitable for ecological genomics.
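The abstract does not describe MegDb's schema or query interface. Purely as an illustration, the following Python sketch shows one way geo-referenced sequence records carrying minimal contextual metadata could be checked for plausibility and then queried by location. The `SequenceRecord` class, its field names (loosely inspired by, not copied from, the MIGS checklist), the `within_radius` helper, and the demo accessions and coordinates are all hypothetical. Distances use the haversine formula on a spherical Earth with an assumed radius of 6371 km.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt

# Assumed mean Earth radius for a spherical approximation.
EARTH_RADIUS_KM = 6371.0


@dataclass
class SequenceRecord:
    """Hypothetical geo-referenced sequence record; field names are illustrative only."""
    accession: str
    latitude: float   # decimal degrees (WGS84 assumed)
    longitude: float  # decimal degrees (WGS84 assumed)
    depth_m: float    # sampling depth in metres
    collection_date: date
    sequence: str

    def validate(self) -> list:
        """Return a list of problems; an empty list means the record passes."""
        problems = []
        if not -90.0 <= self.latitude <= 90.0:
            problems.append("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            problems.append("longitude out of range")
        if self.depth_m < 0:
            problems.append("negative depth")
        if not self.sequence:
            problems.append("empty sequence")
        return problems


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(records, lat: float, lon: float, radius_km: float):
    """Hypothetical query: valid records sampled within radius_km of (lat, lon)."""
    return [
        r for r in records
        # Invalid records are skipped before any distance is computed.
        if not r.validate()
        and haversine_km(lat, lon, r.latitude, r.longitude) <= radius_km
    ]


if __name__ == "__main__":
    # Made-up demo records, not real data.
    demo = [
        SequenceRecord("demo-1", 54.18, 7.89, 5.0, date(2008, 6, 1), "ACGT"),
        SequenceRecord("demo-2", 53.00, 8.00, 10.0, date(2008, 6, 2), "GGCA"),
        SequenceRecord("demo-3", 95.00, 8.00, 10.0, date(2008, 6, 3), "TTAA"),
    ]
    hits = within_radius(demo, 54.18, 7.89, 50.0)
    print([r.accession for r in hits])  # prints ['demo-1']
```

Here `demo-2` lies roughly 130 km from the query point and `demo-3` fails validation because of its impossible latitude. Validating before querying keeps malformed metadata from silently distorting location-based results. A production database would more likely delegate such queries to a spatial index than scan records in Python.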