Investigating environment-specific signatures of marine pelagic microbes: Insights from comparative genomics
Pelagic microbes have adapted to a dynamic and challenging natural environment. Genomes of microbes from a wide range of environments are now available, creating new opportunities to investigate environment-specific genomic content. However, the lack of reliable environmental contextualisation severely limits comparative analysis. This thesis summarises efforts to enable, implement, and perpetuate large-scale comparative microbial genomics in order to detect environment-specific genomic signatures of marine pelagic microbes. Development and application of the Environment Ontology provided a controlled classification of genomes by environment type. An exploratory survey using this classification detected genomic features with relevance to the demands of the marine water column. Features of unknown function and those with regulatory function were prominent among marine-specific genomic signatures. To investigate these, this thesis includes analyses of the Global Ocean Sampling metagenomes. These analyses provided ecogenomic perspectives on genomic features of unknown function and related transcriptional regulator abundance to the environmental stability of the water column. The findings support the connection of these features to the niche-specific adaptations of pelagic microbes. Lastly, this thesis describes efforts to ensure steady growth of contextual data alongside genomic data to support future environment-enabled analyses. Community efforts in establishing contextual data standards are represented by the Minimum Information about a Marker gene Sequence (MIMARKS) and any Sequence (MIxS) checklist projects. The MetaBar and CDinFusion software tools, which promote contextual data acquisition and submission, were also developed in support of these efforts and are described.
This thesis concludes with a description of the architecture and capabilities of Megx.net, which provided the framework for the integration and dissemination of many of this project’s outcomes.