Exploring the Marine Virosphere: From Genome Context to Content
- Over the last decade, we have witnessed the dawn of marine phage genomics. Since the first genome was sequenced just over ten years ago, nearly thirty additional marine phage genomes and eleven marine virus metagenomes have offered a small glimpse into the genomic underpinnings of phage in the oceans. Genomics has revealed the role of phages in host metabolism: phage genomes harbor environmentally relevant functional genes, such as those involved in photosynthesis, nutrient stress response, nucleotide scavenging, and vitamin biosynthesis, some of which are expressed during infection. Furthermore, evidence of phage activity in host genomic islands suggests that phages are important drivers of microbial niche adaptation and diversification in the oceans. In recent years, advances in sequencing technologies and falling costs have led to sequence generation at an unprecedented rate. With this influx, however, our ability to analyze and interpret new sequence data to gain biological insights (such as those described above) is under serious threat. The role of bioinformatics in the current age is to keep this data accessible; otherwise, we risk a serious loss of its value. For this reason, the development of contextual data standards becomes crucial. For a biologist, the power of context touches upon the core tenets of comparative genomics by expanding the dimensions along which comparisons and inferences can be made. This thesis addresses the bioinformatic themes of contextual data development and implementation as a means to enhance marine phage genomics. In doing so, it puts into practice the ultimate role of bioinformatics: to facilitate the capture and collection of various data sources, enabling in silico biological predictions that lead to focused, tractable laboratory experiments.