On the feasibility to engage heterogeneous communities in data gathering, sharing and enrichment
- Marine microbes play critical roles in the well-being of planet Earth and all its inhabitants. They influence not only chemical cycles and the marine food chain, but also the atmosphere and climate of our planet. However, the field of marine microbiology is still in its infancy and much remains to be explored. Here, I present a new approach to investigating global marine microbial diversity and function on a single day of the year, 21 June, in 2014 and 2015: the Ocean Sampling Day (OSD). Collecting a simultaneous, global dataset required marine researchers worldwide to be connected. The aim was not only to create a snapshot of marine microbial diversity fixed in time, but also to raise awareness among the general public of the important role these tiny organisms play in our daily lives. Therefore, professional scientists as well as the non-scientific public were invited to join the corresponding citizen science project, MyOSD. Participants supported OSD by providing oceanographic measurements and even microbial samples. Data collected by citizen scientists were validated and show that citizen science can contribute valuable data to marine research. A special focus was placed on additional environmental measurements such as water temperature. These contextual data are important for interpreting the microbial diversity in any given sample. However, measuring and reporting contextual data is still not common practice in marine microbial research, and OSD aims to make scientists more aware of this problem.
Extracting contextual data after a dataset or article has been published is onerous work. Hence, I present two new tools to extract environmental information and geographic locations from scientific literature. The text mining tool, ENVIRONMENTS, automatically annotates scientific text with terms from the Environmental Ontology (EnvO). The PubMap application uses the power of the crowd to build a manually curated database of georeferenced scientific publications.
Overall, this thesis shows that enabling collaboration within the scientific community and with the non-scientific public leads to achievements not only in gathering new datasets, but also in enriching present and historic scientific literature.