Building data analysis capacity
To date, METS data have primarily been used to address questions and support the scientific goals of individual METS and ancillary projects. An important challenge, and an opportunity to enhance the scientific impact of METS, is to build capacity for data analysis and synthesis across METS and with complementary multi-platform observations to address spatial patterns of biogeochemical variability and change. This RCN’s development of a FAIR METS data model framework will be critical in supporting these efforts. To build capacity among emerging leaders in the oceanographic community and foster hands-on instruction, peer learning, and collaboration, the METS RCN will convene a METS data hackathon geared toward early career participants (students, postdocs, new faculty). The hackathon will follow the Oceanhackweek format, which is described below:
“The hackweek model has emerged within the data science community as a powerful tool for fostering exchange of ideas in research and computation by providing training in modern data analysis workflows. In contrast to conventional academic conferences or workshops, hackweeks are intensive and interactive, facilitated by three core components: tutorials on state-of-the-art methodology, peer-learning, and on-site project work in a collaborative environment. This setup is particularly powerful for sciences that require not only domain-specific knowledge, but also effective computational workflows to foster rapid exchange of ideas and make discovery….”
While a great deal of funding is invested in the collection of METS and other oceanographic data, there is little explicit investment in the follow-on synthesis and data analysis work that is needed to realize the full scientific potential of these data. The oceanography community faces “big data” informatics and cyber-infrastructure challenges, including combining very large oceanographic datasets that span a wide range of spatiotemporal scales and integrating data derived from multiple platforms and disciplines. The METS RCN will accelerate progress toward Harnessing the Data Revolution (NSF Big Idea) by building analysis tools and capacity to integrate diverse datasets. Lead PI Benway has contacted the Oceanhackweek lead organizers to discuss how the METS RCN capacity building objectives align with the Oceanhackweek model and has received a positive response (see letter of collaboration). Potential hackathon modules might include:
- Statistics and visualization – Combining data across METS to view temporal trends in a broader spatial context provides insight into marine ecosystem links to regional climate indices, local anthropogenic impacts, etc., which can inform prediction and decision making. For example, T. O’Brien (RCN Steering Committee) has adapted and applied statistical methods to analyze trends in METS variables from globally distributed time series stations against a backdrop of satellite data (O’Brien et al., 2017).
- METS data for modelers – It is important for time series data to be assimilated into numerical modeling frameworks that may elucidate cause-and-effect scenarios not easily perceived through simple statistical analyses. To facilitate increased use of time series data by the modeling community and improve communication and exchanges between observationalists and modelers, the use of shared repositories such as GitHub can stimulate the development of community-driven, open-source code for extracting, quality-controlling, and gridding time series data. Simply creating shared post-processing scripts that can be tailored to each individual modeler’s needs will significantly decrease duplicative efforts, increase access to time series data, and improve validation of numerical models.
- Data integration – Monitoring ocean change requires a sustained, globally distributed network of observatories that integrates shipboard, autonomous, and remote sensing platforms, which together generate increasingly complex data streams. Many METS sites have incorporated autonomous assets, including gliders, profiling floats, and wirewalkers, to name a few, and many ancillary project participants deploy complementary instrumentation on an ad hoc basis. The integration of data across these different platforms is an important area for building computational capacity.
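As an illustration of the kind of shared post-processing script envisioned under "METS data for modelers," the following minimal Python sketch applies a simple range check to a set of observations and averages the remaining values into calendar-month bins. It is only a sketch under stated assumptions: the function name `qc_and_grid_monthly`, the valid range, the use of -999.0 as a fill value, and the nitrate observations are all hypothetical, and actual hackathon modules would be defined by participants and would likely use community data formats and more complete quality-control procedures.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def qc_and_grid_monthly(records, valid_min, valid_max):
    """Range-check observations, then average them into calendar-month bins.

    records: iterable of (date, value) pairs.
    Returns a list of ((year, month), mean_or_None) covering every month from
    the first to the last observation; months with no valid data get None.
    """
    records = list(records)
    if not records:
        return []

    bins = defaultdict(list)
    for day, value in records:
        # Simple range check: out-of-range values (e.g. fill values) are dropped.
        if valid_min <= value <= valid_max:
            bins[(day.year, day.month)].append(value)

    first = min(day for day, _ in records)
    last = max(day for day, _ in records)

    gridded = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        values = bins.get((year, month))
        gridded.append(((year, month), mean(values) if values else None))
        # Advance one calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return gridded


# Hypothetical nitrate observations (umol/kg); -999.0 stands in for a fill value.
obs = [
    (date(2020, 1, 5), 1.0),
    (date(2020, 1, 20), 3.0),
    (date(2020, 2, 11), -999.0),
    (date(2020, 4, 2), 2.5),
]
monthly = qc_and_grid_monthly(obs, valid_min=0.0, valid_max=50.0)
for (year, month), value in monthly:
    print(year, month, value)
```

Keeping the valid range as an argument, rather than hard-coding it, is one way such a script could be tailored to different variables and to each modeler's needs; months without valid data are kept as explicit gaps so that the output remains on a regular grid.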