Document Type
Article
Publication Date
9-2022
Publication Title
Methods in Ecology and Evolution
Volume
13
Issue
11
Abstract
Accurate field data are essential to understanding ecological systems and forecasting their responses to global change. Yet, data collection errors are common, and data analysis often lags far enough behind its collection that many errors can no longer be corrected, nor can anomalous observations be revisited. Needed is a system in which data quality assurance and control (QA/QC), along with the production of basic data summaries, can be automated immediately following data collection.
Here, we implement and test a system to satisfy these needs. For two annual tree mortality censuses and a dendrometer band survey at two forest research sites, we used GitHub Actions continuous integration (CI) to automate data QA/QC and run routine data wrangling scripts to produce cleaned datasets ready for analysis.
This system automation had numerous benefits, including (1) the production of near real-time information on data collection status and errors requiring correction, resulting in final datasets free of detectable errors, (2) an apparent learning effect among field technicians, wherein original error rates in field data collection declined significantly following implementation of the system, and (3) an assurance of computational reproducibility—that is, robustness of the system to changes in code, data and software.
By implementing CI, researchers can ensure that datasets are free of any errors for which a test can be coded. The result is dramatically improved data quality, increased skill among field technicians, and reduced need for expert oversight. Furthermore, we view CI implementation as a first step towards a data collection and analysis pipeline that is also more responsive to rapidly changing ecological dynamics, making it better suited to study ecological systems in the current era of rapid environmental change.
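The idea of an error "for which a test can be coded" can be illustrated with a minimal sketch. It is not the authors' implementation: the field names (`tag`, `status`, `dbh_mm`), the allowed status values, and the three checks are hypothetical, chosen only to show the general shape of a coded QA/QC test on census records.

```python
from dataclasses import dataclass

# Hypothetical set of valid status codes; a real census would define its own.
ALLOWED_STATUS = {"alive", "dead"}


@dataclass
class Record:
    """One hypothetical field observation of a tagged tree."""
    tag: str        # unique tree identifier (assumed)
    status: str     # recorded condition of the tree (assumed)
    dbh_mm: float   # stem diameter in millimetres (assumed)


def find_errors(records):
    """Return a list of (row_index, message) pairs, one per failed check."""
    errors = []
    seen_tags = set()
    for i, rec in enumerate(records):
        # Check 1: each tree should appear only once per census.
        if rec.tag in seen_tags:
            errors.append((i, "duplicate tag"))
        seen_tags.add(rec.tag)
        # Check 2: status must be one of the allowed codes.
        if rec.status not in ALLOWED_STATUS:
            errors.append((i, "unknown status"))
        # Check 3: a diameter must be a positive number.
        if rec.dbh_mm <= 0:
            errors.append((i, "non-positive dbh"))
    return errors
```

In a CI setting, a script along these lines would typically run on every data upload and exit with a non-zero status when the error list is non-empty, so that the run is marked as failed and the reported rows can be corrected while the field season is still under way.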
First Page
2572
Last Page
2585
Recommended Citation
Kim, Albert Y.; Herrmann, Valentine; Barreto, Ross; Calkins, Brianna; Gonzalez-Akre, Erika; Johnson, Daniel J.; Jordan, Jennifer A.; Magee, Lukas; McGregor, Ian R.; Montero, Nicolle; Novak, Karl; Rogers, Teagan; Shue, Jessica; and Anderson-Teixeira, Kristina J., "Implementing GitHub Actions Continuous Integration to Reduce Error Rates in Ecological Data Collection" (2022). Statistical and Data Sciences: Faculty Publications, Smith College, Northampton, MA.
https://scholarworks.smith.edu/sds_facpubs/51
Digital Object Identifier (DOI)
10.1111/2041-210X.13982
Rights
© 2022 The Authors.
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Included in
Data Science Commons, Other Computer Sciences Commons, Statistics and Probability Commons
Comments
Archived as published.