/ADC-citation-project

Arctic Data Center dataset citation analysis. Data science fellowship project

Primary language: R
License: Apache License 2.0 (Apache-2.0)

Searching for Elusive Arctic Dataset Citations

2022 Data Science Fellowship Project
Althea N. Marks

NSF Arctic Data Center
National Center for Ecological Analysis and Synthesis
University of California, Santa Barbara

Goals

While developing the next version of the scythe R package, we seek to understand:

  • whether our metrics on data citation are complete
  • how they vary across citation sources, disciplines, and usage patterns
  • how much citation overlap exists among databases

Our findings will guide future efforts to record accurate dataset metrics at the Arctic Data Center and other repositories interested in tracking published dataset citations.

Analysis Webpage (ongoing project)

https://theamarks.github.io/ADC-citation-project/

Background

Academic researchers increasingly recognize the importance of publishing datasets in conjunction with peer-reviewed articles. Published datasets can be cited for many reasons, including to increase the transparency and reproducibility of research, to publicize that data are available for reuse by other researchers, and to give credit to the authors when data are reused. However, dataset citation is rare in many disciplines and often does not follow citation guidelines, such as those recommended by the Force11 Data Citation Principles.

Dataset citations are tracked and recorded in several ways at the Arctic Data Center for the 6,700+ datasets curated and archived in the repository. The Arctic Data Center’s current citation tracking methods include the DataCite/CrossRef EventData service, manual entry by repository users, and the R package “scythe” developed by the Arctic Data Center. Scythe (v0.9.1) queries the Scopus, PLOS, Springer, and xDD databases for DOIs (Digital Object Identifiers) provided by the user.
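
The overlap question listed under Goals can be illustrated with a small base R sketch. It is not the project's analysis code and does not call scythe. The `citations` data frame, its column names, and every DOI in it are hypothetical placeholders that stand in for results gathered from several sources. The sketch treats a citation as a dataset/article pair, counts how many pairs each pair of sources shares, and lists the pairs that only one source reported.

```r
# Hypothetical citation records: one row per (dataset, citing article, source).
# All identifiers are made-up placeholders, not real citations.
citations <- data.frame(
  dataset_doi = c("10.0000/dataset-1", "10.0000/dataset-1", "10.0000/dataset-1",
                  "10.0000/dataset-2", "10.0000/dataset-2"),
  article_doi = c("10.0000/article-a", "10.0000/article-a", "10.0000/article-b",
                  "10.0000/article-c", "10.0000/article-c"),
  source      = c("scopus", "plos", "springer", "scopus", "xdd"),
  stringsAsFactors = FALSE
)

# A citation is identified by its dataset/article pair, regardless of source.
citations$pair <- paste(citations$dataset_doi, citations$article_doi, sep = " <- ")

# Unique citation pairs reported by each source.
pairs_by_source <- lapply(split(citations$pair, citations$source), unique)

# Pairwise overlap matrix: number of citation pairs shared by two sources.
# The diagonal holds the number of unique pairs each source reported.
sources <- names(pairs_by_source)
overlap <- sapply(sources, function(a) {
  sapply(sources, function(b) {
    length(intersect(pairs_by_source[[a]], pairs_by_source[[b]]))
  })
})

# Citation pairs reported by exactly one source.
source_count <- tapply(citations$source, citations$pair,
                       function(s) length(unique(s)))
unique_to_one_source <- names(source_count)[source_count == 1]

print(overlap)
print(unique_to_one_source)
```

Keying on the dataset/article pair rather than on the row means a citation found by several sources is counted once, which is what a completeness or overlap comparison would need.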