
understanding and documenting the differences between software and data in the context of citation

Creative Commons Attribution 4.0 International (CC-BY-4.0)

#Software vs Data

Author/Editors: Daniel S. Katz, Kyle E. Niemeyer, Arfon M. Smith

Additional Authors: Carl Boettiger

This repository is intended to be used to discuss and document the differences between software and data in the context of citation.

It was created while the FORCE11 Software Citation Working Group [FORCE11 Software Citation Working Group] was writing the FORCE11 Software Citation Principles [Smith et al. 2016a], and while the editors were then submitting them to PeerJ Computer Science [Smith et al. 2016b] and responding to reviewer comments.

We start with the idea that software, while similar to data in not traditionally having been cited in publications, is also different from data. Software can be considered a type of data, but the converse is not generally true.

The remainder of this document gives examples of these differences.

If you want to add a new difference, please do so via a pull request. Similarly, if you want to add a citation or a new explanation, please also do this via a pull request. If you want to discuss a difference (for example, you don't think it's correct), please open a new issue or discuss it in an existing issue. If you do add text in a pull request, also add yourself as an additional author in that same request, following the existing format and keeping the additional author list in alphabetical order by surname. (And add a comma after all authors but the last one.)


##Format of this document:

###Statement

Explanation if needed

Evidence: Citations


##List of Differences

###Software is a creative work, data a fact.

In particular, software is generally subject to copyright protection as a creative work, while data is frequently considered outside the domain of copyright because it is composed of facts about the world (you cannot copyright the height of Mt. Everest). Major scientific data repositories (e.g. Dryad, figshare) automatically apply licenses suited to data that may not be suited to software.

Evidence: "Can I apply a Creative Commons license to software?" [Creative Commons]; "Non-software licenses" [Choose a License]

###Data provides evidence, software provides a tool.

Software exists to perform a task; data does not. Data is fundamentally an empirical observation about the world, while software is fundamentally a logical construct. These differences have important consequences for how each may be reused in the future: software may be used by any researchers seeking to apply the same method, data by any researchers seeking evidence about the same fact about the world.

Evidence: Citations?

###Software can be used to express or explain concepts, unlike data.

Explanation?

Evidence: Citations?

###Software is updated more frequently than papers or data.

Datasets can of course be frequently updated, but each update likely represents the findings of a new or repeated study.

More explanation?

Evidence: Citations?

###Software is executable, unlike papers or data.

Explanation?

Evidence: Citations?

###Software suffers from a different type of bit rot than data.

Software must be constantly maintained so that it continues to function as the hardware and software environments that it is used on change, as developers find and fix bugs, and as user requirements demand new features and capabilities. On the other hand, bit rot for data is generally understood as changes in the underlying hardware that holds the bits, or the need to update the software that interprets the data. These types of bit rot also affect software (the software is itself stored as a set of bits, and these bits must be interpreted, often as ASCII or Unicode characters), but software bit rot is generally thought of as a higher-level concern on top of this.
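The two levels of bit rot described above can be illustrated with a minimal Python 3 sketch. The function name `double`, the variable names, and the choice of Python 2 `print` syntax as the example of a changed environment are all assumptions made for illustration; they do not come from the sources cited in this document.

```python
# Hypothetical illustration; none of these names come from the cited sources.

# Software is stored as bits, just like data.
source_bytes = b"def double(x):\n    return 2 * x\n"

# Level 1 (shared with data): the bits must be interpreted as characters.
source_text = source_bytes.decode("ascii")

# Level 2 (specific to software): the text must also run in today's environment.
namespace = {}
exec(compile(source_text, "<stored-source>", "exec"), namespace)
result = namespace["double"](21)

# The same kind of bytes can remain perfectly readable and yet stop working
# once the language environment has moved on (here: Python 2 print syntax).
legacy_bytes = b"print 'hello'\n"
legacy_text = legacy_bytes.decode("ascii")  # level 1 still succeeds

try:
    compile(legacy_text, "<legacy-source>", "exec")
    legacy_runs = True
except SyntaxError:
    legacy_runs = False  # level 2 fails under Python 3
```

In this sketch the legacy bytes have not degraded at all as data, which is the sense in which software bit rot sits at a higher level than the bit rot usually discussed for data.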

More explanation?

Evidence: Citations?

###Software is frequently built to use other software, leading to complex dependencies, and these dependent software packages also frequently change.

Explanation?

Evidence: Citations?

###Software is generally smaller than data, so a number of the storage and preservation constraints on data don’t apply to software.

Explanation?

Evidence: Citations?

###Software teams can be large and multidisciplinary, and with varied roles that overlap and sometimes make contributions ambiguous—some of which can rise to a level equivalent to paper authorship.

Explanation?

Evidence: Citations?

###The lifetime of software is generally not as long as that of data.

Explanation?

Evidence: Citations?

###Additional Differences?

##References

[Choose a License] Choose an open source license, "Non-software licenses," http://choosealicense.com/non-software/ Accessed: 2016-08-16.

[Creative Commons] Creative Commons, FAQ, "Can I apply a Creative Commons license to software?", https://wiki.creativecommons.org/index.php/Frequently_Asked_Questions#Can_I_apply_a_Creative_Commons_license_to_software.3F Accessed: 2016-08-16.

[FORCE11 Software Citation Working Group] FORCE11 Software Citation Working Group, GitHub repository, https://github.com/force11/force11-scwg. Accessed: 2016-07-10.

[Smith et al. 2016a] A. M. Smith, D. S. Katz, K. E. Niemeyer, and FORCE11 Software Citation Working Group, “Software Citation Principles,” FORCE2016 Website, https://www.force11.org/software-citation-principles, 2016. Accessed: 2016-07-10.

[Smith et al. 2016b] A. M. Smith, D. S. Katz, K. E. Niemeyer, and FORCE11 Software Citation Working Group, “Software Citation Principles,” PeerJ Preprints 4:e2169v3, 2016. https://doi.org/10.7287/peerj.preprints.2169v3