qiime2/provenance-lib

Replay produces a methods manifest


Methods-section writing in computational biology papers requires authors to balance completeness against simplicity. Few readers are likely to care about the details of every method in a large-scale project, but a reader attempting to reproduce a study benefits significantly from a plaintext description of every step. When researchers don't write about every computational method, they are also less likely to provide attribution to low-level or non-terminal methods, creating disparities in citation counts that may not reflect actual patterns of use. (Many publications prefer that authors cite only work referenced in the text.)

A methods manifest solves both of these problems by providing brief descriptions of QIIME 2 actions, registered via the plugin registration API, alongside the names of the plugin and action that were used and reference keys that map to the keys in an output bibliography. Depending on the complexity of the tooling required, these keys could be simple numbers managed in Python, or the report and the reference list itself could be produced with LaTeX/BibTeX. Optionally, each action may be associated with data on the computational environment in which it was performed, or a reference key to that information.
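
As a rough illustration, here is a minimal sketch of what a manifest entry and the simple-numerical-keys approach could look like. Everything here is hypothetical (the `ManifestEntry` structure and its field names are not part of any QIIME 2 API); it only shows the shape of the data:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ManifestEntry:
    """One action's entry in the methods manifest (hypothetical structure)."""
    plugin: str                        # e.g. "diversity"
    action: str                        # e.g. "core_metrics_phylogenetic"
    description: str                   # brief methods text registered by the plugin
    citation_keys: list[str] = field(default_factory=list)  # keys into the output bibliography
    environment: Optional[str] = None  # optional computational-environment info, or a key to it

def render_manifest(entries: list[ManifestEntry]) -> str:
    """Render entries as plain text, assigning simple numerical reference keys."""
    ref_numbers: dict[str, int] = {}   # citation key -> number, in order of first use
    lines = []
    for e in entries:
        refs = [str(ref_numbers.setdefault(k, len(ref_numbers) + 1))
                for k in e.citation_keys]
        ref_note = f" [{', '.join(refs)}]" if refs else ""
        lines.append(f"{e.plugin} {e.action}: {e.description}{ref_note}")
    bibliography = [f"[{n}] {k}" for k, n in ref_numbers.items()]
    return "\n".join(lines + ["", "References:"] + bibliography)
```

Swapping the numerical keys for BibTeX keys and emitting a LaTeX appendix would be the heavier-tooling variant mentioned above.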

By including the methods manifest for publication as an appendix, authors can defer comprehensive methods descriptions to the manifest as needed, while still providing complete attribution to the authors of the methods applied during the analysis.

I really like this idea. We have been discussing something like this since the early days, but calling this a "manifest" and encouraging publication as a supplement is really a good step forward, and a more realistic goal than "QIIME, write my methods section for me". I also like that you explicitly link this to citation issues (where journal restrictions limit what can be cited).

I do not think that provenance replay can do all of this, though. Just as the onus of registering citations and writing clear plugin/action descriptions falls on the individual plugins, so too could the registration of methods text. Provenance replay would then just need to parse these, dereplicate them, and order them together. This sounds like it may be what you have in mind, but it's not totally clear, so I thought I would interject 😁
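
To make that concrete: the parse/dereplicate/order step could be quite small. A sketch, assuming replay yields (plugin, action) pairs in execution order and that plugins have registered methods text we can look up (both names below are made up):

```python
def build_methods_text(actions: list[tuple[str, str]],
                       registered_text: dict[tuple[str, str], str]) -> list[str]:
    """Collect registered methods text for each (plugin, action) in execution
    order, dereplicating so each description appears once, at first use."""
    seen: set[tuple[str, str]] = set()
    ordered: list[str] = []
    for key in actions:
        if key in seen:
            continue  # this action was already described; skip the duplicate
        seen.add(key)
        text = registered_text.get(key)
        if text is not None:  # plugins that registered no text contribute nothing
            ordered.append(text)
    return ordered
```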

Exactly what I had in mind, @nbokulich - leaning on registration feels like a technically straightforward approach to producing a solid result and handling the attribution issue, and leaves responsibility for documenting methods with their developers without costing them much.

For completeness, @gregcaporaso and I have also kicked around the possibility of having a natural-language whiz attempt to auto-generate something more impressive. It would be neat if it came together, but I don't have the expertise to comment on how feasible that is, and am not 100% clear on who the target audience would be.

IMO, there are basically two audiences to serve here - the typical journal-article reader, who needs a concise summary of key methods, and the reader who needs a comprehensive description to support reproducing or extending the study. I see this manifest targeting the second reader, while the more complex task of deciding which key methods the typical reader will care about is left up to the publishing author.