Use Case: Record and Discover Derived Products
Use Case: Discover Derived Products
Goals and Summary
Through data processing, analysis, modeling, and visualization, researchers create derived products, including derived data sets, figures, tables, animations, and other artifacts. By establishing citation relationships that capture the provenance linking these derived products to their sources, we can preserve the dependency relationships needed to reproduce the science and enable discovery of data and products through those relationships. For example, with appropriate relationships (prov:wasGeneratedBy, prov:used), one can determine whether one product was derived from another and, by following the graph of such linkages, discover other analyses and products that were derived from the same source data sets.
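As a rough illustration of the graph traversal described above, here is a minimal Python sketch. It assumes provenance is available as simple (subject, predicate, object) triples using prov:wasGeneratedBy and prov:used; the identifiers (`data:raw_temps`, `run:clean`, and so on) and the function names are hypothetical and chosen only for this example, not taken from any existing tool.

```python
from collections import defaultdict

# Hypothetical provenance triples (subject, predicate, object).
# All identifiers below are illustrative, not real data sets.
TRIPLES = [
    ("run:clean", "prov:used", "data:raw_temps"),
    ("data:clean_temps", "prov:wasGeneratedBy", "run:clean"),
    ("run:plot", "prov:used", "data:clean_temps"),
    ("fig:trend", "prov:wasGeneratedBy", "run:plot"),
    ("run:model", "prov:used", "data:raw_temps"),
    ("data:forecast", "prov:wasGeneratedBy", "run:model"),
]


def direct_sources(triples):
    """Map each product to the products used by the activity that generated it."""
    used = defaultdict(set)
    for subj, pred, obj in triples:
        if pred == "prov:used":
            used[subj].add(obj)
    sources = defaultdict(set)
    for subj, pred, obj in triples:
        if pred == "prov:wasGeneratedBy":
            sources[subj] |= used[obj]
    return sources


def ancestors(product, sources):
    """Return every product that `product` depends on, directly or transitively."""
    seen, stack = set(), [product]
    while stack:
        for src in sources.get(stack.pop(), ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


def was_derived_from(product, source, sources):
    """True if `source` appears anywhere upstream of `product`."""
    return source in ancestors(product, sources)


def related_products(product, sources):
    """Other products that share at least one upstream source with `product`."""
    mine = ancestors(product, sources)
    return {
        other
        for other in sources
        if other != product
        and other not in mine
        and ancestors(other, sources) & mine
    }


SOURCES = direct_sources(TRIPLES)
```

With these sample triples, `was_derived_from("fig:trend", "data:raw_temps", SOURCES)` answers the derivation question, and `related_products("fig:trend", SOURCES)` finds other products built from the same source data. A real deployment would more likely query an RDF store than an in-memory list.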
Why is it important and to whom?
- To reproduce science, researchers need the ability to follow data derivation chains.
- Because researchers tend to cite only the proximate data used in a study, these provenance relationships allow researchers to get credit for the impact of upstream source data on downstream synthetic analyses.
- In a complex workflow, an error in a raw product may propagate into the derived products created from it. Data source citations allow one to trace from the source to its products and notify the appropriate researchers of the error.
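The error-notification scenario in the last point could be sketched as a downstream traversal. This is only an illustration under stated assumptions: the `PRODUCTS` table, its identifiers, the researcher names, and the function `researchers_to_notify` are all hypothetical, and each product is assumed to record its direct sources and one responsible researcher.

```python
from collections import defaultdict, deque

# Hypothetical records: product -> (direct source products, responsible researcher).
PRODUCTS = {
    "data:raw_temps": ((), "alice"),
    "data:clean_temps": (("data:raw_temps",), "bob"),
    "fig:trend": (("data:clean_temps",), "carol"),
    "data:forecast": (("data:raw_temps",), "bob"),
    "data:rainfall": ((), "dave"),
}


def researchers_to_notify(flawed, products):
    """Group every product downstream of `flawed` by its responsible researcher."""
    # Invert the source links so we can walk from a source to its products.
    children = defaultdict(list)
    for product, (sources, _) in products.items():
        for src in sources:
            children[src].append(product)

    # Breadth-first walk over everything derived from the flawed product.
    affected, seen, queue = [], {flawed}, deque([flawed])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)

    notify = defaultdict(list)
    for product in affected:
        notify[products[product][1]].append(product)
    return dict(notify)
```

Calling `researchers_to_notify("data:raw_temps", PRODUCTS)` returns the affected downstream products grouped by researcher, while a source with no derived products, such as `data:rainfall` here, yields an empty result.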
Why hasn’t it been solved yet?
- Provenance modeling languages have been in flux (e.g., PROV, OPM).
- Few tools support capturing provenance information in a standard format.
- Data repositories usually lack provenance information, or record it only as natural-language text.