You want to build AI/ML but also reduce your legal risk? You'd like to show rightsholders and regulators that you're serious about data diligence?
Enter document-training-data. It's:
- A tool to create a detailed summary of training data.
- A tool to provide forensic evidence to an industry standard.
- A tool for simple and effective regulatory compliance.
- A tool to generate manifests to be cryptographically signed.

These scripts can be rewritten in a matter of hours by any competent programmer, for example using the ISCC library (International Standard Content Code). Feel free to take the code here and integrate it into your own MLOps pipelines.
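
To give a feel for what such a rewrite involves, here is a minimal sketch of a manifest generator. It is not the code in this repository, and it swaps plain SHA-256 from the Python standard library in for ISCC codes so that it runs without extra dependencies. The function names (`sha256_file`, `build_manifest`, `manifest_digest`) and the manifest field names are assumptions made for illustration only.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large training files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> dict:
    """Record relative path, size, and SHA-256 for every regular file under root.

    Hypothetical manifest layout; a real pipeline might store ISCC codes instead.
    """
    root_path = Path(root)
    entries = []
    for path in root_path.rglob("*"):
        if path.is_file():
            entries.append({
                "path": path.relative_to(root_path).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_file(path),
            })
    # Sort by relative path so the same data always yields the same manifest.
    entries.sort(key=lambda entry: entry["path"])
    return {"hash_algorithm": "sha256", "file_count": len(entries), "files": entries}


def manifest_digest(manifest: dict) -> str:
    """Digest of a canonical JSON encoding; this is the value one would sign."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Calling `build_manifest("path/to/dataset")` returns a dictionary that can be written out with `json.dump`. The string returned by `manifest_digest` can then be passed to whatever signing tool you already use; signing itself is left out of this sketch.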