Hashes to identify input, outputs and output_annotations data entries
oplatek commented
Our dataset management can be illustrated by the dependencies between entries, i.e. by how each entry is generated:
input(dataset, split) -> NLG process -> output(NLG_system_id) \
output(NLG_system_id) -> ANNOTATION_PROCESS -> annotations_of_output(campaign_details, ...)
Since many properties could identify an input, an output, or its output_annotations, I think it is best to use hashes to identify inputs, outputs, and list_of_example_annotations.
I imagine that each data entry will have a hash:
input
- input_idx # determines the dataset, split, and particular example, as well as how the example was preprocessed/rendered by factgenie, etc.
output
- input_idx # reference to the exact input which was used for generation
- output_idx # uniquely identifying the output
annotations_list
- output_idx # uniquely identifying which output was annotated
- annotations_idx # uniquely identifying the annotation list
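
A minimal sketch of how the chained hashes above could be computed. This is only an illustration, not an existing factgenie API: the helper names (`entry_hash`, `make_input`, `make_output`, `make_annotations`) and the fields `dataset`, `split`, `example_idx`, `rendered`, `nlg_system_id`, `text`, `campaign_id`, and `annotations` are assumptions. Only `input_idx`, `output_idx`, and `annotations_idx` come from the proposal.

```python
import hashlib
import json


def entry_hash(payload: dict) -> str:
    """Return a stable SHA-256 hex digest of a JSON-serializable dict."""
    # Sorted keys and fixed separators make the serialization canonical,
    # so the same properties always produce the same hash.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_input(dataset: str, split: str, example_idx: int, rendered: str) -> dict:
    # Hypothetical fields: everything that determines the exact input.
    props = {
        "dataset": dataset,
        "split": split,
        "example_idx": example_idx,
        "rendered": rendered,
    }
    return {"input_idx": entry_hash(props), **props}


def make_output(input_entry: dict, nlg_system_id: str, text: str) -> dict:
    # The output hash covers input_idx, so it pins the exact input used.
    props = {
        "input_idx": input_entry["input_idx"],
        "nlg_system_id": nlg_system_id,
        "text": text,
    }
    return {"output_idx": entry_hash(props), **props}


def make_annotations(output_entry: dict, campaign_id: str, annotations: list) -> dict:
    # The annotation-list hash covers output_idx, completing the chain.
    props = {
        "output_idx": output_entry["output_idx"],
        "campaign_id": campaign_id,
        "annotations": annotations,
    }
    return {"annotations_idx": entry_hash(props), **props}


inp = make_input("some_dataset", "test", 0, "rendered example")
out = make_output(inp, "system_a", "generated text")
ann = make_annotations(out, "campaign_1", [{"start": 0, "type": 1}])
```

Because each hash includes the hash of its parent entry, changing how an input is preprocessed or rendered would change `input_idx` and, in turn, every `output_idx` and `annotations_idx` derived from it.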