Support PROV-Dictionary?
stain opened this issue · 0 comments
We've been wanting to use the PROV-Dictionary extension with prov.py, but it is tricky when we want to serialize to multiple formats.
Our current workaround is to record the regular prov:Collection membership that prov.py already supports,
and additionally type the collection as a prov:Dictionary:
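# Assumed imports; document is an existing prov.model.ProvDocument
import prov.model as provM
from prov.constants import PROV

entity = document.entity("ex:someFile")
coll = document.entity("ex:someDirectory", [
    (provM.PROV_TYPE, PROV["Collection"]),
    (provM.PROV_TYPE, PROV["Dictionary"]),
])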
Then regular membership is easy:
document.membership(coll, entity)
However, prov.py does not have a dictionaryMembership
method. To express the PROV-Dictionary membership we use PROV-O compatible attributes:
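import uuid

# Membership relation: a KeyEntityPair entity with a fresh UUID identifier
m_entity = document.entity(uuid.uuid4().urn, [
    (provM.PROV_TYPE, PROV["KeyEntityPair"]),
])
m_entity.add_attributes({
    PROV["pairKey"]: entry["basename"],
    PROV["pairEntity"]: entity,
})
# Link the dictionary to the pair (the prov:hadDictionaryMember attribute in the output below)
coll.add_attributes({PROV["hadDictionaryMember"]: m_entity})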
This workaround produces PROV-O statements that are correct according to PROV-Dictionary section 5:
ex:someDirectory a
        prov:Collection,
        prov:Dictionary,
        prov:Entity ;
    prov:hadMember ex:someFile ;
    prov:hadDictionaryMember <urn:uuid:25d8fc8b-2b63-45dc-9e33-276e9839a0a8> .

<urn:uuid:25d8fc8b-2b63-45dc-9e33-276e9839a0a8> a
        prov:Entity,
        prov:KeyEntityPair ;
    prov:pairEntity ex:someFile ;
    prov:pairKey "filename.txt"^^xsd:string .
However, the PROV-N output does not match PROV-Dictionary section 4:
entity(ex:someDirectory, [prov:type='prov:Dictionary', prov:type='prov:Collection', prov:hadDictionaryMember='id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8'])
hadMember(ex:someDirectory, ex:someFile)
entity(id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8, [prov:type='prov:KeyEntityPair', prov:pairKey="filename.txt", prov:pairEntity='ex:someFile'])
If this were supported, the membership would appear in PROV-N as:
prov:hadDictionaryMember(ex:someDirectory, ex:someFile, "filename.txt")
Is there a way to add such name-spaced statements to PROV-N with prov.py?
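To make the target form concrete, here is a small standalone sketch that formats such a statement with plain string handling. It is not prov.py API: `dictionary_membership_provn` is a hypothetical name, and it only illustrates the shape of the line we would like prov.py to emit, assuming the arguments are already prefixed names.

```python
def dictionary_membership_provn(dictionary, entity, key):
    """Format one prov:hadDictionaryMember statement in PROV-N.

    `dictionary` and `entity` are already-prefixed names such as
    "ex:someDirectory"; `key` is emitted as a double-quoted string literal.
    """
    # Escape backslashes first, then double quotes, inside the literal.
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return f'prov:hadDictionaryMember({dictionary}, {entity}, "{escaped}")'


statement = dictionary_membership_provn(
    "ex:someDirectory", "ex:someFile", "filename.txt"
)
print(statement)
# prov:hadDictionaryMember(ex:someDirectory, ex:someFile, "filename.txt")
```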
Similarly, for PROV-XML, following PROV-Dictionary section 6 we would expect something like:
<prov:collection prov:id="ex:someDirectory" />
<prov:hadMember>
  <prov:collection prov:ref="ex:someDirectory"/>
  <prov:entity prov:ref="ex:someFile"/>
</prov:hadMember>
<prov:dictionary prov:id="ex:someDirectory" />
<prov:hadDictionaryMember>
  <prov:dictionary prov:ref="ex:someDirectory"/>
  <prov:keyEntityPair>
    <prov:key>filename.txt</prov:key>
    <prov:entity prov:ref="ex:someFile"/>
  </prov:keyEntityPair>
</prov:hadDictionaryMember>
but with our workaround we get:
<prov:collection prov:id="ex:someDirectory">
  <prov:type xsi:type="xsd:QName">prov:Dictionary</prov:type>
  <prov:hadDictionaryMember xsi:type="xsd:QName">id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8</prov:hadDictionaryMember>
</prov:collection>
<prov:hadMember>
  <prov:collection prov:ref="ex:someDirectory"/>
  <prov:entity prov:ref="ex:someFile"/>
</prov:hadMember>
<prov:entity prov:id="id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8">
  <prov:type xsi:type="xsd:QName">prov:KeyEntityPair</prov:type>
  <prov:pairEntity xsi:type="xsd:QName">id:aa96fdb4-ecb6-4488-9a9b-00f0c17a1fbd</prov:pairEntity>
  <prov:pairKey>rsem_reference.seq</prov:pairKey>
</prov:entity>
Note that this style seems to survive a round-trip from PROV-O via PROV-XML over to PROV-O again.
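As an illustration of the PROV-XML structure we expected (the first XML listing above), the following sketch builds one `prov:hadDictionaryMember` element with the standard library only. It does not use prov.py; `dictionary_member_element` and `q` are hypothetical names, the element order mirrors that listing, and declaring the `ex` prefix is left to the enclosing document.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"
ET.register_namespace("prov", PROV_NS)


def q(name):
    """Return the ElementTree-qualified form of a prov: element or attribute."""
    return f"{{{PROV_NS}}}{name}"


def dictionary_member_element(dictionary, key, entity):
    """Build one <prov:hadDictionaryMember> element for a key-entity pair."""
    member = ET.Element(q("hadDictionaryMember"))
    ET.SubElement(member, q("dictionary"), {q("ref"): dictionary})
    pair = ET.SubElement(member, q("keyEntityPair"))
    ET.SubElement(pair, q("key")).text = key
    ET.SubElement(pair, q("entity"), {q("ref"): entity})
    return member


element = dictionary_member_element(
    "ex:someDirectory", "filename.txt", "ex:someFile"
)
print(ET.tostring(element, encoding="unicode"))
```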
Arguably the PROV-Dictionary spec is to blame for not also using this PROV-O style in PROV-XML and PROV-N, which would then have been backwards compatible across all PROV syntaxes.
This issue, however, asks for prov.py API support for making PROV-Dictionary statements across all syntaxes.
Consistent serialization and parsing might ideally need some hacks, but as a first attempt I would suggest adding support for our approach, as it would not cause issues in loading/saving. I also think that a Dictionary being a Collection should be made explicit, for compatibility with consumers that do not understand PROV-Dictionary, though I understand that this can be harder to maintain in a mutable in-memory PROV model (e.g. multiple keys could map to the same value).
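The concern about several keys sharing one value can be shown with a toy model. This is not prov.py code and `DictionaryModel` is a hypothetical name; it only sketches one way to keep the implied collection membership consistent, by deriving it from the key-entity pairs instead of storing it separately.

```python
class DictionaryModel:
    """Toy in-memory dictionary that derives collection membership from its pairs."""

    def __init__(self):
        self._pairs = {}  # key -> entity identifier

    def insert(self, key, entity):
        self._pairs[key] = entity

    def remove(self, key):
        self._pairs.pop(key, None)

    def members(self):
        """Entities implied as prov:hadMember: the distinct values of all pairs."""
        return set(self._pairs.values())


d = DictionaryModel()
d.insert("a.txt", "ex:someFile")
d.insert("copy-of-a.txt", "ex:someFile")  # second key, same entity
d.remove("a.txt")
print(d.members())  # the entity is still a member through the remaining key
```

If hadMember relations were stored independently of the pairs, removing the first key would need an extra check before dropping the membership, which is the maintenance cost mentioned above.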