trungdong/prov

Support PROV-Dictionary?

stain opened this issue · 0 comments

stain commented

We've been wanting to use the PROV-Dictionary extension with prov.py, but it's a bit tricky if we want to serialize to multiple formats.

Our current workaround is to register the regular membership of prov:Collection as supported by prov.py, and also say it's a prov:Dictionary:

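# Assumed imports: provM is prov.model, PROV is the PROV namespace
import prov.model as provM
from prov.constants import PROV

entity = document.entity("ex:someFile")
coll = document.entity("ex:someDirectory", [
    (provM.PROV_TYPE, PROV["Collection"]),
    (provM.PROV_TYPE, PROV["Dictionary"]),
])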

Then regular membership is easy:

document.membership(coll, entity)

However, prov.py does not have a dictionaryMembership method. To express the PROV Dictionary we use PROV-O compatible attributes:

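import uuid

# Membership relation: a separate entity typed as prov:KeyEntityPair
m_entity = document.entity(uuid.uuid4().urn, [
    (provM.PROV_TYPE, PROV["KeyEntityPair"]),
])
m_entity.add_attributes({
    PROV["pairKey"]: entry["basename"],
    PROV["pairEntity"]: entity,
})
# Link the dictionary to the pair; this is the prov:hadDictionaryMember
# attribute that appears on ex:someDirectory in the output below
coll.add_attributes({PROV["hadDictionaryMember"]: m_entity})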

This workaround produces PROV-O statements that are correct according to PROV-Dictionary section 5:

ex:someDirectory a 
        prov:Collection,
        prov:Dictionary,
        prov:Entity ;
    prov:hadMember ex:someFile ;
    prov:hadDictionaryMember <urn:uuid:25d8fc8b-2b63-45dc-9e33-276e9839a0a8> .

<urn:uuid:25d8fc8b-2b63-45dc-9e33-276e9839a0a8> a 
        prov:Entity,
        prov:KeyEntityPair ;
    prov:pairEntity ex:someFile ;
    prov:pairKey "filename.txt"^^xsd:string .

However, the PROV-N output does not match PROV-Dictionary section 4:

entity(ex:someDirectory, [prov:type='prov:Dictionary', prov:type='prov:Collection', prov:hadDictionaryMember='id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8'])

hadMember(ex:someDirectory, ex:someFile)

entity(id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8, [prov:type='prov:KeyEntityPair', prov:pairKey="filename.txt", prov:pairEntity='ex:someFile'])

If this were supported, the membership would appear in PROV-N as:

prov:hadDictionaryMember(ex:someDirectory, ex:someFile, "filename.txt")

Is there a way to add such namespaced statements to PROV-N with prov.py?
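To make the target form concrete, here is a minimal sketch that produces the expected PROV-N line by plain string formatting. It does not use prov.py at all, and the helper name `dictionary_membership_provn` is hypothetical; it only illustrates the shape of the statement we would like a serializer to emit.

```python
def dictionary_membership_provn(dictionary, entity, key):
    """Format one PROV-Dictionary membership statement in PROV-N.

    Hypothetical helper, not part of prov.py. `dictionary` and `entity`
    are already-qualified names such as "ex:someDirectory".
    """
    # Escape backslashes and double quotes so the key stays one string literal.
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return 'prov:hadDictionaryMember({}, {}, "{}")'.format(
        dictionary, entity, escaped
    )


print(dictionary_membership_provn("ex:someDirectory", "ex:someFile", "filename.txt"))
# prov:hadDictionaryMember(ex:someDirectory, ex:someFile, "filename.txt")
```

A real implementation would presumably live in the PROV-N serializer and take records rather than strings.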

Similarly, expressed in PROV-XML according to PROV-Dictionary section 6, we would expect something like:

<prov:collection prov:id="ex:someDirectory" />
<prov:hadMember>
    <prov:collection prov:ref="ex:someDirectory"/>
    <prov:entity prov:ref="ex:someFile"/>
</prov:hadMember>


<prov:dictionary prov:id="ex:someDirectory" />

<prov:hadDictionaryMember>
    <prov:dictionary prov:ref="ex:someDirectory"/>
    <prov:keyEntityPair>
        <prov:key>filename.txt</prov:key>
        <prov:entity prov:ref="ex:someFile"/>
    </prov:keyEntityPair>
</prov:hadDictionaryMember>

but with our workaround we get:

<prov:collection prov:id="ex:someDirectory">
    <prov:type xsi:type="xsd:QName">prov:Dictionary</prov:type>
    <prov:hadDictionaryMember xsi:type="xsd:QName">id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8</prov:hadDictionaryMember>
</prov:collection>

<prov:hadMember>
    <prov:collection prov:ref="ex:someDirectory"/>
    <prov:entity prov:ref="ex:someFile"/>
</prov:hadMember>

<prov:entity prov:id="id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8">
    <prov:type xsi:type="xsd:QName">prov:KeyEntityPair</prov:type>
    <prov:pairEntity xsi:type="xsd:QName">id:aa96fdb4-ecb6-4488-9a9b-00f0c17a1fbd</prov:pairEntity>
    <prov:pairKey>rsem_reference.seq</prov:pairKey>
</prov:entity>

Note that this style seems to survive a round-trip from PROV-O via PROV-XML and back to PROV-O.
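For comparison, the expected `prov:hadDictionaryMember` element from the section 6 style above can be built with the standard library alone. This is only a sketch of the target structure, not how prov.py's XML serializer works; `prov_tag` and `dictionary_member_element` are hypothetical names, and the `ex` prefix used in the attribute values would still have to be declared on the enclosing document.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"
# Make ElementTree serialize the PROV namespace with the usual "prov" prefix.
ET.register_namespace("prov", PROV_NS)


def prov_tag(local):
    """Return the ElementTree name for a term in the PROV namespace."""
    return "{%s}%s" % (PROV_NS, local)


def dictionary_member_element(dictionary, key, entity):
    """Build a hadDictionaryMember element holding one key-entity pair."""
    member = ET.Element(prov_tag("hadDictionaryMember"))
    ET.SubElement(member, prov_tag("dictionary"), {prov_tag("ref"): dictionary})
    pair = ET.SubElement(member, prov_tag("keyEntityPair"))
    ET.SubElement(pair, prov_tag("key")).text = key
    ET.SubElement(pair, prov_tag("entity"), {prov_tag("ref"): entity})
    return member


element = dictionary_member_element("ex:someDirectory", "filename.txt", "ex:someFile")
print(ET.tostring(element, encoding="unicode"))
```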

Obviously we can blame the PROV-Dictionary spec for not also using this PROV-O style in PROV-XML and PROV-N (which would then have been backwards compatible across all PROV syntaxes).

This issue, however, asks for some prov.py API support for making PROV-Dictionary statements across all syntaxes.

Consistent serialization and parsing might ideally need some hacks. As a first attempt, though, I would suggest adding support for our approach, as it would not cause issues when loading or saving. I also think the fact that a Dictionary is a Collection should be stated explicitly (with regular prov:hadMember statements) for compatibility with consumers that do not understand PROV-Dictionary. I understand that this can be harder to maintain in a mutable PROV model in memory, e.g. several keys could map to the same value.
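The multiple-keys concern can be illustrated with a small sketch in plain Python. `DictionarySketch` is a hypothetical stand-in, not a prov.py class; it assumes the implied prov:hadMember set is derived from the key-entity pairs on demand, so that removing one key does not drop an entity that another key still maps to.

```python
class DictionarySketch:
    """Hypothetical in-memory stand-in for a prov:Dictionary (not prov.py)."""

    def __init__(self, identifier):
        self.identifier = identifier
        self._pairs = {}  # key -> entity identifier

    def insert(self, key, entity):
        self._pairs[key] = entity

    def remove(self, key):
        self._pairs.pop(key, None)

    def implied_members(self):
        """Entities that prov:hadMember would list, each once, sorted."""
        return sorted(set(self._pairs.values()))


d = DictionarySketch("ex:someDirectory")
d.insert("filename.txt", "ex:someFile")
d.insert("copy.txt", "ex:someFile")  # second key for the same entity
d.remove("filename.txt")
print(d.implied_members())  # ['ex:someFile']
```

Deriving the members at serialization time avoids having to keep two lists in sync, at the cost of not being able to assert plain membership independently of the keys.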