digitalbazaar/pyld

Configurable IdentifierIssuer for to_rdf

aucampia opened this issue · 0 comments

I'm processing many JSON files as JSON-LD, currently going via N-Quads to RDFLib. The problem I'm facing is that blank node identifiers get reused for every document, so when I ingest the documents into an RDFLib graph, different blank nodes appear to be the same node.
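To illustrate the collision, here is a minimal, self-contained sketch. It does not call pyld or RDFLib: `CounterIssuer` and `convert` are hypothetical stand-ins, and the sketch assumes that blank nodes are matched by label once the outputs are merged, which is the behaviour described above.

```python
from itertools import count


class CounterIssuer:
    """Simplified stand-in for a counter-based issuer (hypothetical, not pyld's class)."""

    def __init__(self, prefix="_:b"):
        self.prefix = prefix
        self._counter = count()

    def get_id(self):
        # Labels restart at <prefix>0 for every new issuer instance.
        return f"{self.prefix}{next(self._counter)}"


def convert(names):
    """Pretend conversion: one fresh issuer per document, one blank node per name."""
    issuer = CounterIssuer()
    return {(issuer.get_id(), "http://schema.org/name", name) for name in names}


doc_a = convert(["Alice"])
doc_b = convert(["Bob"])

# Merge by label, the way a shared graph would see the two outputs.
merged = doc_a | doc_b
subjects = {subject for subject, _, _ in merged}
print(subjects)  # {'_:b0'}
```

Both documents end up describing a single subject, `_:b0`, even though they were about two different nodes.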

As a workaround I am doing this:

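import typing as typ
import uuid

from pyld import jsonld


class IdentifierIssuer:
    # Class-level state: shared by every instance of this class.
    existing: typ.ClassVar[typ.Dict[str, str]] = {}
    order: typ.ClassVar[typ.List[str]] = []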
    prefix = '_:bu'

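    def __init__(self, prefix: str):
        # `prefix` is accepted so the class can be constructed like the one it
        # replaces, but it is ignored: the class-level prefix is always used.
        pass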

    def get_id(self, old: typ.Optional[str] = None) -> str:
        cls = self.__class__
        # return existing old identifier
        if old and old in cls.existing:
            return cls.existing[old]

        # get next identifier; a random UUID keeps it unique across documents
        id_ = cls.prefix + uuid.uuid4().hex

        # save mapping
        if old is not None:
            cls.existing[old] = id_
            cls.order.append(old)

        return id_

    def has_id(self, old: str) -> bool:
        cls = self.__class__

        return old in cls.existing

# Monkey-patch the module so that to_rdf picks up the issuer defined above.
jsonld.IdentifierIssuer = IdentifierIssuer

It would be nice if I could just pass an issuer in as an argument to to_rdf instead.
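As a rough sketch of the shape I have in mind: the code below is not pyld's API. `UuidIssuer` and `relabel` are hypothetical names, and `relabel` only stands in for a converter that accepts an optional issuer argument. Unlike the workaround, the mapping is kept per instance, so an explicit label such as `_:x` that appears in two documents is not mapped to the same identifier unless the caller deliberately shares one issuer.

```python
import uuid
from typing import Dict, Iterable, List, Optional


class UuidIssuer:
    """Issues UUID-based blank node labels; state is per instance (hypothetical helper)."""

    def __init__(self, prefix: str = "_:bu") -> None:
        self.prefix = prefix
        self.existing: Dict[str, str] = {}

    def get_id(self, old: Optional[str] = None) -> str:
        # Reuse the identifier already issued for this old label, if any.
        if old is not None and old in self.existing:
            return self.existing[old]
        new_id = self.prefix + uuid.uuid4().hex
        if old is not None:
            self.existing[old] = new_id
        return new_id


def relabel(labels: Iterable[str], issuer: Optional[UuidIssuer] = None) -> List[str]:
    """Hypothetical stand-in for a converter that takes an issuer argument."""
    issuer = issuer if issuer is not None else UuidIssuer()
    return [issuer.get_id(label) for label in labels]


# Default: each call gets its own issuer, so labels never collide across documents.
first = relabel(["_:x", "_:y", "_:x"])
second = relabel(["_:x"])

# Opt-in: a shared issuer keeps the mapping stable across calls.
shared = UuidIssuer()
third = relabel(["_:x"], issuer=shared)
fourth = relabel(["_:x"], issuer=shared)
```

Within one call the same old label maps to the same new identifier, while separate calls only agree when they are given the same issuer.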