Duplication of identifiers in pipe-delimited slot value lists
RichardBruskiewich opened this issue · 3 comments
RichardBruskiewich commented
Some multivalued fields in KGX (e.g. the provided_by slot) sometimes appear to accumulate duplicate (CURIE) identifiers. Shouldn't such lists be managed internally as proper sets (without duplicate members)?
In particular, we need to check the kgx merge operation for this anomaly, and perhaps other contexts as well.
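To illustrate the set-like behaviour being asked for, here is a minimal sketch and not KGX's actual code. The function names `dedupe_preserving_order` and `merge_pipe_delimited` are hypothetical, and the `infores:` CURIEs are made-up sample values. It assumes the order of first occurrence should be kept, which a plain `set` would not guarantee.

```python
from typing import Iterable, List


def dedupe_preserving_order(values: Iterable[str]) -> List[str]:
    """Drop repeated members, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


def merge_pipe_delimited(*fields: str, delimiter: str = "|") -> str:
    """Hypothetical helper: merge delimited slot values without duplicates."""
    members: List[str] = []
    for field in fields:
        # Skip empty fragments such as those produced by "" or "a||b".
        members.extend(p.strip() for p in field.split(delimiter) if p.strip())
    return delimiter.join(dedupe_preserving_order(members))


merged = merge_pipe_delimited("infores:a|infores:b", "infores:b|infores:c")
print(merged)  # infores:a|infores:b|infores:c
```

Using `dict.fromkeys` rather than `set` keeps the output stable between runs, which makes exported files easier to diff.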
sierra-moxon commented
I think this is fixed in #408; making a note to check.
RichardBruskiewich commented
Do we have a unit test to check this?
@sierra, is the relevant code in https://github.com/biolink/kgx/blob/master/kgx/utils/kgx_utils.py#L831? I'm not sure if this snippet of code avoids duplication in pipe-delimited lists...
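A unit test for this could look roughly like the sketch below. `merge_list_property` is a hypothetical stand-in for whatever function KGX uses to combine list-valued properties during a merge, not the real function at the link above, and the CURIEs are invented sample data. The test functions are written in pytest style, on the assumption that the project's test suite uses it.

```python
from typing import List


def merge_list_property(existing: List[str], incoming: List[str]) -> List[str]:
    """Hypothetical stand-in: combine two value lists with no repeated members."""
    merged: List[str] = []
    seen = set()
    for value in [*existing, *incoming]:
        if value not in seen:
            seen.add(value)
            merged.append(value)
    return merged


def test_merge_drops_duplicates():
    merged = merge_list_property(
        ["infores:a", "infores:b"], ["infores:b", "infores:c"]
    )
    assert merged == ["infores:a", "infores:b", "infores:c"]


def test_merge_is_idempotent():
    values = ["infores:a", "infores:b"]
    # Merging a list with itself must not grow it.
    assert merge_list_property(values, values) == values
```

The idempotency check is the one most likely to catch the accumulation described above, since repeated merges of the same source are where duplicates would pile up.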
RichardBruskiewich commented
I applied a fix to the above snippet of code in the list-related PR #415.