pulibrary/pdc_discovery

Bug: Duplicate record from DataSpace and PDC

Closed this issue · 3 comments

Hello, previously we migrated the following DataSpace record:

However, we find two versions of this in Discovery:
From DataSpace indexing:
https://datacommons.princeton.edu/discovery/catalog/150055

From PDC Describe:
https://datacommons.princeton.edu/discovery/catalog/doi-10-11578-1888261

Both refer to the same ARK.

I noticed this because when I selected "Princeton Plasma Physics Laboratory" from the Discovery "Community" facet, there was an "ITER and Tokamaks" Community with one record.

Searching by title, it's visible that there are two records:
https://datacommons.princeton.edu/discovery/?search_field=title&q=Verification%2C+validation%2C+and+results+of+an+approximate+model+for+the+stress+of+a+Tokamak+toroidal+field+coil+at+the+inboard+midplane

My understanding is that PDC metadata takes precedence over DataSpace. It seems as though this one was done in error.

@astrochun I think the issue is that the ARKs are not exactly the same.

Yes, it is the "same", but the first one has an extra "ark:" in the URL:

http://arks.princeton.edu/ark:/88435/dsp01rb68xg060
http://arks.princeton.edu/88435/dsp01rb68xg060

The two URLs go to different records in PDC Discovery. I would assume that there is a typo in one of those URLs?
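
To illustrate why the two strings above are treated as different identifiers, here is a minimal Python sketch that reduces an ARK URL to a canonical `ark:/NAAN/name` form before comparing. This is not how pdc_discovery does its matching; `normalize_ark` and `same_ark` are hypothetical helpers, and the regular expression assumes a numeric NAAN of at least five digits followed by a single name segment.

```python
import re
from typing import Optional

# Hypothetical pattern: an optional "ark:" label, a numeric NAAN,
# then one name segment at the end of the URL.
_ARK_RE = re.compile(
    r"(?:ark:/?)?(?P<naan>\d{5,})/(?P<name>[A-Za-z0-9=~*+@_$.\-]+)/?$"
)


def normalize_ark(url: str) -> Optional[str]:
    """Return 'ark:/NAAN/name' for an ARK URL, or None if no ARK is found."""
    match = _ARK_RE.search(url.strip())
    if match is None:
        return None
    return f"ark:/{match['naan']}/{match['name']}"


def same_ark(url_a: str, url_b: str) -> bool:
    """Compare two ARK URLs after normalization (assumed helper)."""
    ark_a = normalize_ark(url_a)
    ark_b = normalize_ark(url_b)
    return ark_a is not None and ark_a == ark_b


with_label = "http://arks.princeton.edu/ark:/88435/dsp01rb68xg060"
without_label = "http://arks.princeton.edu/88435/dsp01rb68xg060"

# A plain string comparison sees two different identifiers,
# while the normalized comparison treats them as the same ARK.
print(with_label == without_label)
print(same_ark(with_label, without_label))
```

A plain string comparison of the two URLs fails, which is consistent with the two records being indexed separately; after normalization they compare equal.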

Thanks for pointing out the problem @carolyncole. I've updated the metadata for the second record and will check again later to see if Discovery resolves this issue automatically with the correct ARK. I will wait to close this issue until then.

The issue appears to have resolved itself after the metadata was fixed.