1000s of duplicate entries in LabCAS 2.0 Solr-based RDF generation
Closed this issue · 1 comments
nutjob4life commented
The RDF being generated in https://github.com/EDRN/edrn.rdf/blob/master/edrn/rdf/edrnlabcasrdfgenerator.py contains thousands and thousands of duplicate entries (for "Automated Quantitative Measures of Breast Density Data"). It should make about 80 entries for EDRN LabCAS but is making over 4000.
nutjob4life commented
FYI, I made a slight change to use the `/collections` endpoint instead of `/datasets`, and now I get 36 matches instead of 4000.
36 is way better! But I'm curious why eCAS has 76 🤷‍♀️
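
For anyone hitting the same thing, here is a minimal sketch of one plausible reason the dataset-level query inflates the count. It assumes each dataset record carries the name of its parent collection, so a collection with many datasets shows up once per dataset. The field name `CollectionName`, the helper names, and the sample documents are all made up for illustration; they are not taken from the LabCAS schema or from `edrnlabcasrdfgenerator.py`.

```python
from collections import Counter

# Hypothetical field name; the real LabCAS Solr schema may use something else.
COLLECTION_KEY = "CollectionName"


def count_entries_per_collection(docs, key=COLLECTION_KEY):
    """Count how many Solr documents refer to each collection name."""
    return Counter(doc[key] for doc in docs if key in doc)


def unique_collections(docs, key=COLLECTION_KEY):
    """Return collection names in first-seen order, without repeats."""
    seen = {}
    for doc in docs:
        name = doc.get(key)
        if name is not None and name not in seen:
            seen[name] = doc
    return list(seen)


# Made-up dataset-level records: two datasets share one parent collection.
dataset_docs = [
    {"id": "d1", "CollectionName": "Automated Quantitative Measures of Breast Density Data"},
    {"id": "d2", "CollectionName": "Automated Quantitative Measures of Breast Density Data"},
    {"id": "d3", "CollectionName": "Another Collection"},
]

counts = count_entries_per_collection(dataset_docs)
names = unique_collections(dataset_docs)
```

Emitting one RDF entry per element of `names` rather than per document would avoid the repeats under that assumption, which is roughly what querying a collection-level endpoint gives you directly.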