EDRN/edrn.rdf

1000s of duplicate entries in LabCAS 2.0 Solr-based RDF generation

Closed this issue · 1 comment

The RDF generated by https://github.com/EDRN/edrn.rdf/blob/master/edrn/rdf/edrnlabcasrdfgenerator.py contains thousands of duplicate entries (for "Automated Quantitative Measures of Breast Density Data"). It should produce about 80 entries for EDRN LabCAS but is producing over 4000.

FYI, I made a slight change to use the /collections endpoint instead of /datasets, and now I get 36 matches instead of 4000.
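A minimal sketch of one plausible cause of the duplicates. This is an assumption, not something confirmed in this thread: the /datasets endpoint may return one Solr document per dataset, with each document repeating its parent collection's fields. In that case, emitting one RDF entry per document repeats each collection once per dataset. The field names `CollectionId` and `CollectionName` and the function `unique_collections` are hypothetical and are not taken from the generator's actual code. Querying /collections avoids the problem at the source. Deduplicating client-side, as below, is an alternative.

```python
from typing import Dict, Iterable, List


def unique_collections(dataset_docs: Iterable[dict], key: str = "CollectionId") -> List[dict]:
    """Collapse dataset-level documents to one record per collection.

    Hypothetical: assumes each dataset document carries its parent
    collection's identifier under `key`. First-seen order is preserved.
    """
    seen: Dict[str, dict] = {}
    for doc in dataset_docs:
        cid = doc.get(key)
        if cid is None:
            continue  # skip documents that name no collection
        if cid not in seen:
            # Keep only collection-level fields; dataset fields are dropped.
            seen[cid] = {key: cid, "CollectionName": doc.get("CollectionName")}
    return list(seen.values())


if __name__ == "__main__":
    # Hypothetical input: three datasets in one collection, two in another.
    docs = [{"CollectionId": "breast_density", "CollectionName": "Breast Density"}] * 3
    docs += [{"CollectionId": "other", "CollectionName": "Other"}] * 2
    print(len(docs), "documents ->", len(unique_collections(docs)), "collections")
```

With per-dataset documents, the number of RDF entries grows with the dataset count rather than the collection count. This would fit the jump from tens of entries to thousands.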

36 is way better! But I'm curious why eCAS has 76 🤷‍♀️