"object_uri" == "named_time" == "pid"
In hadatac_solr.gates.09-07-2017, the contents of the fields object_uri, named_time, and pid are identical for a large number of datapoints. I am not sure what the intended format of the URI is, but this seems like it could be a glitch.
The named_time looks related to subject ID (this is from a CSV in the CPP dataset):
STUDYID | SUBJID | SUBJIDO | SITEID | SEXN | SEX | GAGEBRTH | BRTHYR |
---|---|---|---|---|---|---|---|
CPP | 51000110 | 51000110 | 5 | 1 | Male | nan | 1959 |
CPP | 51000210 | 51000210 | 5 | 1 | Male | nan | 1959 |
CPP | 51000310 | 51000310 | 5 | 1 | Male | 273 | 1959 |
CPP | 51000320 | 51000320 | 5 | 2 | Female | nan | 1962 |
CPP | 51000330 | 51000330 | 5 | 2 | Female | 266 | 1964 |
CPP | 51000340 | 51000340 | 5 | 1 | Male | nan | 1965 |
CPP | 51000410 | 51000410 | 5 | 1 | Male | 287 | 1959 |
CPP | 51000420 | 51000420 | 5 | 1 | Male | 287 | 1961 |
CPP | 51000430 | 51000430 | 5 | 2 | Female | 287 | 1963 |
CPP | 51000510 | 51000510 | 5 | 1 | Male | 287 | 1959 |
"51000110.0" is a value found in the dataset for object_uri/pid/named_time, and the general pattern of 51000### is common.
To reproduce:
Issue this query to get some representative data:
http://localhost:8984/solr/measurement/select?facet.field=value&facet=on&indent=on&q=study_uri:%22hbgd-kb:STD-CPP4%22&wt=json
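To quantify how many datapoints are affected, something like the following sketch could be run against the same core. It assumes the standard Solr JSON response layout (`response.docs`) and adds the `fl` and `rows` parameters to the query above; the helper names `build_query_url`, `fetch_docs`, and `count_identical` are made up for this illustration and are not part of hadatac.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# The three fields reported as identical in this issue.
FIELDS = ("object_uri", "named_time", "pid")


def build_query_url(base="http://localhost:8984/solr/measurement/select", rows=100):
    """Build a select URL for the CPP study, returning only the three fields."""
    params = {
        "q": 'study_uri:"hbgd-kb:STD-CPP4"',
        "fl": ",".join(FIELDS),  # assumption: restrict output to the fields of interest
        "rows": rows,            # assumption: sample size, not part of the original query
        "wt": "json",
    }
    return base + "?" + urlencode(params)


def fetch_docs(url):
    """Fetch the query result and return the list of documents."""
    with urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]


def count_identical(docs):
    """Count documents where all three fields are present and equal."""
    identical = 0
    for doc in docs:
        values = [doc.get(field) for field in FIELDS]
        # Compare as strings so that mixed types do not hide a match.
        if values[0] is not None and len(set(map(str, values))) == 1:
            identical += 1
    return identical
```

With the Solr instance running, `docs = fetch_docs(build_query_url())` followed by `count_identical(docs)` would give the number of sampled documents in which the three fields match.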
The issue is correctly reported. The situation has changed, and now we have the following: named_time is used when provided; otherwise it is left blank. PIDs are still PIDs (i.e., user-provided patient IDs). In some situations we were using the PID to fill in ObjectUri, but now ObjectUri values are proper URIs instead of PIDs.