paulopinheiro1234/hadatac

"object_uri" == "named_time" == "pid"

Closed this issue · 1 comment

In hadatac_solr.gates.09-07-2017, the contents of the fields object_uri, named_time, and pid are identical for a large number of datapoints. I am not sure what the intended format of the URI is, but this looks like it could be a glitch.

The named_time looks related to subject ID (this is from a CSV in the CPP dataset):

STUDYID SUBJID SUBJIDO SITEID SEXN SEX GAGEBRTH BRTHYR
CPP 51000110 51000110 5 1 Male nan 1959
CPP 51000210 51000210 5 1 Male nan 1959
CPP 51000310 51000310 5 1 Male 273 1959
CPP 51000320 51000320 5 2 Female nan 1962
CPP 51000330 51000330 5 2 Female 266 1964
CPP 51000340 51000340 5 1 Male nan 1965
CPP 51000410 51000410 5 1 Male 287 1959
CPP 51000420 51000420 5 1 Male 287 1961
CPP 51000430 51000430 5 2 Female 287 1963
CPP 51000510 51000510 5 1 Male 287 1959

"51000110.0" is a value found in the dataset for object_uri/pid/named_time, and the general pattern 51000### is common.

To reproduce:
Issue this query to get some representative data:
http://localhost:8984/solr/measurement/select?facet.field=value&facet=on&indent=on&q=study_uri:%22hbgd-kb:STD-CPP4%22&wt=json
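
To check how widespread the problem is, something like the following might help. It is only a rough sketch, not part of hadatac: the helper names `build_query_url` and `find_collapsed_docs` are made up for illustration, the `fl` and `rows` parameters are my additions to the query above (the facet parameters are dropped), and I am assuming the three fields come back as plain top-level values in each Solr document.

```python
from urllib.parse import urlencode

# Base URL taken from the query above; adjust host/port for your own Solr instance.
SOLR_SELECT = "http://localhost:8984/solr/measurement/select"


def build_query_url(study_uri, rows=100):
    """Build a select URL that returns only the three suspicious fields.

    Hypothetical helper: 'fl' and 'rows' are standard Solr parameters,
    but they are not part of the query in the original report.
    """
    params = {
        "q": f'study_uri:"{study_uri}"',
        "fl": "object_uri,named_time,pid",
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def find_collapsed_docs(docs):
    """Return the docs whose object_uri, named_time and pid are all present and equal."""
    collapsed = []
    for doc in docs:
        object_uri = doc.get("object_uri")
        named_time = doc.get("named_time")
        pid = doc.get("pid")
        # Skip docs where a field is missing; only flag true three-way matches.
        if object_uri is not None and object_uri == named_time == pid:
            collapsed.append(doc)
    return collapsed
```

The URL from `build_query_url("hbgd-kb:STD-CPP4")` can be fetched with any HTTP client; the document list in a Solr JSON response is under `response` → `docs`, and that list is what `find_collapsed_docs` expects.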

The issue was reported correctly. The situation has since changed, and now we have the following: named_time is used when provided; otherwise it is left blank. PIDs are still PIDs (i.e., the user-provided "patient" ID). In some situations we were using the PID to fill in ObjectUri, but ObjectUri values are now proper URIs instead of PIDs.