site-configurable tumor primary key
Most PCORNet CDM tables have an id column. The NAACCR format does not.
In theory, patient id and sequence number uniquely identify each tumor, but in practice, sequence numbers are sometimes missing or duplicated. And things get really complicated when merging data from multiple sources.
Putting the burden on the producer to come up with one record per tumor, with a unique tumorid, is one way to go.
And in the case when the producer fails to eliminate duplicate records, a tumorid can facilitate communication about which record we're talking about.
I just discovered item 60 tumorRecordNumber. Combined with patient id, this looks like a unique, persistent key.
On the other hand, KUMC has around 30 patient id / tumor record number pairs that occur in multiple tumor records.
I have a strategy starting with patientSystemIdHosp, tumorRecordNumber and supplementing it with site-specific columns to make the key unique; at KUMC, dateOfDiagnosis, dateCaseCompleted seems to do it.
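A rough sketch of how one might check whether a candidate key is actually unique. This is not the code in tumor_reg_data.py: the helper name `duplicate_keys` and the sample records are made up for illustration, and only the column names come from the discussion above.

```python
from collections import Counter


def duplicate_keys(records, key_cols):
    """Return {key tuple: count} for every key shared by more than one record."""
    counts = Counter(tuple(rec.get(col) for col in key_cols) for rec in records)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up records for illustration only; not real registry data.
records = [
    {"patientSystemIdHosp": "P1", "tumorRecordNumber": "01",
     "dateOfDiagnosis": "20100115", "dateCaseCompleted": "20100301"},
    {"patientSystemIdHosp": "P1", "tumorRecordNumber": "01",
     "dateOfDiagnosis": "20120720", "dateCaseCompleted": "20120901"},
    {"patientSystemIdHosp": "P2", "tumorRecordNumber": "01",
     "dateOfDiagnosis": "20110505", "dateCaseCompleted": "20110610"},
]

base_key = ["patientSystemIdHosp", "tumorRecordNumber"]
kumc_key = base_key + ["dateOfDiagnosis", "dateCaseCompleted"]

# The base key collides for patient P1; the extended key does not.
base_dups = duplicate_keys(records, base_key)
kumc_dups = duplicate_keys(records, kumc_key)
```

Running the same check with a different column list is a cheap way for another site to find out which supplementary columns it needs.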
naaccr-tumor-data/tumor_reg_data.py
Line 418 in 8626c0b
The site-specific unique key for KUMC works, but at other sites it's likely that a different combination of columns should be used, so it should be configurable.
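One possible shape for the configuration, as a sketch only. It assumes the site supplies its extra columns as a comma-separated string (for example, from a config file option) and that a tumorid is derived by hashing the key values; `key_columns`, `tumor_id`, and the hashing scheme are hypothetical and not taken from the repository.

```python
import hashlib

# Columns every site is assumed to start from (per the strategy above).
BASE_KEY_COLS = ("patientSystemIdHosp", "tumorRecordNumber")


def key_columns(site_extra: str) -> tuple:
    """Parse a comma-separated config value into the full key column tuple."""
    extra = tuple(c.strip() for c in site_extra.split(",") if c.strip())
    return BASE_KEY_COLS + extra


def tumor_id(record: dict, cols: tuple) -> str:
    """Hash the key column values into a 16-hex-digit id.

    A column missing from the record raises KeyError rather than
    silently producing a weaker key.
    """
    parts = [str(record[c]) for c in cols]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


# Hypothetical KUMC setting; other sites would supply their own columns.
kumc_cols = key_columns("dateOfDiagnosis, dateCaseCompleted")

# Made-up records for illustration only.
rec_a = {"patientSystemIdHosp": "P1", "tumorRecordNumber": "01",
         "dateOfDiagnosis": "20100115", "dateCaseCompleted": "20100301"}
rec_b = dict(rec_a, dateOfDiagnosis="20120720")

id_a = tumor_id(rec_a, kumc_cols)
id_b = tumor_id(rec_b, kumc_cols)
```

Because the id depends only on the configured column values, it stays the same across reloads of the same record, which is what makes it usable for talking about a specific record.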
We support a configurable source for the patient id number (MRN).
The current draft CDM TUMOR table spec doesn't call for a primary key for the tumor table.