kumc-bmi/naaccr-tumor-data

site-configurable tumor primary key

Closed this issue · 4 comments

dckc commented

Most PCORNet CDM tables have an id column. The NAACCR format does not.

In theory, patient id and sequence number uniquely identify each tumor, but in practice, sequence numbers are sometimes missing or duplicated. And things get really complicated when merging data from multiple sources.

Putting the burden on the producer to come up with one record per tumor, with a unique tumorid, is one way to go.

And when the producer fails to eliminate duplicate records, a tumorid makes it easier to communicate about which record we're talking about.

dckc commented

I just discovered item 60 tumorRecordNumber. Combined with patient id, this looks like a unique, persistent key.

On the other hand, KUMC has around 30 patient id / tumor record number pairs that each occur in multiple tumor records.

dckc commented

I have a strategy: start with patientSystemIdHosp and tumorRecordNumber, then supplement them with site-specific columns to make the key unique. At KUMC, dateOfDiagnosis and dateCaseCompleted seem to do it.
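A minimal sketch of how one might check whether a candidate set of columns is unique. This is an illustration, not the project's code: the `duplicate_keys` helper is hypothetical and the records are made up. Only the column names come from the comment above.

```python
from collections import Counter


def duplicate_keys(records, key_columns):
    """Return the key tuples that occur in more than one record, with counts."""
    counts = Counter(
        tuple(rec.get(col) for col in key_columns) for rec in records
    )
    return {key: n for key, n in counts.items() if n > 1}


# Made-up records for illustration only; not real registry data.
records = [
    {"patientSystemIdHosp": "P1", "tumorRecordNumber": "01",
     "dateOfDiagnosis": "20100105", "dateCaseCompleted": "20100301"},
    {"patientSystemIdHosp": "P1", "tumorRecordNumber": "01",
     "dateOfDiagnosis": "20120710", "dateCaseCompleted": "20120901"},
    {"patientSystemIdHosp": "P2", "tumorRecordNumber": "01",
     "dateOfDiagnosis": "20110315", "dateCaseCompleted": "20110520"},
]

base_key = ["patientSystemIdHosp", "tumorRecordNumber"]
extended_key = base_key + ["dateOfDiagnosis", "dateCaseCompleted"]

print(duplicate_keys(records, base_key))      # the base key collides for P1
print(duplicate_keys(records, extended_key))  # no collisions with the dates added
```

An empty result means the candidate columns distinguish every record in the sample. It says nothing about records that have not been loaded yet.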

# ## Unique key columns

dckc commented

The site-specific unique key for KUMC works, but at other sites it's likely that a different combination of columns should be used, so it should be configurable.
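One way the configurable key could look, as a sketch only. The `[tumor]` section, the `key_columns` option, and both helper functions are hypothetical names chosen for illustration; the project's actual configuration format may differ.

```python
import configparser

# Hypothetical site configuration; another site would list different columns.
SITE_CONFIG = """
[tumor]
key_columns = patientSystemIdHosp, tumorRecordNumber, dateOfDiagnosis, dateCaseCompleted
"""


def load_key_columns(config_text):
    """Read the comma-separated list of key columns from the site config."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser["tumor"]["key_columns"]
    return [name.strip() for name in raw.split(",") if name.strip()]


def tumor_key(record, key_columns):
    """Build the tumor key for one record, refusing missing or empty parts."""
    missing = [col for col in key_columns if not record.get(col)]
    if missing:
        raise ValueError(f"missing key columns: {missing}")
    return tuple(record[col] for col in key_columns)


key_columns = load_key_columns(SITE_CONFIG)
# Made-up record for illustration only.
record = {
    "patientSystemIdHosp": "P1",
    "tumorRecordNumber": "01",
    "dateOfDiagnosis": "20100105",
    "dateCaseCompleted": "20100301",
}
print(tumor_key(record, key_columns))
```

Raising on a missing column is a design choice in this sketch: a key with an empty part could silently collide with another record.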

dckc commented

We support a configurable source for the patient id number (MRN).

The current draft CDM TUMOR table spec doesn't call for a primary key for the tumor table.