OHDSI/ETL-Synthea

Duplicate provider generated causing problem with insert_visit_occurrence

bdimitriadis opened this issue · 3 comments

The same provider has been generated twice, with different specialty_source_value values, e.g.:

in the cdm schema:

60814 Dani520 Weber641 38004446 8532 cdeef28b-a513-3c9d-9d27-113fbaaa8cff UROLOGY 38004446 F 8532
60816 Dani520 Weber641 38004446 8532 cdeef28b-a513-3c9d-9d27-113fbaaa8cff GENERAL PRACTICE 38004446 F 8532

in the native schema:

cdeef28b-a513-3c9d-9d27-113fbaaa8cff c2751801-8047-30ce-ba9b-72afebaae76a Dani520 Weber641 F UROLOGY 1 WALLACE BASHAW JR WAY NEWBURYPORT MA 01950 42.812358 -70.89109499999998 0
cdeef28b-a513-3c9d-9d27-113fbaaa8cff c2751801-8047-30ce-ba9b-72afebaae76a Dani520 Weber641 F GENERAL PRACTICE 1 WALLACE BASHAW JR WAY NEWBURYPORT MA 01950 42.812358 -70.89109499999998 16218

This causes an exception during insert_visit_occurrence (ERROR: duplicate key value violates unique constraint "xpk_visit_occurrence"), because the join is done on pr.provider_source_value, which matches both provider rows:
...
join native.encounters e
on av.encounter_id = e.id
and av.patient = e.patient
join cdm.provider pr
on e.provider = pr.provider_source_value

The same thing happens with the insert_visit_detail.sql script, for the same reason (the duplicate provider): ERROR: duplicate key value violates unique constraint "xpk_visit_detail"
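The fan-out can be reproduced in isolation. The following is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL; the tables are cut down to the columns that matter, and the encounter id 'enc-1' is made up for illustration. One encounter joined to two provider rows that share a provider_source_value produces two output rows, which is what later collides on the visit primary key.

```python
import sqlite3

# In-memory database with simplified stand-ins for cdm.provider and native.encounters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table provider (
        provider_id integer primary key,
        provider_source_value text,
        specialty_source_value text
    );
    create table encounters (id text primary key, provider text);

    -- The two provider rows from the report share one provider_source_value.
    insert into provider values
        (60814, 'cdeef28b-a513-3c9d-9d27-113fbaaa8cff', 'UROLOGY'),
        (60816, 'cdeef28b-a513-3c9d-9d27-113fbaaa8cff', 'GENERAL PRACTICE');

    -- 'enc-1' is a hypothetical encounter id used only for this sketch.
    insert into encounters values ('enc-1', 'cdeef28b-a513-3c9d-9d27-113fbaaa8cff');
""")

# Same join condition as in the ETL script: one encounter matches both providers.
rows = conn.execute("""
    select e.id, pr.provider_id
    from encounters e
    join provider pr
      on e.provider = pr.provider_source_value
    order by pr.provider_id
""").fetchall()

print(rows)  # the single encounter appears once per matching provider row
```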

To overcome the problem (in both scripts), I substituted the line:
join @cdm_schema.provider pr

with the line:
join (select distinct on (provider_source_value) * from @cdm_schema.provider) pr

but I am not sure that it is the most suitable solution.
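One caveat with this workaround: in PostgreSQL, distinct on without an order by keeps an unpredictable row from each group, so which specialty survives is not guaranteed. Adding "order by provider_source_value, provider_id" inside the subquery would make the choice deterministic. The sketch below shows the same idea (keep the row with the lowest provider_id per provider_source_value) using Python's sqlite3 module, which has no distinct on, so a min()/group by subquery stands in for it. The table is a simplified stand-in for cdm.provider, and keeping the lowest provider_id is an arbitrary rule chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table provider (
        provider_id integer primary key,
        provider_source_value text,
        specialty_source_value text
    );
    insert into provider values
        (60814, 'cdeef28b-a513-3c9d-9d27-113fbaaa8cff', 'UROLOGY'),
        (60816, 'cdeef28b-a513-3c9d-9d27-113fbaaa8cff', 'GENERAL PRACTICE');
""")

# Keep exactly one row per provider_source_value: the one with the lowest provider_id.
deduplicated = conn.execute("""
    select provider_id, provider_source_value, specialty_source_value
    from provider
    where provider_id in (
        select min(provider_id)
        from provider
        group by provider_source_value
    )
""").fetchall()

print(deduplicated)
```

Either way, the deduplication only hides the duplicate provider; the visit rows end up attributed to whichever specialty the rule happens to keep.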

@bdimitriadis Hi, thanks for letting us know about this. According to the Synthea data dictionary, what you generated above in the native data should not be possible.

The Id field in the providers table is a primary key, so "cdeef28b-a513-3c9d-9d27-113fbaaa8cff" should not be in the .csv more than once. How did you generate the Synthea data? Did you modify something in the properties file prior to generating?
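A quick way to confirm whether the generated file itself contains repeated Ids, before anything is loaded, is to count them directly. This is a minimal sketch: the function name find_duplicate_ids is hypothetical, the inline sample uses only three columns instead of the full provider header, and the third row is invented so that the sample contains a non-duplicate.

```python
import csv
import io
from collections import Counter


def find_duplicate_ids(csv_file, id_column="Id"):
    """Return {id: count} for every id value that appears more than once."""
    counts = Counter(row[id_column] for row in csv.DictReader(csv_file))
    return {value: n for value, n in counts.items() if n > 1}


# Abbreviated sample; a real providers file has more columns.
sample = io.StringIO(
    "Id,NAME,SPECIALITY\n"
    "cdeef28b-a513-3c9d-9d27-113fbaaa8cff,Dani520 Weber641,UROLOGY\n"
    "cdeef28b-a513-3c9d-9d27-113fbaaa8cff,Dani520 Weber641,GENERAL PRACTICE\n"
    "00000000-0000-0000-0000-000000000001,Hypothetical Provider,GENERAL PRACTICE\n"
)

duplicates = find_duplicate_ids(sample)
print(duplicates)
```

For a real export, the same function can be given an open file handle instead of the sample, e.g. open("providers.csv", newline=""), where the path is an assumption about where the Synthea output was written. An empty result means the Ids are unique and the duplication was introduced later.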

Closing, as the issue appears to be with the generation of the Synthea data.