OHDSI/ETL-Synthea

Table changes needed for synthea >=3.1 (Dec 2022)

Closed this issue ยท 5 comments

As I was working on loading synthea generated data I found that the following table schemas would need updated in this project. I was unable to make the changes and test them b/c the size of the data led me to use the psql \copy commands directly but this might help someone else quickly port to the newer Synthea:

  • patients : add fips (varchar) and income (int4)
  • payers : add ownership (varchar)
  • providers : remove utilization and add encounters (int4) and procedures (int4)

DDL for these tables (untested within the ETL-Synthea project):

CREATE TABLE synthea_3.patients (
	id varchar(1000) NULL,
	birthdate date NULL,
	deathdate date NULL,
	ssn varchar(100) NULL,
	drivers varchar(100) NULL,
	passport varchar(100) NULL,
	prefix varchar(100) NULL,
	"first" varchar(100) NULL,
	"last" varchar(100) NULL,
	suffix varchar(100) NULL,
	maiden varchar(100) NULL,
	marital varchar(100) NULL,
	race varchar(100) NULL,
	ethnicity varchar(100) NULL,
	gender varchar(100) NULL,
	birthplace varchar(100) NULL,
	address varchar(100) NULL,
	city varchar(100) NULL,
	state varchar(100) NULL,
	county varchar(100) NULL,
	zip varchar(100) NULL,
	lat numeric NULL,
	lon numeric NULL,
	healthcare_expenses numeric NULL,
	healthcare_coverage numeric NULL,
	fips varchar NULL,
	income int4 NULL
);

CREATE TABLE synthea_3.payers (
	id varchar(1000) NULL,
	"name" varchar(1000) NULL,
	ownership varchar NULL,
	address varchar(1000) NULL,
	city varchar(1000) NULL,
	state_headquartered varchar(1000) NULL,
	zip varchar(1000) NULL,
	phone varchar(1000) NULL,
	amount_covered numeric NULL,
	amount_uncovered numeric NULL,
	revenue numeric NULL,
	covered_encounters numeric NULL,
	uncovered_encounters numeric NULL,
	covered_medications numeric NULL,
	uncovered_medications numeric NULL,
	covered_procedures numeric NULL,
	uncovered_procedures numeric NULL,
	covered_immunizations numeric NULL,
	uncovered_immunizations numeric NULL,
	unique_customers numeric NULL,
	qols_avg numeric NULL,
	member_months numeric NULL
);

CREATE TABLE providers (
	id varchar(1000) NULL,
	organization varchar(1000) NULL,
	"name" varchar(100) NULL,
	gender varchar(100) NULL,
	speciality varchar(100) NULL,
	address varchar(255) NULL,
	city varchar(100) NULL,
	state varchar(100) NULL,
	zip varchar(100) NULL,
	lat numeric NULL,
	lon numeric NULL,
	encounters int4 NULL,
	"procedures" int4 NULL
);

Yes, we had just recently added support for 3.0. Will need to prioritize support for 3.1

Also table payer_transitions:

  • rename (start|end)_year to (start|end)_date
  • rename ownership para plan_ownership
  • rename ownername para owner_name

Thanks for the great tool.

Is this why I am seeing this error?

ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaFileLoc = syntheaFileLoc)
Connecting using PostgreSQL driver
Loading: allergies.csv
| | 0%Error in rJava::.jcall(batchedInsert, "Z", "executeBatch") :
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO native.allergies ("1993-03-08",V2,"4b9c1991-8733-d3f6-777d-6310b5dd7af2","c689f326-df74-aaa5-2aa0-a4bdb17a963a","419199007",Unknown,"Allergy to substance (finding)",allergy,environment,V10,V11,V12,V13,V14,V15) VALUES(8467,NULL,'4b9c1991-8733-d3f6-777d-6310b5dd7af2','c689f326-df74-aaa5-2aa0-a4bdb17a963a',29046,'Unknown','Lisinopril','intolerance','medication',NULL,NULL,NULL,NULL,NULL,NULL) was aborted: ERROR: column "1993-03-08" of relation "allergies" does not exist
Position: 31 Call getNextException to see other errors in the batch.

Is this why I am seeing this error?

ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaFileLoc = syntheaFileLoc)
Connecting using PostgreSQL driver
Loading: allergies.csv
| | 0%Error in rJava::.jcall(batchedInsert, "Z", "executeBatch") :
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO native.allergies ("1993-03-08",V2,"4b9c1991-8733-d3f6-777d-6310b5dd7af2","c689f326-df74-aaa5-2aa0-a4bdb17a963a","419199007",Unknown,"Allergy to substance (finding)",allergy,environment,V10,V11,V12,V13,V14,V15) VALUES(8467,NULL,'4b9c1991-8733-d3f6-777d-6310b5dd7af2','c689f326-df74-aaa5-2aa0-a4bdb17a963a',29046,'Unknown','Lisinopril','intolerance','medication',NULL,NULL,NULL,NULL,NULL,NULL) was aborted: ERROR: column "1993-03-08" of relation "allergies" does not exist
Position: 31 Call getNextException to see other errors in the batch.

@sudoshi It appears your allergies.csv file is missing the header. It should contain the following as the first line: START,STOP,PATIENT,ENCOUNTER,CODE,SYSTEM,DESCRIPTION,TYPE,CATEGORY,REACTION1,DESCRIPTION1,SEVERITY1,REACTION2,DESCRIPTION2,SEVERITY2

Thanks @rkboyce and @fabkury! Support for synthea 3.1.0 and 3.2.0 has been added to the development branch of the package based on #178 . We will continue to test the changes and hope to make a release soon to the main branch.