OHDSI/ETL-Synthea

error during LoadSyntheaTables

ginberg opened this issue · 4 comments

I am trying to set up my local Postgres database with the Synthea data, but I am getting the error below. Any idea what I am doing wrong?

Connecting using PostgreSQL driver
  |============================================================================================================================================================| 100%
Executing SQL took 0.0856 secs
Running synthea_version/v270/create_synthea_tables.sql
Connecting using PostgreSQL driver
  |============================================================================================================================================================| 100%
Executing SQL took 0.019 secs
Connecting using PostgreSQL driver
Loading: allergies.csv
Loading: conditions.csv
  |============================================================================================================================================================| 100%
Loading: imaging_studies.csv
Loading: medications.csv
  |============================================================================================================================================================| 100%
Loading: organizations.csv
  |============================================================================================================================================================| 100%
Loading: procedures.csv
  |===                                                                                                                                                         |   2%
 Error in rJava::.jcall(batchedInsert, "Z", "executeBatch") : 
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO synthea_v270.procedures (START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION,BASE_COST,REASONCODE,REASONDESCRIPTION) VALUES('2013-01-14 -05','2013-01-14 -05','d465f5e7-60a7-5f4a-98c7-fd8582099117','231d9177-6a23-2649-2389-7db3be6462c1','710824005','Assessment of health and social care needs (procedure)',431.40000000000003,NULL,NULL) was aborted: ERROR: column "start" of relation "procedures" does not exist
Position: 38 Call getNextException to see other errors in the batch.
12.
stop(structure(list(message = "java.sql.BatchUpdateException: Batch entry 0 INSERT INTO synthea_v270.procedures (START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION,BASE_COST,REASONCODE,REASONDESCRIPTION) VALUES('2013-01-14 -05','2013-01-14 -05','d465f5e7-60a7-5f4a-98c7-fd8582099117','231d9177-6a23-2649-2389-7db3be6462c1','710824005','Assessment of health and social care needs (procedure)',431.40000000000003,NULL,NULL) was aborted: ERROR: column \"start\" of relation \"procedures\" does not exist\n Position: 38 Call getNextException to see other errors in the batch.", 
call = rJava::.jcall(batchedInsert, "Z", "executeBatch"), 
jobj = new("jobjRef", jobj = <pointer: 0x106767040>, jclass = "java/sql/BatchUpdateException")), class = c("BatchUpdateException", 
"SQLException", "Exception", "Throwable", "Object", "error", ...
11.
.jcheck()
10.
rJava::.jcall(batchedInsert, "Z", "executeBatch") at InsertTable.R#383
9.
insertTable.default(conn, tableName = paste0(syntheaSchema, ".", 
strsplit(csv, "[.]")[[1]][1]), data = as.data.frame(syntheaTable), 
dropTableIfExists = FALSE, createTable = FALSE, useMppBulkLoad = bulkLoad, 
progressBar = TRUE) at InsertTable.R#196
8.
DatabaseConnector::insertTable(conn, tableName = paste0(syntheaSchema, 
".", strsplit(csv, "[.]")[[1]][1]), data = as.data.frame(syntheaTable), 
dropTableIfExists = FALSE, createTable = FALSE, useMppBulkLoad = bulkLoad, 
progressBar = TRUE) at LoadSyntheaTables.r#96
7.
withCallingHandlers(expr, warning = function(w) if (inherits(w, 
classes)) tryInvokeRestart("muffleWarning"))
6.
suppressWarnings({
DatabaseConnector::insertTable(conn, tableName = paste0(syntheaSchema, 
".", strsplit(csv, "[.]")[[1]][1]), data = as.data.frame(syntheaTable), 
dropTableIfExists = FALSE, createTable = FALSE, useMppBulkLoad = bulkLoad, ... at LoadSyntheaTables.r#95
5.
ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails, syntheaSchema, 
syntheaFileLoc) at codeToRun.R#33
4.
eval(ei, envir)
3.
eval(ei, envir)
2.
withVisible(eval(ei, envir))
1.

Looks like you simulated your data with a current version of Synthea (3.0 or 3.1) but are attempting to convert it using the 2.7 scripts. You likely need to pass syntheaVersion = "3.0.0", as the functions default to "2.7.0", OR grab the 2.7 release of Synthea and use that to generate your data.
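For reference, a minimal sketch of what the codeToRun.R calls might look like with the version passed explicitly. The function and argument names are taken from the stack trace and log above, and syntheaVersion from this reply; treat the exact signatures as assumptions and check the package documentation:

```r
# Sketch only: syntheaVersion is assumed to be accepted by the table-creation
# step (the log shows it selecting synthea_version/v270 scripts by default).
ETLSyntheaBuilder::CreateSyntheaTables(connectionDetails = connectionDetails,
                                       syntheaSchema     = syntheaSchema,
                                       syntheaVersion    = "3.0.0")
ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails, syntheaSchema, syntheaFileLoc)
```

The key point is simply that the version string must match the Synthea release that generated the CSVs, since the column layout (e.g. START/STOP in procedures) changed between 2.7 and 3.x.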

Thanks for your reply! Indeed, the Synthea version did not match. I am now using Synthea version 3.0.0 and the codeToRun script gets further, but it fails with another error, see below. Any idea what could be wrong?

Connecting using PostgreSQL driver
  |================================================================================================================================================================| 100%
Executing SQL took 0.169 secs
Running synthea_version/v300/create_synthea_tables.sql
Connecting using PostgreSQL driver
  |================================================================================================================================================================| 100%
Executing SQL took 0.0225 secs
Connecting using PostgreSQL driver
Loading: allergies.csv
Loading: conditions.csv
  |================================================================================================================================================================| 100%
Loading: imaging_studies.csv
Loading: medications.csv
  |================================================================================================================================================================| 100%
Loading: organizations.csv
  |================================================================================================================================================================| 100%
Loading: procedures.csv
  |================================================================================================================================================================| 100%
Loading: careplans.csv
  |================================================================================================================================================================| 100%
Loading: encounters.csv
  |================================================================================================================================================================| 100%
Loading: immunizations.csv
  |================================================================================================================================================================| 100%
Loading: observations.csv
  |================================================================================================================================================================| 100%
Loading: patients.csv
  |================================================================================================================================================================| 100%
Loading: providers.csv
  |================================================================================================================================================================| 100%
Loading: devices.csv
Connecting using PostgreSQL driver
Working on file /Users/ginberg/Tools/vocabulary/CONCEPT_ANCESTOR.csv
 - reading file 
|--------------------------------------------------|
|==================================================|
 - type converting
 - uploading 69967932 rows of data in 7 chunks.
  |================================================================================================================================================================| 100%
Executing SQL took 0.00784 secs
 - chunk uploading started on 2022-10-16 15:44:46 for rows 1 to 10000001
  |================================================================================================================================================================| 100%
 - chunk uploading started on 2022-10-16 15:45:20 for rows 10000002 to 20000002
  |================================================================================================================================================================| 100%
 - chunk uploading started on 2022-10-16 15:45:55 for rows 20000003 to 30000003
  |================================================================================================================================================================| 100%
 - chunk uploading started on 2022-10-16 15:46:29 for rows 30000004 to 40000004
  |================================================================================================================================================================| 100%
 - chunk uploading started on 2022-10-16 15:47:03 for rows 40000005 to 50000005
  |================================================================================================================================================================| 100%
 - chunk uploading started on 2022-10-16 15:47:43 for rows 50000006 to 60000006
  |================================================================================================================================================================| 100%
 - chunk uploading started on 2022-10-16 15:48:21 for rows 60000007 to 69967932
  |================================================================================================================================================================| 100%
 - Success
Working on file /Users/ginberg/Tools/vocabulary/CONCEPT_CLASS.csv
 - reading file 
 - type converting
 - uploading 415 rows of data in 1 chunks.
  |================================================================================================================================================================| 100%
Executing SQL took 0.0108 secs
 - chunk uploading started on 2022-10-16 15:48:57 for rows 1 to 415
  |================================================================================================================================================================| 100%
 - Success
Working on file /Users/ginberg/Tools/vocabulary/CONCEPT_RELATIONSHIP.csv
 - reading file 
|--------------------------------------------------|
|==================================================|
 - handling dates
 - type converting
 - uploading 46577976 rows of data in 5 chunks.
  |================================================================================================================================================================| 100%
Executing SQL took 0.00679 secs
 - chunk uploading started on 2022-10-16 15:49:49 for rows 1 to 10000001
  |================================================================================================================================================================| 100%
 - chunk uploading started on 2022-10-16 15:51:07 for rows 10000002 to 20000002
  |==================================================================================================================================================================| 100%
 - chunk uploading started on 2022-10-16 15:52:24 for rows 20000003 to 30000003
  |==================================================================================================================================================================| 100%
 - chunk uploading started on 2022-10-16 15:53:40 for rows 30000004 to 40000004
  |==================================================================================================================================================================| 100%
 - chunk uploading started on 2022-10-16 15:54:56 for rows 40000005 to 46577976
  |==================================================================================================================================================================| 100%
 - Success
Working on file /Users/ginberg/Tools/vocabulary/CONCEPT_SYNONYM.csv
 - reading file 
 - type converting
 - uploading 2049733 rows of data in 1 chunks.
  |==================================================================================================================================================================| 100%
Executing SQL took 0.0045 secs
 - chunk uploading started on 2022-10-16 15:55:49 for rows 1 to 2049733
  |==================================================================================================================================================================| 100%
 - Success
Working on file /Users/ginberg/Tools/vocabulary/CONCEPT.csv
 - reading file 
|--------------------------------------------------|
|==================================================|
 - handling dates
 - type converting
 - uploading 5809891 rows of data in 1 chunks.
  |==================================================================================================================================================================| 100%
Executing SQL took 0.00701 secs
 - chunk uploading started on 2022-10-16 15:56:24 for rows 1 to 5809891
  |======                                                                                                                                                            |   4%Error in rJava::.jcall(batchedInsert, "Z", "executeBatch") : 
  java.sql.BatchUpdateException: Batch entry 9.139 INSERT INTO cdm_synthea_v540.CONCEPT (concept_id,concept_name,domain_id,vocabulary_id,concept_class_id,standard_concept,concept_code,valid_start_date,valid_end_date,invalid_reason) VALUES(36311145,NULL,'Meas Value','LOINC','Answer','S','LA12334-1','1970-01-01 -05','2099-12-31 -05',NULL) was aborted: ERROR: null value in column "concept_name" of relation "concept" violates not-null constraint
  Detail: Failing row contains (36311145, null, Meas Value, LOINC, Answer, S, LA12334-1, 1970-01-01, 2099-12-31, null).  Call getNextException to see other errors in the batch.
In addition: Warning messages:
1: `type_convert()` only converts columns of type 'character'.
- `df` has no columns of type 'character' 
2: In data.table::fread(file = paste0(vocabFileLoc, "/", csv), stringsAsFactors = FALSE,  :
  Found and resolved improper quoting out-of-sample. First healed line 36996: <<44833612	"ventilation" pneumonit	4180186>>. If the fields are not quoted (e.g. field separator does not appear within any field), try quote="" to avoid this warning.
3: In data.table::fread(file = paste0(vocabFileLoc, "/", csv), stringsAsFactors = FALSE,  :
 
 Error in rJava::.jcall(batchedInsert, "Z", "executeBatch") : 
java.sql.BatchUpdateException: Batch entry 9.139 INSERT INTO cdm_synthea_v540.CONCEPT (concept_id,concept_name,domain_id,vocabulary_id,concept_class_id,standard_concept,concept_code,valid_start_date,valid_end_date,invalid_reason) VALUES(36311145,NULL,'Meas Value','LOINC','Answer','S','LA12334-1','1970-01-01 -05','2099-12-31 -05',NULL) was aborted: ERROR: null value in column "concept_name" of relation "concept" violates not-null constraint
Detail: Failing row contains (36311145, null, Meas Value, LOINC, Answer, S, LA12334-1, 1970-01-01, 2099-12-31, null). Call getNextException to see other errors in the batch. 

Looks like this is the heart of the issue: ERROR: null value in column "concept_name" of relation "concept" violates not-null constraint

I'm not sure how you obtained your vocabulary files, but it appears your concept file has records with no entry in the concept_name column. One possibility is CPT concepts, for which you have to run a batch script that completes the entries using the UMLS web service; if you haven't run that step, it could cause this issue. Otherwise, I recommend checking your concept file for records that have no concept_name; those are the problem.
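One quick way to spot such records, assuming a tab-delimited CONCEPT.csv with concept_name as the second column (as suggested by the fread warning above); shown here against a tiny hypothetical sample file rather than the real vocabulary:

```shell
# Build a tiny sample file for demonstration; in practice, point awk at
# your real CONCEPT.csv instead.
printf 'concept_id\tconcept_name\tdomain_id\n1\tAspirin\tDrug\n2\t\tDrug\n' > CONCEPT_sample.csv

# List concept_ids whose concept_name (2nd tab-separated column) is empty,
# skipping the header row.
awk -F'\t' 'NR > 1 && $2 == "" { print $1 }' CONCEPT_sample.csv   # prints: 2
```

If fields can contain embedded tabs or quotes, a proper TSV parser is safer than awk, but for the standard vocabulary files this is usually enough to locate the offending rows.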

Thanks @fdefalco, it was indeed a problem with the concept_name. I made sure the concept_name is not null and not too long, and then it worked.
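For anyone hitting the same constraint violation: a hedged sketch of the kind of cleanup described, reading the file with data.table::fread as in the log above. The placeholder name and the 255-character truncation (concept_name is varchar(255) in the CDM DDL) are assumptions; adjust to your needs:

```r
library(data.table)

# Read the tab-delimited vocabulary file; quote = "" avoids the
# improper-quoting warning seen in the log.
concept <- data.table::fread("CONCEPT.csv", sep = "\t", quote = "")

# Replace missing names with a placeholder and truncate overly long ones,
# so every row satisfies the NOT NULL constraint and the varchar(255) column.
concept[is.na(concept_name) | concept_name == "", concept_name := "Unknown concept name"]
concept[, concept_name := substr(concept_name, 1, 255)]

data.table::fwrite(concept, "CONCEPT.csv", sep = "\t", quote = FALSE)
```

Note that completing the CPT entries via the UMLS batch script, as suggested above, is the proper fix when the blanks come from unprocessed CPT concepts; the placeholder approach only unblocks the load.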