OHDSI/ETL-Synthea

syntheaFileLoc = syntheaFileLoc

rwayne opened this issue · 12 comments

Able to run the first 2 ETL commands and also load the vocabulary files.

I have syntheaFileLoc set to 'c:\Users\sasrwt\synthea\output\csv' but am getting the error below about the path where the Synthea files are located. Is the forward slash the problem? I am running 64-bit Windows. Within the csv folder there are folders for each entry, as you likely know. How can I define the path so the data is read from each subfolder of the output folder? I followed the example syntheaFileLoc convention.

ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaFileLoc = syntheaFileLoc)
Connecting using PostgreSQL driver
Error in data.table::fread(file = paste0(syntheaFileLoc, "/", csv), stringsAsFactors = FALSE, :
File 'c:\Users\sasrwt\synthea\output\csv/allergies.csv' does not exist or is non-readable. getwd()=='C:/Users/sasrwt/Documents/OMOP'

@rwayne Hi Wayne,

In an R session on your windows platform, please try the following and share the results:

syntheaFileLoc <- "C:/Users/sasrwt/synthea/output/csv"
list.files(syntheaFileLoc)

Thanks.

@rwayne Hi Wayne,

To help me diagnose the issue, can you run the following and post the results here?

syntheaFileLoc <- "C:/Users/sasrwt/synthea/output/csv"
list.files(syntheaFileLoc)

I made a similar request for the unix question as well.
Thanks.

@rwayne Thanks for the copy/paste, for some reason my browser isn't rendering your image.

Ok, so if the output of

list.files(syntheaFileLoc)

is:

[1] "2021_04_20T14_36_41Z" "2021_04_21T12_49_35Z" "2021_04_21T12_53_37Z" "2021_04_21T12_57_23Z" "2021_04_21T13_00_46Z" "2021_04_21T13_05_09Z"
[7] "2021_04_21T13_24_48Z" "2021_04_21T15_20_02Z" "2021_04_21T15_35_51Z" "2021_04_21T15_42_04Z"

... then the error is caused by the .csv files not being present in syntheaFileLoc. When you execute LoadSyntheaTables, the syntheaFileLoc parameter should point to the folder that directly contains the .csv files you want to load. The function expects to find files like allergies.csv, but as list.files(syntheaFileLoc) reveals, that folder holds only timestamped subfolders. Copy the Synthea csv files to syntheaFileLoc and see if list.files(syntheaFileLoc) finds them. If so, LoadSyntheaTables should work.
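To see which subfolder actually holds a full set of csv files before copying anything, a check along these lines may help. This is only a sketch: `findSyntheaCsvDir` is a hypothetical helper, not part of ETLSyntheaBuilder, and the default list of expected file names is an assumption you can adjust.

```r
# Hypothetical helper (not part of ETLSyntheaBuilder): return every directory
# at or below `root` that contains all of the expected csv files.
findSyntheaCsvDir <- function(root, expected = c("allergies.csv", "patients.csv")) {
  # list.dirs() with recursive = TRUE includes `root` itself plus all subfolders.
  candidates <- list.dirs(root, full.names = TRUE, recursive = TRUE)

  # Keep only the directories where every expected file is present.
  hasAll <- vapply(
    candidates,
    function(d) all(file.exists(file.path(d, expected))),
    logical(1)
  )
  unname(candidates[hasAll])
}
```

Calling `findSyntheaCsvDir(syntheaFileLoc)` returns zero or more folders. Pointing syntheaFileLoc at one of them (or copying its files up to the root) gives LoadSyntheaTables the single set of csv files it expects.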

@rwayne Glad we're making progress. : )

Does it expect just one set of csv files at the root level?

Yes.

Regarding the error,

ERROR: column "start" of relation "procedures" does not exist

... this is because Synthea renamed a date column in the PROCEDURES table from "date" to "start". It looks like you have the latest version of Synthea, so you should specify:

syntheaVersion <- "master"

We allow you to specify the Synthea version so you can use either an older version or the current one (because Synthea actively updates its master branch, we decided to make 2.7.0 the default). Unfortunately, you'll need to drop the Synthea tables and recreate them using CreateSyntheaTables, this time specifying syntheaVersion <- "master".
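If you're unsure which layout your files use, you could peek at the header of procedures.csv before creating the tables. The sketch below is a hypothetical helper, not something ETLSyntheaBuilder provides, and it only distinguishes the two cases discussed here: a "start" column (treated as "master") versus a "date" column (assumed to match the 2.7.0 default).

```r
# Hypothetical helper (not part of ETLSyntheaBuilder): guess the syntheaVersion
# value from the header row of procedures.csv.
guessSyntheaVersion <- function(syntheaFileLoc) {
  proceduresFile <- file.path(syntheaFileLoc, "procedures.csv")

  # Read only the first line; the data rows are not needed.
  header <- readLines(proceduresFile, n = 1, warn = FALSE)
  if (length(header) == 0) {
    return(NA_character_)  # empty file, nothing to inspect
  }

  # Compare column names case-insensitively.
  columns <- tolower(trimws(strsplit(header, ",", fixed = TRUE)[[1]]))

  if ("start" %in% columns) {
    "master"       # newer layout: the date column is named "start"
  } else if ("date" %in% columns) {
    "2.7.0"        # older layout; ASSUMED to correspond to the 2.7.0 default
  } else {
    NA_character_  # neither column found, so make no guess
  }
}
```

The result can then be assigned to syntheaVersion before calling CreateSyntheaTables. Treat it as a hint only, since other Synthea versions may differ in ways this check does not look at.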

@rwayne Interesting. I wonder if the file names are case sensitive on Unix but not on Windows. Can you try it on your Windows box?

@rwayne Thanks for the update and glad things are working now. : )

The code shouldn't break because of the case of the file names, so I'm going to work on a fix now and push it to master shortly.
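In the meantime, one way to look up a file without depending on its case is sketched below. This is not the fix going into the package, just an illustration; `findCsvIgnoreCase` is a hypothetical name.

```r
# Hypothetical helper (not the package fix): find a file in `dir` whose name
# matches `name` regardless of upper/lower case.
findCsvIgnoreCase <- function(dir, name) {
  files <- list.files(dir)

  # Compare lower-cased names so "Allergies.CSV" matches "allergies.csv".
  hits <- files[tolower(files) == tolower(name)]

  if (length(hits) == 0) {
    return(NA_character_)  # no file with that name in any casing
  }
  file.path(dir, hits[1])  # return the path using the on-disk spelling
}
```

For example, `findCsvIgnoreCase(syntheaFileLoc, "allergies.csv")` returns the path with the file's actual spelling, which can then be handed to a reader such as data.table::fread on a case-sensitive file system.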