ccb-hms/nhanes-database

Change download / ETL process to directly insert tables via DBI interface

Opened this issue · 4 comments

Change download / ETL process to directly insert tables via DBI interface

The problem with this approach is that DBI::dbAppendTable(...) does RBAR inserts, which will make a mess of the transaction log. The reason we have been writing delimited files to disk is that we can then BULK INSERT them (in SQL Server, other DBMS have similar tools), thus avoiding the transaction log for everything but structural changes (eg page extent addition) to the table.

We should look into whether this is an issue for the RPostgres version of dbAppendTable(); in particular, see the copy argument in https://rpostgres.r-dbi.org/reference/postgres-tables.html.

If I read this correctly, this should do a COPY instead of multiple INSERTS by default, which seems to be the Postgres recommendation for this situation (https://www.postgresql.org/docs/current/populate.html).