Sample lookup optimized tables with `--append` flag
samanvp commented
The way we create sample lookup optimized tables is inefficient. Consider the following typical workflow:
- Run VT for the first batch of VCF files (with `--sample_lookup_optimized_output_table` set and without `--append`).
- Run VT for the second batch of VCF files (with `--sample_lookup_optimized_output_table` and `--append` set).
- Run VT for the third batch of VCF files (with `--sample_lookup_optimized_output_table` and `--append` set).
...
Currently, we load data into sample optimized tables by querying the variant optimized tables, flattening the `call` column, and copying the result into the sample optimized tables. In this implementation (#606), every new VT run with `--append` set reads all rows of the variant optimized tables, including rows loaded by earlier runs, and loads the result into the sample optimized tables with `write_disposition='WRITE_TRUNCATE'`.
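A minimal sketch of the current load step, assuming a variant table whose `call` column is a repeated record. The helper names, the table names and the exact query text are hypothetical and may differ from what #606 actually runs. The sketch only builds the query and job settings; with the `google-cloud-bigquery` client these values would be passed to `bigquery.QueryJobConfig(destination=..., write_disposition=...)` and then to `client.query(query, job_config=...)`.

```python
def build_flatten_query(variant_table: str) -> str:
    """Returns a BigQuery Standard SQL query that emits one row per call.

    Hypothetical query: every non-call column is kept and the repeated
    `call` column is unnested so each output row holds a single call.
    """
    return (
        "SELECT v.* EXCEPT (call), c AS call\n"
        f"FROM `{variant_table}` AS v, UNNEST(v.call) AS c"
    )


def current_load_plan(variant_table: str, sample_table: str) -> dict:
    """Current behaviour: flatten ALL variant rows and overwrite the target."""
    return {
        "query": build_flatten_query(variant_table),
        "destination": sample_table,
        # Every run replaces the sample optimized table, so rows from all
        # previous batches are re-read and re-flattened each time.
        "write_disposition": "WRITE_TRUNCATE",
    }


# Placeholder table names for illustration only.
plan = current_load_plan("my-project.my_dataset.variants",
                         "my-project.my_dataset.samples")
print(plan["write_disposition"])
print(plan["query"])
```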
A more efficient implementation would flatten only the newly added rows and load them with `write_disposition='WRITE_APPEND'`.
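The cost difference can be illustrated with an in-memory model rather than BigQuery itself. The toy schema (`reference_name`, `start_position`, a `call` list with `sample_id`) and all helper names are hypothetical. The sketch also assumes that the rows added by each run are available separately (for example, kept by the pipeline before they are appended). This issue does not say how newly added rows would be identified in BigQuery. With `k` batches of `b` rows each, the truncate approach reads `b * k * (k + 1) / 2` variant rows in total, while the append approach reads `b * k`, and both end up with the same flattened table.

```python
from typing import Any, Dict, List, Tuple

# Toy stand-in for a variant optimized row: "call" is a list of per-sample calls.
Row = Dict[str, Any]


def flatten_calls(variant_rows: List[Row]) -> List[Row]:
    """Emits one output row per call, mirroring UNNEST(call) in SQL."""
    flattened = []
    for row in variant_rows:
        base = {k: v for k, v in row.items() if k != "call"}
        for call in row["call"]:
            flattened.append({**base, "call": call})
    return flattened


def load_truncate(variant_table: List[Row]) -> Tuple[List[Row], int]:
    """Current approach: re-flatten every variant row and replace the table."""
    return flatten_calls(variant_table), len(variant_table)


def load_append(sample_table: List[Row],
                new_rows: List[Row]) -> Tuple[List[Row], int]:
    """Proposed approach: flatten only the new batch and append it."""
    return sample_table + flatten_calls(new_rows), len(new_rows)


batches = [
    [{"reference_name": "chr1", "start_position": 100,
      "call": [{"sample_id": 1}, {"sample_id": 2}]}],
    [{"reference_name": "chr1", "start_position": 200,
      "call": [{"sample_id": 3}]}],
    [{"reference_name": "chr2", "start_position": 50,
      "call": [{"sample_id": 4}, {"sample_id": 5}]}],
]

variant_table: List[Row] = []
truncate_table: List[Row] = []
append_table: List[Row] = []
truncate_reads = append_reads = 0
for batch in batches:
    # Each VT run with --append grows the variant optimized table.
    variant_table = variant_table + batch
    truncate_table, n = load_truncate(variant_table)
    truncate_reads += n
    append_table, n = load_append(append_table, batch)
    append_reads += n

print(truncate_table == append_table)  # True
print(truncate_reads, append_reads)    # 6 3
```

Both strategies produce identical sample optimized contents; only the number of variant rows read per run differs, and for the truncate approach that number keeps growing with every batch.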