googlegenomics/gcp-variant-transforms

merge_header pipeline should not run when --representative_header_file is set

mbookman opened this issue · 0 comments

When --representative_header_file is passed to vcf_to_bq, the merge_header pipeline should not need to run, but it currently always does. The necessary change looks like adding a check in vcf_to_bq.py before the call to _merge_headers:

  if not known_args.representative_header_file:
    _merge_headers(known_args, pipeline_args,
                   pipeline_mode, avro_root_path, annotated_vcf_pattern)

Alternatively, the check could be done in merge_headers itself.
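
For the second option, a minimal sketch of the early-return form might look like the following. The function here is a stand-in, not the real _merge_headers (which takes more arguments, as shown above), and the header path is a made-up example value.

```python
from types import SimpleNamespace

calls = []  # records which stand-in steps actually did work


def _merge_headers(known_args):
    """Stand-in for the real _merge_headers; the real signature is longer."""
    # The check lives inside the function: skip the work entirely when a
    # representative header file was supplied.
    if known_args.representative_header_file:
        return
    calls.append('merge_headers')


def run(known_args):
    # The call site stays unconditional, like _run_annotation_pipeline today.
    _merge_headers(known_args)


# Hypothetical flag values, for illustration only.
run(SimpleNamespace(representative_header_file='header.vcf'))
run(SimpleNamespace(representative_header_file=None))
```

Only the second call records any work. The trade-off is that run() no longer shows at a glance which steps are optional.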

The run() function in vcf_to_bq seems inconsistent about where this kind of flag checking and switching happens.
For one step, the check comes before the call:

  if known_args.auto_flags_experiment:
    _get_input_dimensions(known_args, pipeline_args)

but then the next operation:

  annotated_vcf_pattern = _run_annotation_pipeline(known_args, pipeline_args)

is called without a check, and the check is done inside _run_annotation_pipeline instead.
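
One way to make this consistent would be to keep every flag check at the call site. The sketch below is only an illustration: plan_steps is a hypothetical helper, the run_annotation_pipeline attribute is an assumption about which flag gates the annotation step, and the real steps pass values between each other (for example annotated_vcf_pattern), which this simplification ignores.

```python
from types import SimpleNamespace


def plan_steps(known_args):
    """Return the names of the steps run() would execute, in order."""
    # (step name, predicate) pairs: every flag check sits in one place.
    gated_steps = [
        ('_get_input_dimensions', lambda a: a.auto_flags_experiment),
        # Assumed flag name; the issue does not say which flag is checked.
        ('_run_annotation_pipeline', lambda a: a.run_annotation_pipeline),
        ('_merge_headers', lambda a: not a.representative_header_file),
    ]
    return [name for name, enabled in gated_steps if enabled(known_args)]


# Hypothetical flag values, for illustration only.
args = SimpleNamespace(
    auto_flags_experiment=True,
    run_annotation_pipeline=False,
    representative_header_file='header.vcf',
)
print(plan_steps(args))
```

With these values only _get_input_dimensions is selected, since the annotation flag is off and a representative header file is set.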