gvcf2bed is a small tool that converts a gVCF file to BED. This is useful for extracting regions that pass a given genotype quality threshold.
gvcf2bed is available from PyPI (via pip) or Bioconda (via conda):
pip install gvcf2bed
conda install -c bioconda gvcf2bed
- Python 3.4+
- pyvcf
- cyvcf2
- pytest
- pytest-cov
- Fix a bug for variants where GQ is not defined.
- Use cyvcf2 by default in the tool. This results in a speed-up of approximately 8-10x. The existing API has not changed, and will still work with pyvcf.
- Add a separate filter for non-variants, as GQ scores may have a different distribution on non-variant records than on variant records.
usage: gvcf2bed [-h] -I INPUT -O OUTPUT [-s SAMPLE] [-q QUALITY]
                [-nq NON_VARIANT_QUALITY] [-b]

Create a BED file from a gVCF. Regions are based on a minimum genotype
quality. The gVCF file must contain a GQ field in its FORMAT fields. GQ scores
of non-variants records have a different distribution from the GQ score
distribution of variant records. Hence, an option is provided to set a
different threshold for non-variant positions.

optional arguments:
  -h, --help            show this help message and exit
  -I INPUT, --input INPUT
                        Input gVCF
  -O OUTPUT, --output OUTPUT
                        Output bed file
  -s SAMPLE, --sample SAMPLE
                        Sample name in VCF file to use. Will default to first
                        sample (alphabetically) if not supplied
  -q QUALITY, --quality QUALITY
                        Minimum genotype quality (default 20)
  -nq NON_VARIANT_QUALITY, --non-variant-quality NON_VARIANT_QUALITY
                        Minimum genotype quality for non-variant records
                        (default 20)
  -b, --bedgraph        Output in bedgraph mode
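
The effect of the two quality thresholds described in the help text can be illustrated with a small, dependency-free sketch. This is not the gvcf2bed implementation: `Record` and `passing_regions` are hypothetical names made up for this example. It also assumes that the thresholds are inclusive, that records without a GQ value are skipped, and that bedgraph mode adds the GQ value as a fourth column. The only part that does not depend on the tool is the coordinate conversion, which follows the two formats: VCF positions are 1-based and inclusive, BED intervals are 0-based and half-open.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical, simplified stand-in for one gVCF record of one sample."""
    chrom: str
    pos: int            # 1-based start position, as in VCF
    end: int            # 1-based inclusive end (INFO/END for reference blocks)
    is_variant: bool    # True if the sample has a non-reference call
    gq: Optional[int]   # genotype quality; None when GQ is not defined


def passing_regions(
    records: Iterable[Record],
    quality: int = 20,
    non_variant_quality: int = 20,
    bedgraph: bool = False,
) -> Iterator[str]:
    """Yield tab-separated BED lines for records that pass their GQ threshold."""
    for rec in records:
        if rec.gq is None:
            # Assumption: records without a GQ value are skipped.
            continue
        # Variant and non-variant records get their own minimum GQ.
        threshold = quality if rec.is_variant else non_variant_quality
        if rec.gq < threshold:  # assumption: the threshold is inclusive
            continue
        # VCF is 1-based inclusive; BED is 0-based half-open.
        fields = [rec.chrom, str(rec.pos - 1), str(rec.end)]
        if bedgraph:
            # Assumption: the fourth bedGraph column holds the GQ value.
            fields.append(str(rec.gq))
        yield "\t".join(fields)


# Made-up example records, not taken from a real gVCF.
records = [
    Record("chr1", 1, 100, False, 35),     # reference block, high GQ
    Record("chr1", 101, 101, True, 12),    # variant, low GQ
    Record("chr1", 102, 200, False, 25),   # reference block, moderate GQ
    Record("chr1", 201, 201, True, None),  # variant without GQ
]

for line in passing_regions(records, quality=20, non_variant_quality=30):
    print(line)
```

With a non-variant threshold of 30, only the first reference block is written, as the interval chr1 0 100. With both thresholds left at 20, the second reference block (chr1 101 200) passes as well.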