Utility scripts for processing and summarizing Genomon-SV results
Python (>= 2.7), genomon_sv (for merge_control, realign and primer), pysam, primer3-py (for primer) packages
tabix, bgzip, blat (for realign)
pip install sv_utils
Alternatively, if you want to install from the source code:
git clone https://github.com/friend1ws/sv_utils.git
cd sv_utils
pip install .
For a detailed description of each option, please consult the help for each command (-h).
Summarize the frequency of each variant type (deletion, tandem_duplication, inversion and translocation) for each sample
sv_utils count [-h] [--inseq] result_list.txt output.txt
Summarize the frequency of each variant type (deletion, tandem_duplication, inversion, translocation) for each cancer gene
sv_utils gene_summary [-h] --cancer_gene_list cancer_gene_list
[--inframe_info]
result_list.txt output.txt
For the --cancer_gene_list argument, the CancerGeneSummary/CancerGeneSummary.proc.txt file in the cancer_gene_db repository or a variation of it could be used.
Filter out GenomonSV results that do not meet the specified conditions
sv_utils filter [-h] [--genome_id {hg19,hg38,mm10}] [--grc]
[--max_minus_log_fisher_pvalue MAX_MINUS_LOG_FISHER_PVALUE]
[--min_tumor_allele_freq MIN_TUMOR_ALLELE_FREQ]
[--max_control_allele_freq MAX_CONTROL_ALLELE_FREQ]
[--max_control_variant_read_pair MAX_CONTROL_VARIANT_READ_PAIR]
[--control_depth_thres CONTROL_DEPTH_THRES]
[--min_overhang_size MIN_OVERHANG_SIZE]
[--inversion_size_thres INVERSION_SIZE_THRES]
[--max_variant_size MAX_VARIANT_SIZE]
[--pooled_control_file POOLED_CONTROL_FILE]
[--pooled_control_num_thres POOLED_CONTROL_NUM_THRES]
[--simple_repeat_file SIMPLE_REPEAT_FILE]
[--remove_rna_junction]
genomonSV.result.txt output.txt
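When the same filter settings have to be applied to many samples, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of sv_utils: the helper name `build_filter_command` and the threshold value are hypothetical, and only flags listed in the usage above are used.

```python
import shlex

# Genome identifiers accepted by --genome_id according to the usage above.
GENOME_IDS = ("hg19", "hg38", "mm10")


def build_filter_command(result_file, output_file, genome_id="hg19",
                         min_tumor_allele_freq=None,
                         max_control_variant_read_pair=None,
                         remove_rna_junction=False):
    """Assemble an argument list for `sv_utils filter` (hypothetical helper)."""
    if genome_id not in GENOME_IDS:
        raise ValueError("genome_id must be one of %s" % ", ".join(GENOME_IDS))
    cmd = ["sv_utils", "filter", "--genome_id", genome_id]
    # Optional thresholds are added only when the caller sets them,
    # so the tool's own defaults apply otherwise.
    if min_tumor_allele_freq is not None:
        cmd += ["--min_tumor_allele_freq", str(min_tumor_allele_freq)]
    if max_control_variant_read_pair is not None:
        cmd += ["--max_control_variant_read_pair",
                str(max_control_variant_read_pair)]
    if remove_rna_junction:
        cmd.append("--remove_rna_junction")
    # Positional arguments come last, as in the usage line.
    cmd += [result_file, output_file]
    return cmd


# The threshold 0.05 is an arbitrary illustration, not a recommended value.
cmd = build_filter_command("genomonSV.result.txt", "output.txt",
                           min_tumor_allele_freq=0.05,
                           remove_rna_junction=True)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once sv_utils is installed.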
Annotate GenomonSV results
sv_utils annotation [-h] [--genome_id {hg19,hg38,mm10}] [--grc]
[--re_gene_annotation] [--closest_exon]
[--closest_coding] [--coding_info]
[--fusion_list FUSION_LIST]
genomonSV.result.txt output.txt
For the --fusion_list argument, the fusion_db.txt file in the fusion_db repository could be used.
Add somatic SNVs and short indels to the GenomonSV results
sv_utils mutation [-h] --reference reference.fa
genomonSV.result.txt genomon_mutation.result.txt
output.txt
List concentrated structural variations
sv_utils concentrate [-h] [--set_count set_count]
[--set_margin set_margin]
result_list.txt output.txt
Merge, compress and index the lists of GenomonSV results
sv_utils merge_control [-h] result_list.txt merge_control.bedpe.gz
Realign short reads around the structural variation candidates, mainly for validation purposes
sv_utils realign [-h] --reference reference.fa --tumor_bam tumor.bam
[--control_bam control.bam]
genomonSV.result.txt output.txt
Generate primer sequences, mainly for PCR validation
sv_utils primer [-h] --reference reference.fa
genomonSV.result.txt output.txt
Convert short deletions and tandem duplications to VCF format
sv_utils format [-h] --reference reference.fa [--format {vcf}]
[--max_size_thres MAX_SIZE_THRES]
genomonSV.result.txt output.txt
Get distance to nonB_DB annotated region for each SV candidate
sv_utils nonB_DB [-h] --nonB_DB nonB_DB.bed.gz
genomonSV.result.txt output.txt
For the --nonB_DB argument, the nonB_DB.bed.gz file in the nonB_DB repository could be used.
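To illustrate what a "distance to an annotated region" means, here is a minimal sketch that is not the sv_utils implementation. It assumes the breakpoint and the regions lie on the same chromosome and share one coordinate system with inclusive ends; the function name `distance_to_nearest` and the example intervals are hypothetical.

```python
def distance_to_nearest(pos, intervals):
    """Return the distance from pos to the closest (start, end) interval.

    The distance is 0 when pos falls inside an interval.
    Returns None when no intervals are given.
    """
    if not intervals:
        return None
    # For each interval, the gap is start - pos when pos lies to the left,
    # pos - end when pos lies to the right, and 0 when pos is inside.
    return min(max(start - pos, pos - end, 0) for start, end in intervals)


# Hypothetical annotated regions on one chromosome.
regions = [(100, 200), (500, 600)]
for breakpoint in (90, 150, 250, 400):
    print(breakpoint, distance_to_nearest(breakpoint, regions))
```

For a real nonB_DB.bed.gz file, only the regions near each breakpoint would need to be loaded, which is what the tabix index makes possible.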
Check recombination signal sequence motif near breakpoints
sv_utils RSS [-h] --reference reference.fa [--check_size CHECK_SIZE]
genomonSV.result.txt output.txt
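As a rough illustration of a motif check near a breakpoint, the sketch below scans a sequence for the commonly cited consensus RSS heptamer CACAGTG on both strands. It is not the sv_utils implementation: whether the tool uses this exact heptamer, allows mismatches, or also scores the nonamer and spacer is not stated here, and the function name `find_rss_heptamers` is hypothetical.

```python
HEPTAMER = "CACAGTG"  # commonly cited consensus RSS heptamer (assumption for this sketch)
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence made of A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_rss_heptamers(seq):
    """Return (offset, strand) for each exact heptamer match in seq."""
    seq = seq.upper()
    reverse_heptamer = reverse_complement(HEPTAMER)
    hits = []
    for i in range(len(seq) - len(HEPTAMER) + 1):
        window = seq[i:i + len(HEPTAMER)]
        if window == HEPTAMER:
            hits.append((i, "+"))
        elif window == reverse_heptamer:
            hits.append((i, "-"))
    return hits


# Sequence that would be extracted around a breakpoint (made-up example).
print(find_rss_heptamers("TTCACAGTGAA"))
```

In practice the input sequence would be taken from reference.fa within some window around each breakpoint.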
Check AID motif (CG, WGCW) near breakpoints
sv_utils AID [-h] --reference reference.fa [--check_size CHECK_SIZE]
genomonSV.result.txt output.txt
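The two motifs named above can be counted with regular expressions, where W is the IUPAC code for A or T. The sketch below is only an illustration under that reading, not the sv_utils implementation; the function name `count_aid_motifs` is hypothetical, and how the tool handles strands or the --check_size window is not covered.

```python
import re

# Lookahead patterns so that overlapping occurrences are all counted.
CG_PATTERN = re.compile(r"(?=CG)")
WGCW_PATTERN = re.compile(r"(?=[AT]GC[AT])")  # W = A or T (IUPAC)


def count_aid_motifs(seq):
    """Count CG and WGCW occurrences in seq, including overlapping ones."""
    seq = seq.upper()
    return {
        "CG": len(CG_PATTERN.findall(seq)),
        "WGCW": len(WGCW_PATTERN.findall(seq)),
    }


# Made-up sequence standing in for the bases around a breakpoint.
print(count_aid_motifs("AGCTACGT"))
```

Because WGCW is its own reverse complement pattern (as is CG), counting on one strand already covers both strands for these two motifs.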