An approach for detecting insertions with a deep learning network.
- Python 3.9, NumPy, pandas, Matplotlib, TensorFlow 2.7, pysam
#create
conda create -n INSnet python=3.9
#activate
conda activate INSnet
#deactivate
conda deactivate
- After creating and activating the INSnet virtual environment, download INSnet from GitHub:
git clone https://github.com/eioyuou/INSnet.git
cd INSnet
conda activate INSnet
conda install -c conda-forge -c bioconda numpy pandas matplotlib tensorflow=2.7 pysam
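To confirm that the packages listed above actually ended up in the activated environment, a small check script can help. This is a minimal sketch, not part of INSnet: it assumes the installed distribution names are numpy, pandas, matplotlib, tensorflow and pysam (a TensorFlow installed under another name, such as tensorflow-gpu, would be reported as missing), and it only checks the TensorFlow version, since that is the one version the README pins.

```python
"""Report installed versions of INSnet's listed dependencies (a sketch)."""
import sys
from importlib import metadata

# Assumed distribution names; only TensorFlow has a version pinned in the README.
REQUIRED = {
    "numpy": None,
    "pandas": None,
    "matplotlib": None,
    "tensorflow": "2.7",
    "pysam": None,
}


def check_environment(required=REQUIRED):
    """Return {package: (installed_version or None, ok_flag)}."""
    report = {}
    for name, wanted in required.items():
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        # "2.7" accepts "2.7" itself and any "2.7.x" release, but not "2.70".
        ok = wanted is None or version == wanted or version.startswith(wanted + ".")
        report[name] = (version, ok)
    return report


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} (README expects 3.9)")
    for pkg, (version, ok) in check_environment().items():
        status = "ok" if ok else "missing or unexpected version"
        print(f"{pkg:12s} {version or '-':10s} {status}")
```

Run it with `python check_env.py` (a hypothetical file name) inside the activated INSnet environment.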
python INSnet.py generate_feature bamfile_path_long output_data_folder max_work(default:5) includecontig
bamfile_path_long is the path to the BAM file of the long reads aligned to the reference;
output_data_folder is the folder in which the generated evaluation data are stored;
max_work is the number of threads to use (default: 5);
includecontig is the list of contigs on which to perform detection (default: [], i.e. all contigs are used).
e.g.: python INSnet.py generate_feature ./long_read.bam ./outpath 5 [12,13,14,15,16,17,18,19,20,21,22]
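When generate_feature is run for many BAM files or contig sets, it can be convenient to assemble the arguments programmatically. The sketch below is hypothetical (none of these helpers ship with INSnet). It follows the positional order of the usage line above and renders includecontig without spaces, as in the example, so the list reaches INSnet.py as one argument. It assumes INSnet accepts an explicit `[]` to mean all contigs. Contig names must match the BAM header, so `list_bam_contigs` reads them with pysam's `AlignmentFile.references`.

```python
"""Hypothetical helpers for assembling INSnet command lines (a sketch, not part of INSnet)."""
import shlex


def format_contigs(contigs):
    # Render contigs as "[12,13,...]" with no spaces, matching the README example,
    # so the whole list is passed to INSnet.py as a single argument.
    return "[" + ",".join(str(c) for c in contigs) + "]"


def build_generate_feature_cmd(bam_path, out_dir, max_work=5, contigs=()):
    # Positional order follows the README usage line:
    # generate_feature bamfile_path_long output_data_folder max_work includecontig
    return [
        "python", "INSnet.py", "generate_feature",
        bam_path, out_dir, str(max_work), format_contigs(contigs),
    ]


def list_bam_contigs(bam_path):
    # Contig names from the BAM header (e.g. "12" for GRCh37/hs37d5, often "chr12"
    # for GRCh38). pysam is imported here so the helpers above work without it.
    import pysam
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return list(bam.references)


if __name__ == "__main__":
    cmd = build_generate_feature_cmd("./long_read.bam", "./outpath", 5, range(12, 23))
    # shlex.join quotes the bracketed list so the shell does not treat it as a glob.
    print(shlex.join(cmd))
    # To launch INSnet directly: import subprocess; subprocess.run(cmd, check=True)
```

Passing the argument list to `subprocess.run` avoids shell parsing altogether; printing the `shlex.join` form is useful when the command is pasted into a job script instead.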
python INSnet.py call_sv insertion_predict_weight datapath bamfilepath outvcfpath support includecontig
insertion_predict_weight is the path to the model weights file;
datapath is the folder in which the evaluation data are stored (the output_data_folder written by generate_feature);
bamfilepath is the path to the BAM file of the long reads aligned to the reference;
outvcfpath is the path of the output VCF file;
support is the minimum number of supporting reads;
includecontig is the list of contigs on which to perform detection (default: [], i.e. all contigs are used).
e.g.: python INSnet.py call_sv ./insertion_weights.h5 ./datapath ./long_read.bam ./outvcfpath 10 [12,13,14,15,16,17,18,19,20,21,22]
(for PacBio CCS reads, pass ./ccs_insertion_weights.h5 instead of ./insertion_weights.h5)
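After call_sv finishes, a quick summary of the output VCF helps sanity-check a run, for example how many insertions were called per contig. The sketch below uses only the standard library and is not part of INSnet. It assumes the output records use the standard VCF structural-variant INFO keys SVTYPE=INS and SVLEN. If INSnet writes different keys, adjust `summarize_insertions` accordingly.

```python
"""Summarize insertion calls in a VCF file (a sketch; INFO keys are assumed)."""
import gzip


def parse_info(info_field):
    # Split a VCF INFO column ("KEY=VAL;FLAG;...") into a dict; flags map to True.
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def summarize_insertions(lines):
    # Count records with SVTYPE=INS per contig and collect their absolute SVLEN values.
    by_contig, lengths = {}, []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, info = cols[0], parse_info(cols[7])
        if info.get("SVTYPE") != "INS":
            continue
        by_contig[chrom] = by_contig.get(chrom, 0) + 1
        svlen = info.get("SVLEN")
        if isinstance(svlen, str):
            lengths.append(abs(int(svlen.split(",")[0])))  # first value if multi-valued
    return by_contig, lengths


def summarize_vcf(path):
    # Accept both plain and gzip-compressed VCF files.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize_insertions(handle)


if __name__ == "__main__":
    counts, lengths = summarize_vcf("./outvcfpath")  # path from the README example
    for chrom, n in sorted(counts.items()):
        print(chrom, n)
    if lengths:
        print("median insertion length:", sorted(lengths)[len(lengths) // 2])
```

pysam's `VariantFile` offers a fuller parser. Plain text parsing is used here so the check runs even outside the INSnet environment.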
The long-read alignment datasets (BAM files) can be downloaded from:
https://ftp.ncbi.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_MtSinai_NIST/Baylor_NGMLR_bam_GRCh37/HG002_PB_70x_RG_HP10XtrioRTG.bam
https://ftp.ncbi.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/UCSC_Ultralong_OxfordNanopore_Promethion/HG002_GRCh37_ONT-UL_UCSC_20200508.phased.bam
https://ftp.ncbi.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_CCS_15kb/alignment/HG002.Sequel.15kb.pbmm2.hs37d5.whatshap.haplotag.RTG.10x.trio.bam
http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/hgsv_sv_discovery/working/20160905_smithm_pacbio_aligns/NA19240_bwamem_GRCh38DH_YRI_20160905_pacbio.bam
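These BAM files are very large, so a restartable download loop can be useful. The sketch below is hypothetical and not part of INSnet. It streams each URL to disk in 1 MiB chunks through a temporary `.part` file, skips files that are already complete, and optionally builds a `.bai` index with `pysam.index` (a wrapper around `samtools index`). It assumes INSnet needs an indexed BAM for region access. It does not resume partial transfers.

```python
"""Download the listed BAM files and index them (a sketch, not part of INSnet)."""
import os
import shutil
import urllib.parse
import urllib.request


def local_name(url, dest_dir="."):
    # Local path built from the last component of the URL path.
    name = os.path.basename(urllib.parse.urlparse(url).path)
    return os.path.join(dest_dir, name)


def download(url, dest_dir="."):
    # Stream to "<name>.part" and rename only when complete, so an interrupted
    # transfer is never mistaken for a finished file.
    os.makedirs(dest_dir, exist_ok=True)
    target = local_name(url, dest_dir)
    if os.path.exists(target):
        return target
    tmp = target + ".part"
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out, length=1 << 20)  # 1 MiB chunks
    os.replace(tmp, target)
    return target


def index_bam(bam_path):
    # Build <bam>.bai with pysam (wraps samtools index); imported lazily.
    import pysam
    pysam.index(bam_path)


if __name__ == "__main__":
    urls = [
        "https://ftp.ncbi.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/PacBio_CCS_15kb/alignment/HG002.Sequel.15kb.pbmm2.hs37d5.whatshap.haplotag.RTG.10x.trio.bam",
    ]
    for url in urls:
        index_bam(download(url, "data"))
```

The GRCh37/hs37d5 alignments name contigs "12", "13", and so on, while the GRCh38 alignment may use a "chr" prefix. Check the BAM header before choosing includecontig values.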