LDBlockShow
1 Introduction
LDBlockShow is a fast tool for generating linkage disequilibrium (LD) heatmaps from VCF files. In the benchmarks in Section 5, it uses less CPU time and memory than Haploview, LDheatmap and gpart. LDBlockShow can plot the LD heatmap together with additional statistics (e.g., GWAS P values) or genomic annotation in a single figure, and it supports subgroup analysis.
The LDBlockShow article has been published in Briefings in Bioinformatics. Please cite this article if you use the tool.
PMID: 33126247 DOI:10.1093/bib/bbaa227
2 Download and Install
2.1 Linux/macOS Download
2.2 Pre-install
LDBlockShow is for Linux/Unix/macOS only. Before installing, please make sure the following prerequisites are available.
1) g++ : version 4.8 or later with C++11 support (--std=c++11) is recommended
2) zlib : zlib > 1.2.3 is recommended
3) Perl : the SVG.pm module is required, because LDBlockShow uses it to plot figures. A built-in copy of the SVG module is provided in the package.
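The prerequisites listed above can be checked before building. The following is a minimal sketch, not part of the LDBlockShow package: the helper name `check_tool` is hypothetical, and the zlib test simply tries to link an empty program against `-lz`. It only reports what it finds and never aborts.

```shell
#!/bin/sh
# Report which prerequisites are visible on this machine.
check_tool() {
    # $1 = command name to look up on PATH
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found"
    fi
}

check_tool g++
check_tool perl

if command -v g++ >/dev/null 2>&1; then
    # -dumpversion prints the compiler version number only
    echo "g++ version: $(g++ -dumpversion)"

    # Try to link a trivial program against zlib in a scratch directory.
    tmpdir=$(mktemp -d)
    printf 'int main(){return 0;}\n' > "$tmpdir/t.cpp"
    if g++ "$tmpdir/t.cpp" -lz -o "$tmpdir/t" >/dev/null 2>&1; then
        echo "zlib: links with -lz"
    else
        echo "zlib: could NOT link with -lz"
    fi
    rm -rf "$tmpdir"
fi

# SVG.pm is importable if this one-liner exits with status 0.
if command -v perl >/dev/null 2>&1 && perl -MSVG -e 1 >/dev/null 2>&1; then
    echo "perl SVG module: found"
else
    echo "perl SVG module: NOT found system-wide (the package ships a built-in copy)"
fi
```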
2.3 Install
Users can install it with the following options:
Option 1:
git clone https://github.com/BGI-shenzhen/LDBlockShow.git
cd LDBlockShow; chmod 755 configure; ./configure;
make;
mv LDBlockShow bin/;    # [rm *.o]
Note: If linking fails, try re-installing the zlib library.
Option 2:
tar -zxvf LDBlockShowXXX.tar.gz
cd LDBlockShowXXX; cd src;
sh make.sh    ## Linux : [ make ; make clean ]
../bin/LDBlockShow
Note: For macOS, if PLINK does not work, please re-download the macOS build of PLINK and put it into the directory [LDBlockShowXXX/bin].
Note: If linking fails, try re-installing the zlib library.
3 Parameter description
3.1 LDBlockShow
3.1.1 Main parameter
./bin/LDBlockShow
Usage: LDBlockShow -InVCF <in.vcf.gz> -OutPut <outPrefix> -Region chr1:10000-20000
-InVCF <str> Input SNP VCF Format
-OutPut <str> OutPut File of LD Blocks
-Region <str> In One Region to show LD info svg Figure
-SeleVar <int> Select statistic for deal. 1: D' 2: R^2 3/4: Both [1]
-SubPop <str> SubGroup Sample File List[ALLsample]
-BlockType <int> method to detect Block [beta] [1]
1. Block by PLINK (Gabriel method)
2. Solid Spine of LD RR/D' 3. Blockcut with self-defined RR/D'
4. FixBlock by input blocks files 5. No Block
-InGWAS <str> InPut GWAS Pvalue File(chr site Pvalue)
-InGFF <str> InPut GFF3 file to show Gene CDS and name
-BlockCut <float> 'Strong LD' cutoff and ratio for BlockType3[0.85:0.90]
-FixBlock <str> Input fixed block region
-MerMinSNPNum <int> merger color grids when SNPnumber over N[50]
-help Show more Parameters and help [hewm2008 v1.40]
Details for above parameters:
-InVCF  The input file in VCF format.
-OutPut  The output directory and output file name prefix (e.g., /path/pop1).
-Region  The region for which the LD heatmap is shown (format: chr:start:end).
-SeleVar  The LD measurement (1: D', 2: R^2, 3/4: both R^2 and D'). The default is 1.
-SubPop  A sample list for subgroup analysis.
-BlockType  The block definition. The default, 1, calls PLINK (ref. 1) to generate the blocks defined by Gabriel et al. (ref. 2). Type 2, solid spine of LD (ref. 3), is also supported. With type 3, users define their own cutoff of R^2 and D' for blocks, combined with the option "-BlockCut". With type 4, users supply their own block region definition, combined with the option "-FixBlock". Type 5 can be used if users prefer not to show the block region.
-InGWAS  The statistics file (e.g., association statistics, but other values such as Tajima's D are also accepted) to plot together with the LD plot. File format: [chr position Pvalue].
-InGFF  Input GFF3 format file for genomic region annotation.
-BlockCut  For block type 3, the cutoff for strong LD and the ratio of strong-LD SNP pairs in one block. Default is 0.85:0.90. That is, if the user chose D' in the -SeleVar option, then within one block the ratio of SNP pairs with D' over 0.85 is 0.9.
-FixBlock  For block type 4, users can use this option to supply a self-defined block region. The file contains three columns: chromosome, block region start position, and block region end position.
-MerMinSNPNum  The minimum SNP number at which color grids with the same color are merged. Default is 50. For details, please see Fig 1 in the manual.
-help  Show more parameters.
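The three-column file described for -FixBlock can be prepared and sanity-checked before running the tool. The sketch below is illustrative only: the file name `my_blocks.txt`, the coordinates, and the helper `check_blocks` are made up, and it assumes whitespace-separated columns with no header line, which this manual does not state explicitly.

```shell
#!/bin/sh
# Hypothetical block file for -BlockType 4 / -FixBlock.
# Columns (from the parameter description): chromosome, block start, block end.
# Assumptions: whitespace-separated, no header line; coordinates are invented.
blocks=my_blocks.txt
printf '%s\t%s\t%s\n' \
    chr11 24100500 24120000 \
    chr11 24150000 24180000 > "$blocks"

# Check: exactly three columns, integer positions, start <= end.
check_blocks() {
    awk '
        NF != 3 { print "line " NR ": expected 3 columns"; bad = 1; next }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print "line " NR ": positions must be integers"; bad = 1; next }
        $2 + 0 > $3 + 0 { print "line " NR ": start is after end"; bad = 1 }
        END { exit bad }
    ' "$1"
}

if check_blocks "$blocks"; then
    echo "$blocks looks well-formed"
    # The file could then be passed to LDBlockShow (not run here):
    #   ./bin/LDBlockShow -InVCF in.vcf.gz -OutPut out -Region chr11:24100000:24200000 -BlockType 4 -FixBlock my_blocks.txt
fi
```

The check function exits with a non-zero status on the first kind of problem it finds, so it can also be used as a guard in a larger pipeline.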
3.1.2 Other parameters
./bin/LDBlockShow -h
Para [-i] is show for [-InVCF], Para [-o] is show for [-OutPut], Para [-r] is show for [-Region]
-InGenotype <str> InPut SNP Genotype Format
-InPlink <str> InPut Plink [bed+bim+fam] or [ped+map] file prefix
-MAF <float> Min minor allele frequency filter [0.05]
-Miss <float> Max ratio of miss allele filter [0.25]
-HWE <float> Exact test of Hardy-Weinberg Equilibrium for SNP Pvalue[0]
-Het <float> Max ratio of het allele filter [1.0]
-TagSNPCut <float> 'Strong LD' cutoff for TagSNP [0.80]
-OutPng convert svg 2 png file
-OutPdf convert svg 2 pdf file
Details for above parameters:
-InGenotype  Input file in genotype format.
-InPlink  The prefix of the input files in PLINK format.
-MAF  Filter out SNPs with a low minor allele frequency (default: <0.05).
-HWE  Filter out SNPs with a low P value in the exact test of Hardy-Weinberg equilibrium (default: <0, so no SNP is removed by this filter).
-Het  Filter out SNPs with a high heterozygosity ratio (default: >1.0).
-Miss  Filter out SNPs with a high missing rate (default: >0.25).
-TagSNPCut  The LD cutoff for selecting tag SNPs. Default is 0.8.
-OutPng  Convert the SVG file to a PNG file.
-OutPdf  Convert the SVG file to a PDF file.
Note: If a small SVG file fails to open, please use the "-OutPdf" option and open the PDF file instead. For large SVG files, "-OutPng" can be used to get a relatively small figure file.
3.2 ShowLDSVG
This program is designed for users to optimize the figure (e.g., change colors) generated by LDBlockShow.
3.2.1 Brief parameters
./bin/ShowLDSVG
Options
-InPreFix <s> : InPut Region LD Result Frefix
-OutPut <s> : OutPut svg file result
-help : Show more help with more parameter
-InPreFix  The prefix of the input files (i.e., the output prefix of LDBlockShow).
-OutPut  The output file (plot files in svg, png and pdf format).
-help  Show more parameters in detail.
3.2.2 Detail parameters
./bin/ShowLDSVG -h
-InGWAS <s> : InPut GWAS Pvalue File(chr site Pvalue)
-NoLogP : Do not get the log Pvalue
-Cutline <f> : show the cut off line of Pvalue
-TopSite <n> : InPut the Special Site as the peak site(chr:pos)
-PointSize <n> : set the GWAS point size number
-SpeSNPName <s> : In File for Special SNP Name(chr site Name)
-ShowGWASSpeSNP : show Special SNP Name in GWAS plot with [-SpeSNPName]
-InGFF <s> : InPut GFF3 file to show Gene CDS and name
-NoGeneName : No show Gene name,only show stuct
-crGene <s> : InColor for Gene Stuct [CDS:Intron:UTR:Intergenic]
default: [#e7298a:lightblue:#7570b3:#a6cee3]
-crBegin <s> : In Start Color RGB [255,255,255]
-crMiddle <s> : In Middle Color RGB [240,235,75]
-crEnd <s> : In End Color RGB [255,0,0]
-NumGradien <s> : In Number of gradien of color
-crTagSNP <s> : Color for TagSNP [231,138,195]
-CrGrid <s> : the color of grid edge [white]
-WidthGrid <s> : the edge-width of gird [1]
-NoGrid : No Show the gird edge
-ShowNum : Show the R^2/D' in the heatmap
-NoShowLDist <n> : NoShow long physical distance pairwise[1000000]
-MerMinSNPNum <s> : merge color grids when SNPnumber over N[50]
-OutPng : convert svg 2 png file
-OutPdf : convert svg 2 pdf file
-ResizeH : resize image height; Width be resize in ratio[4096]
-MoreHelp : Show some hidden para to adjust figure(less use)
Details for above parameters:
-InGWAS  The statistics file (e.g., association statistics, but other values such as Tajima's D are also accepted) to plot together with the LD plot. File format: [chr position Pvalue].
-NoLogP  By default, the P value from the -InGWAS file is -log10 transformed. With this option, the P value is not transformed.
-Cutline  The significance cutline for the -InGWAS file.
-TopSite  Assign one SNP of interest in the GWAS plot (default is the most significant SNP; can be changed with chr:pos).
-PointSize  Set the point size (any number over 0).
-InGFF  The GFF file for genomic region annotation. By default, the gene name is shown in the plot.
-NoGeneName  Do not show the gene name in the plot.
-SpeSNPName  Input a file giving the names of SNPs of interest. These names are shown in the heatmap.
-ShowGWASSpeSNP  Use together with the file assigned by '-SpeSNPName' to show the names of SNPs of interest in the GWAS plot.
-crGene  Define the colors of the different genomic regions. By default, CDS, intron, UTR and intergenic regions are shown in #e7298a, light blue, #7570b3, and #a6cee3, respectively.
Parameters to optimize the color of the heatmap:
-crBegin  Color for no LD (R^2/D'=0), default: white.
-crMiddle  Color for R^2/D'=0.5, default: yellow.
-crEnd  Color for complete LD (R^2/D'=1), default: red.
-NumGradien  The number of gradients from crBegin to crEnd.
-crTagSNP  Color for the tag SNP.
Parameters to optimize the grids in the heatmap:
-CrGrid  Border color of the grids, default: white.
-WidthGrid  The width of the border, default: 1.
-NoGrid  No border.
-ShowNum  Show the LD measurement value in the grids (not recommended when the SNP number is over 50).
-NoShowLDist  When the distance between two SNPs is over this number, their pairwise LD is not shown in the figure. Default is 1,000,000, as given in the usage above.
-MerMinSNPNum  When the number of SNPs is over the default of 50, ShowLDSVG merges adjacent grids of the same color. Users can change this number to any integer.
-OutPng  Convert the SVG file to a PNG file.
-OutPdf  Convert the SVG file to a PDF file.
-ResizeH  Set the height of the image (default 4096), which can be used to adjust the resolution of the PNG file. The width is adjusted automatically.
Note: When the SNP number is large (e.g., over 100), the output SVG file might be very large. ShowLDSVG will merge adjacent grids of the same color. With a smaller number of gradients (set by -NumGradien), the figure is compressed further. -MerMinSNPNum sets the minimum number of SNPs, that is, if there are more SNPs than this number (default 50), the output SVG is compressed.
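The -NoLogP description says that P values from the -InGWAS file are -log10 transformed by default. The arithmetic can be reproduced with awk to see where a given P value would land on the plotted axis. This is a sketch on a made-up file (`gwas.example.pvalue`, helper `neg_log10`), not the code used by ShowLDSVG. If the cutline is given on the transformed scale, which the `-Cutline 7` used in the examples suggests but this manual does not state, it would correspond to P = 1e-7.

```shell
#!/bin/sh
# Made-up statistics file in the [chr position Pvalue] layout.
gwas=gwas.example.pvalue
printf '%s\t%s\t%s\n' \
    chr11 24110000 0.05 \
    chr11 24142660 1e-7 \
    chr11 24190000 0.001 > "$gwas"

# Print chr, position and -log10(P). awk's log() is the natural logarithm,
# so divide by log(10). Rows with P <= 0 are skipped (log is undefined there).
neg_log10() {
    awk '$3 > 0 { printf "%s\t%s\t%.2f\n", $1, $2, -log($3) / log(10) }' "$1"
}

neg_log10 "$gwas"
# chr11   24110000   1.30
# chr11   24142660   7.00
# chr11   24190000   3.00
```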
3.3 Output files
out.site.gz: Remaining SNPs after filtering [chr site]
out.blocks.gz: Block file [chr start end block_length SNP_number SNPs]
out.TriangleV.gz: Region pairwise R^2/D'
out.svg: Output plot in SVG format
out.png: Output plot in PNG format
out.pdf: Output plot in PDF format
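The block file can be inspected from the command line. The following sketch assumes only the column layout given above for out.blocks.gz [chr start end block_length SNP_number SNPs]. The helper name `summarize_blocks` is hypothetical, and skipping lines that start with '#' is a precaution, since this manual does not say whether the file has a header.

```shell
#!/bin/sh
# Summarise a block file laid out as: chr start end block_length SNP_number SNPs
# Assumption: lines starting with '#' (if any) are headers and can be skipped.
summarize_blocks() {
    gzip -dc "$1" | awk '
        /^#/ || NF < 5 { next }
        { n++; len += $4; snps += $5 }
        END {
            if (n == 0) { print "no blocks found"; exit }
            printf "blocks=%d mean_length=%.1f mean_snps=%.1f\n", n, len / n, snps / n
        }'
}

# Run only when a result file is present, e.g. after a run with -OutPut out
if [ -f out.blocks.gz ]; then
    summarize_blocks out.blocks.gz
fi
```

Block lengths are reported in whatever unit the file itself uses.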
4 Example
See more detailed usage in the Chinese Documentation
See more detailed usage in the English Documentation
See the example directory and Manual.pdf for more detail.
- Example 1) show Figure with Default LD Blocks
#../../bin/LDBlockShow -InVCF Test.vcf.gz -OutPut out -Region chr11:24100000:24200000 -OutPng -SeleVar 1
../../bin/LDBlockShow -InVCF Test.vcf.gz -OutPut out -Region chr11:24100000:24200000 -OutPng -SeleVar 2
# [-SeleVar 1] is D',[-SeleVar 2] is RR ,[-SeleVar 3] are RR and D',[-SeleVar 4] are D' and RR
# the default is D'
- Example 2) Output LDHeatMap combined with GWAS statistics
#../../bin/LDBlockShow -InVCF ../Example1/Test.vcf.gz -OutPut out -Region chr11:24100000:24200000 -InGWAS gwas.pvalue -OutPng
../../bin/LDBlockShow -InVCF ../Example1/Test.vcf.gz -OutPut out -Region chr11:24100000:24200000 -InGWAS gwas.pvalue -OutPng -SeleVar 4
## you can run ShowLDSVG with more parameters to optimize the plot ##
# ../../bin/ShowLDSVG -InPreFix out -OutPut out -InGWAS gwas.pvalue -Cutline 7 -ShowNum -PointSize 3 -OutPng
- Example 3) show Figure with genomic annotation
#../../bin/LDBlockShow -InVCF ../Example1/Test.vcf.gz -OutPut out -InGWAS gwas.pvalue -InGFF In.gff -Region chr11:24100000:24200000 -OutPng -SeleVar 1
../../bin/LDBlockShow -InVCF ../Example1/Test.vcf.gz -OutPut out -InGWAS gwas.pvalue -InGFF In.gff -Region chr11:24100000:24200000 -OutPng -SeleVar 2
## you can run ShowLDSVG with more parameters to optimize the plot ##
#../../bin/ShowLDSVG -InPreFix out -OutPut out.svg -InGWAS gwas.pvalue -Cutline 7 -InGFF In.gff -crGene yellow:lightblue:pink:orange -showNum -OutPng
#../../bin/ShowLDSVG -InPreFix out -OutPut out.svg -InGFF In.gff
#../../bin/ShowLDSVG -InPreFix out -OutPut out.svg -InGWAS gwas.pvalue -Cutline 7 -InGFF In.gff -crGene yellow:lightblue:pink:orange -showNum -OutPng -SpeSNPName Spe.snp -ShowGWASSpeSNP
- Example 4) show Figure (heatmap+Annotation+GWAS similar to LocusZoom)
../../bin/LDBlockShow -InVCF ../Example1/Test.vcf.gz -OutPut out -InGWAS ../Example3/gwas.pvalue -InGFF ../Example3/In.gff -Region chr11:24100000:24200000 -OutPng -SeleVar 4 -TopSite
# [-SeleVar 3]: GWAS with RR ,heatmap with D'. [-SeleVar 4]: GWAS with D' ,heatmap with RR.
## you can run ShowLDSVG with more parameters to optimize the plot with para [-TopSite] ##
../../bin/ShowLDSVG -InPreFix out -OutPut out.svg -InGWAS ../Example3/gwas.pvalue -Cutline 7 -InGFF ../Example3/In.gff -crGene yellow:lightblue:pink:orange -showNum -OutPng -SpeSNPName ../Example3/Spe.snp -ShowGWASSpeSNP -TopSite
#../../bin/ShowLDSVG -InPreFix out -OutPut out.svg -InGWAS ../Example3/gwas.pvalue -Cutline 7 -InGFF ../Example3/In.gff -crGene yellow:lightblue:pink:orange -showNum -OutPng -SpeSNPName ../Example3/Spe.snp -ShowGWASSpeSNP -TopSite chr11:24142660
5 Advantages
To evaluate the performance of LDBlockShow, we used test VCF files to generate LD heatmaps with LDBlockShow, Haploview[4], LDheatmap[5] and gpart. The R^2 and D' values calculated by LDBlockShow are the same as those from the other tools. As shown in the figure below, LDBlockShow uses less time and memory than the other tools.
The above figure shows the comparison of computing cost for LDBlockShow, LDheatmap, Haploview and gpart. CPU time (A) and memory cost (B) for the different methods are shown with a fixed SNP number of 100 and sample size ranging from 2,000 to 60,000. CPU time (C) and memory cost (D) for the different methods are shown with a fixed sample size of 2,000 and SNP number ranging from 100 to 1,200. When testing the datasets in A-D, both LDBlockShow and gpart finished the analyses within reasonable time and memory. We further tested their performance on large datasets. CPU time (E) and memory cost (F) for these two methods are shown with a fixed sample size of 100,000 and SNP number ranging from 300 to 2,500. Computation was performed with one thread of an Intel Xeon CPU E5-2630 v4.
As shown in the table below, LDBlockShow can generate the LD heatmap together with statistics of interest or annotation results in one plot. In addition, LDBlockShow supports subgroup analysis.
Performance | LDBlockShow | Haploview | LDheatmap | gpart |
---|---|---|---|---|
Input | ||||
Compressed VCF file | √ | × | × | × |
Uncompressed VCF file | √ | × | × | √ |
Support subgroup analysis | √ | × | × | × |
Output | ||||
Visualize additional statistics | √ | × | × | × |
Visualize genomic annotation | √ | × | × | √ |
Compressed SVG | √ | × | × | × |
PNG file | √ | √ | × | √ |
Block region | √ | √ | × | √ |
LD measurement | R2/D' | R2/D' | R2 | R2/D' |
6 An example image generated by LDBlockShow.
7 Discussion
- 📧 hewm2008@gmail.com / hewm2008@qq.com
- join the QQ Group : 125293663