The PIC Calculator R package calculates Polymorphism Information Content (PIC) values for Single Nucleotide Polymorphisms (SNPs) from a VCF (Variant Call Format), HapMap (Haplotype Map), or Genetic Binary Table file. PIC values measure how informative a genetic marker is for linkage and association studies.
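The formula itself is not reproduced in this text. Assuming the package uses the widely cited simplified definition (this is an assumption; Botstein et al.'s original definition subtracts an additional cross term), the PIC of the $i$-th SNP can be written as:

$$
\mathrm{PIC}_i = 1 - \sum_{j=1}^{n} P_{ij}^{2}
$$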
In the PIC formula, $P_{ij}$ is the frequency of the $j$-th allele for the $i$-th SNP, and $n$ is the number of alleles for the SNP.
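As a worked illustration of these symbols, the sketch below computes PIC for a single SNP from its allele frequencies. It assumes the simplified definition, one minus the sum of squared allele frequencies, which may differ from what the package computes internally. The function name `pic_simplified` is hypothetical and not part of vcfPIC.

```r
# Hypothetical helper (not part of vcfPIC): PIC for one SNP under the
# ASSUMED simplified definition, 1 - sum of squared allele frequencies.
pic_simplified <- function(freqs) {
  # Frequencies must be non-negative and sum to 1 (within tolerance)
  stopifnot(is.numeric(freqs), all(freqs >= 0), abs(sum(freqs) - 1) < 1e-8)
  1 - sum(freqs^2)
}

pic_simplified(c(0.5, 0.5))  # 0.5  (balanced biallelic SNP)
pic_simplified(c(0.9, 0.1))  # 0.18 (rare minor allele, less informative)
```

Under this definition a biallelic SNP is most informative when both alleles are equally frequent, and the value falls toward zero as one allele approaches fixation.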
- Calculate PIC Values: Compute PIC values for each SNP in a given VCF / HapMap / Genetic Binary Table file.
- Efficient Processing: Handle large files efficiently using optimized R functions implemented in C.
- Easy Integration: Seamlessly integrate with other bioinformatics tools and pipelines in R.
You can install the package directly from GitHub using the devtools package:
# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Install the PIC Calculator package from GitHub
devtools::install_github("AlsammanAlsamman/vcfPIC")
library(vcfPIC)
data("sheep_genotypes_hapmap")
PIC.hapmap <- calculatePIC(sheep_genotypes_hapmap, "hapmap")
head(PIC.hapmap)
data("sheep_genotypes_vcf")
PIC.vcf <- calculatePIC(sheep_genotypes_vcf, "vcf")
head(PIC.vcf)
The Genetic Binary Table file is a tab-delimited text file with the following format:
| rs | alleles | chr | pos | sample1 | sample2 | sample3 | sample4 | sample5 | sample6 | sample7 |
|-----|---------|-----|-----|---------|---------|---------|---------|---------|---------|---------|
| rs1 | A/T | 1 | 100 | 2 | 0 | 1 | 1 | -1 | 1 | 0 |
| rs2 | C/G | 1 | 200 | 0 | 2 | 0 | 1 | 2 | 0 | 1 |
| rs3 | A/C | 1 | 300 | 2 | 0 | 1 | 0 | 1 | 2 | 0 |
* rs: SNP ID
* alleles: Alleles of the SNP (The value can be "-" if not available)
* chr: Chromosome number of the SNP (The value can be "-" if not available)
* pos: Position of the SNP on the chromosome (The value can be "-" if not available)
data("sheep_genotypes_binary")
PIC.binary <- calculatePIC(sheep_genotypes_binary, "binary")
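To make the table layout above concrete, the following base R sketch rebuilds the three example rows and derives allele frequencies from the sample columns. The genotype coding is not stated in this text, so the sketch assumes that 0/1/2 count copies of the second allele listed in `alleles`, that -1 marks a missing call, and that samples are diploid. The names `binary_tbl` and `allele_freqs` are hypothetical, and this is not the package's own implementation.

```r
# Example rows copied from the table above (layout illustration only).
binary_tbl <- data.frame(
  rs      = c("rs1", "rs2", "rs3"),
  alleles = c("A/T", "C/G", "A/C"),
  chr     = c(1, 1, 1),
  pos     = c(100, 200, 300),
  sample1 = c(2, 0, 2),
  sample2 = c(0, 2, 0),
  sample3 = c(1, 0, 1),
  sample4 = c(1, 1, 0),
  sample5 = c(-1, 2, 1),
  sample6 = c(1, 0, 2),
  sample7 = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# ASSUMPTION: 0/1/2 = copies of the second allele, -1 = missing, diploid samples.
geno <- as.matrix(binary_tbl[, -(1:4)])
geno[geno == -1] <- NA

# Number of samples with a called genotype per SNP
n_called <- rowSums(!is.na(geno))

# Frequency of the second allele: copies observed / (2 * called samples)
freq_second <- unname(rowSums(geno, na.rm = TRUE) / (2 * n_called))

allele_freqs <- data.frame(
  rs     = binary_tbl$rs,
  first  = 1 - freq_second,
  second = freq_second
)
print(allele_freqs)
```

Missing calls are excluded from the denominator, so rs1 is computed over six samples rather than seven.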
library(vcfPIC)
# The path to the installed package, where the VCF, HapMap and Binary files are stored
vcfPICpath <- system.file(package = "vcfPIC")
###### VCF
# Step 1: Read the VCF data
vcfData <- readVCF(file.path(vcfPICpath, "data", "sheep_genotypes.vcf"))
head(vcfData)
# Step 2: Calculate allele frequencies from VCF data
freqVCF <- calculateAlleleFreqVCF(vcfData)
head(freqVCF)
# Step 3: Calculate PIC from allele frequencies of VCF data
PIC.vcf <- calculatePICByFreq(freqVCF)
head(PIC.vcf)
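The allele-frequency step above can be pictured with a small base R sketch that counts alleles in the VCF `GT` strings of one variant. It is only an illustration of the idea: `gt_allele_freq` is a hypothetical helper, not a vcfPIC function, and the package's C-backed routines may work differently.

```r
# Hypothetical helper (not part of vcfPIC): allele frequencies for one
# variant from its VCF GT strings, e.g. "0/1", "1|1", "./.".
gt_allele_freq <- function(gt) {
  # Split on the unphased ("/") or phased ("|") separator
  alleles <- unlist(strsplit(gt, "[/|]"))
  # Drop missing alleles, which VCF encodes as "."
  alleles <- alleles[!is.na(alleles) & alleles != "."]
  if (length(alleles) == 0) return(numeric(0))
  counts <- table(alleles)
  freqs <- as.numeric(counts) / sum(counts)
  names(freqs) <- names(counts)
  freqs
}

# Three called genotypes and one missing call: alleles 0 and 1 each at 0.5
gt_allele_freq(c("0/0", "0/1", "1|1", "./."))
```

Because alleles are counted by their index, the same sketch also handles multi-allelic sites such as `1/2`.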
###### HapMap
# Step 1: Read the HapMap data
hapmapData <- readHapmap(file.path(vcfPICpath, "data", "sheep_genotypes.hmp"))
# Step 2: Convert HapMap genotypes to binary numeric format
hapmapData.Binary <- convertGenoBi2Numeric(hapmapData)
head(hapmapData.Binary)
# Alternative: calculate PIC in a single call, directly from a file (here, the VCF file)
PIC <- calculatePIC(file.path(vcfPICpath, "data", "sheep_genotypes.vcf"), "vcf")
head(PIC)
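For readers unfamiliar with the conversion step above, the sketch below shows one way two-letter HapMap genotype calls could be turned into numeric codes. It assumes the same coding as the Genetic Binary Table example (0/1/2 copies of the second allele, -1 for missing or unrecognized calls). `geno_to_numeric` is a hypothetical helper written for illustration and is not claimed to match `convertGenoBi2Numeric`.

```r
# Hypothetical helper (not part of vcfPIC): convert two-letter genotype
# calls for one SNP into counts of the second allele.
geno_to_numeric <- function(calls, alleles) {
  # "A/T" -> c("A", "T"); the second entry is the allele being counted
  parts <- strsplit(alleles, "/", fixed = TRUE)[[1]]
  second <- parts[2]
  vapply(calls, function(g) {
    bases <- strsplit(g, "", fixed = TRUE)[[1]]
    # Anything that is not two known alleles (e.g. "NN") is coded as missing
    if (length(bases) != 2 || !all(bases %in% parts)) return(-1L)
    sum(bases == second)
  }, integer(1), USE.NAMES = FALSE)
}

geno_to_numeric(c("AA", "AT", "TA", "TT", "NN"), "A/T")
# returns 0 1 1 2 -1
```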
###### Binary
# Step 1: Read the binary genetic data
binaryData <- readGeneticBinaryTable(file.path(vcfPICpath, "data", "sheep_genotypes_binary.tsv"), header = TRUE, sep = "\t")
head(binaryData)
# Step 2: Calculate allele frequencies from binary data
freqBinary <- calculateAlleleFreqBinary(binaryData)
head(freqBinary)
# Step 3: Calculate PIC from allele frequencies of binary data
PIC.binary <- calculatePICByFreq(freqBinary)
head(PIC.binary)
For any questions or inquiries, please contact Alsamman at a.alsamman[useAandT]cgiar.org.