number of loci
cculma opened this issue · 2 comments
cculma commented
Hi, I am using polyRAD to generate allele dosage in alfalfa. I used NGSEP (https://sourceforge.net/p/ngsep/wiki/Home/) to generate the VCF. However the number of loci generated by VCF2RADdata
is lower compared with the original VCF. Do you know the way to obtain the same number of sites? Thank you.
myRAD7 <- VCF2RADdata("logan_8.vcf.bgz",
refgenome = "MtrunA17r5.0-20161119-ANR.fasta",
al.depth.field = "ACN",
possiblePloidies = list(4))
> myRAD7
## RADdata object ##
272 taxa and 3807 loci
3811212 total reads
Assumed sample cross-contamination rate of 0.001
Possible ploidies:
Autotetraploid (4)
logan8 <- readVcf("logan_8.vcf")
> logan8
class: CollapsedVCF
dim: 6862 272
rowRanges(vcf):
GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
info(vcf):
DataFrame with 12 columns: CNV, TA, TID, TGN, TCO, TACH, NS, MAF, AN, AFS, TYPE, FS
info(header(vcf)):
Number Type Description
CNV 1 Integer Number of samples with CNVs around this variant
TA 1 String Variant annotation based on a gene model
TID 1 String Id of the transcript related to the variant annotation
TGN 1 String Name of the gene related to the variant annotation
TCO 1 Float One based codon position of the start of the variant. The decimal is the codon position
TACH 1 String Description of the aminoacid change produced by a non-synonymous mutation. String encoded as reference am...
NS 1 Integer Number of samples genotyped
MAF 1 Float Minor allele frequency
AN 1 Integer Number of alleles in called genotypes
AFS R Integer Allele counts over the population for all alleles, including the reference
TYPE 1 String Type of variant
FS 1 Float Phred-scaled p-value using Fisher's exact test to detect strand bias
geno(vcf):
List of length 7: GT, PL, GQ, DP, ADP, BSDP, ACN
geno(header(vcf)):
Number Type Description
GT 1 String Genotype
PL G Integer Phred-scaled genotype likelihoods rounded to the closest integer
GQ 1 Integer Genotype quality
DP 1 Integer Read depth
ADP R Integer Counts for observed alleles, including the reference allele
BSDP 4 Integer Number of base calls (depth) for the 4 nucleotides in called SNVs sorted as A,C,G,T
ACN R Integer Predicted copy number of each allele taking into account the prediction of number of copies of the region...
Best,
Cesar
lvclark commented
Phasing and/or filtering would be the reason you get a different number of markers. If you need the RADdata object to match your VCF exactly, add the options:
phaseSNPs = FALSE, min.ind.with.reads = 0, min.ind.with.minor.allele = 0
It also looks like you should do al.depth.field = "ADP"
, not ACN.
cculma commented
Thank you Lindsay! It works fine.
Best,
Cesar