lvclark/polyRAD

Error running Hind/He script in R

PauloBaleeiro opened this issue · 10 comments

Hello Lindsay,

First of all thank you for your effort putting all these together.

I study a group of aquatics Eriocaulon, and just recently we found out we might have a mix of diploid and polyploid depending the population and one specific case where I might have one pop with different ploidies.

I have been reading your recent paper on Hind/He and I tried running the script. Unfortunately I have been unable to run it, as a constant error pops up. I've ran the following command and got the following error:

myRAD <- VCF2RADdata("populations.haps.vcf",

  •                  svparam = ScanVcfParam(fixed = "ALT", info = NA, geno = "AD",
    
  •                                         which = GRanges("06", IRanges(1, 937644))),
    
  •                  yieldSize = NA)
    

Error in ScanVcfParam(fixed = "ALT", info = NA, geno = "AD", which = GRanges("06", :
could not find function "ScanVcfParam"

I've looked at this page for answers but I couldn't work it out. https://support.bioconductor.org/p/9137213/

I'm doing a De novo analysis on Stacks program, and I understood I needed to use the haplotype.vcf file from Stacks' folder, after a De novo analysis is finished. If there is anything you could tell me or point me in the right direction, I very much appreciate it.

Thanks very much for your time.

Hi Paulo,

Just run library(VariantAnnotation) before running your script.

Unfortunately it is still not working. It might be the GRanges and IRanges. I'm using the zipfile cataloge.vcf file generated by Stacks, but I cannot work out where the problem is. I tried running without "which=...", but it didn't work either.

What error are you getting?

Hi Lindsay,

This is the beginning of script followed by error:
myRAD <- VCF2RADdata("catalog.alleles.tsv.gz",

  •                  svparam = ScanVcfParam(fixed = "ALT", info = NA, geno = "AD",
    
  •                                         which = GRanges("06", IRanges(1, 937644))),
    
  •                  yieldSize = NA)
    

Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'samples': [internal] _hts_rewind() failed

Hi Clark,

I'm a bit stuck in the "object" to be used. In one of your scripts you show the following path:
VCF2RADdata(file, phaseSNPs = TRUE, tagsize = 80, refgenome = NULL,
tol = 0.01, al.depth.field = "AD", min.ind.with.reads = 200,
min.ind.with.minor.allele = 10, possiblePloidies = list(2),
taxaPloidy = 2L, contamRate = 0.001,
samples = VariantAnnotation::samples(VariantAnnotation::scanVcfHeader(file)),
svparam = VariantAnnotation::ScanVcfParam(fixed = "ALT", info = NA,
geno = al.depth.field,
samples = samples),
yieldSize = 5000, expectedAlleles = 5e+05, expectedLoci = 1e+05,
maxLoci = NA)

I'm not sure which VCF files to use. Would it be "populations.snps.vcf? or "populations.haps.vcf"?
And what about the catalog.alleles.tsv?
I have tried many ways, but none worked.

Cheers

Hi Paulo,

That looks like the usage section of the VCF2RADdata documentation. The argument defaults are shown, and you don't have to type the arguments if you are okay with the defaults.

I haven't used Stacks in a long time. If your VCF looks like this:
https://github.com/galaxyproject/tools-iuc/blob/main/tools/stacks2/test-data/populations/populations.haps.vcf
it won't work because the AD field is not included. However one like this should work: https://github.com/galaxyproject/tools-iuc/blob/main/tools/stacks2/test-data/populations/populations.snps.vcf

Your code (in its simplest version) would look like:

mydata <- VCF2RADdata("populations.snps.vcf")

You don't need catalog.alleles.tsv if you are using VCF2RADdata. That file is only needed for readStacks.

Best,

Lindsay

Screenshot 2023-07-11 at 1 06 59 pm This is mine. I'll try once again. Thanks

Should be okay. I was concerned about the missing AD field for missing genotypes (./.) but did a quick test and polyRAD handles those appropriately. If you continue to have issues, please provide your code and a small dataset so that I can reproduce the problem on my own machine.

That being said, you will have to lower the values for the min.ind.with.reads and min.ind.with.minor.allele arguments to VCF2RADdata since you have so few samples. polyRAD's genotyping algorithms may perform poorly on a sample size this small but you can use the naïve protocol shown in https://lvclark.r-universe.dev/articles/polyRADtutorials/population_genetics.html (just search the page for mydataNaive and follow that protocol for genotyping).