thierrygosselin/stackr

vcf2dadi

rebzzy opened this issue · 3 comments

Hi -

I'm having trouble using the vcf2dadi() function. First, the example code gives me the error "Error in as_data_frame(.) : Not a graph object" when I run the bit of code assigned to id.vcf.

Second, it's not clear how to generate the files needed to assign an outgroup. Do I need to run Stacks populations on the ingroup and the outgroup separately to generate a fasta + sumstats file for each? I've tried various versions of this, but I'm told there are 0 common markers between my ingroup and outgroup.

Thanks for your help!

Hi Rebecca!

  1. What's the stackr version you're using ?

If it's the latest (v.0.3.0), I see a lot of arguments missing. There is definitely a bug.
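For anyone unsure which version is installed, here is a minimal sketch using base R only. The helper name `installed_version` is hypothetical and not part of stackr; it just wraps `utils::packageVersion()` so that a missing package returns `NA` instead of an error.

```r
# Hypothetical helper (not part of stackr): return the installed version of a
# package as a string, or NA if the package is not installed.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Prints the installed stackr version, or NA if stackr is not installed.
installed_version("stackr")
```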

I've been transferring all my vcf2... functions to a global function called genomic_converter. However, vcf2dadi didn't make it, as I've been modifying it over the past week to make it simpler and also independent of STACKS for people using other pipelines.

Most people who tested the function didn't use an outgroup, so knowing more about your case would help me find the best way for others to use it...

  2. What's your current setup?
  • outgroup/ingroup: yes
  • using stacks? This is important because the stacks vcf file doesn't include the position of the SNP in the read; you need the sumstats file for this info...
  • If using stacks, do you have a shared catalog between your outgroup and ingroup?
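As a quick way to check the "0 common markers" symptom, the sketch below compares locus IDs from two runs. It uses toy data frames rather than real sumstats files, and the column name `LOCUS` is an assumption for illustration. The idea it relies on, that a shared catalog gives the same ID to the same locus in both runs, is my reading of why the shared catalog matters here.

```r
# Toy stand-ins for the locus IDs found in the ingroup and outgroup runs.
# Real values would come from the sumstats files; the column name LOCUS is
# an assumption made for this example.
ingroup  <- data.frame(LOCUS = c(11, 12, 15, 20))
outgroup <- data.frame(LOCUS = c(12, 20, 31))

# If both runs used the same catalog, matching IDs should point to the same
# locus, so the overlap can be computed directly on the IDs.
common_loci <- intersect(ingroup$LOCUS, outgroup$LOCUS)

# Number of loci shared between the two runs, then the IDs themselves.
length(common_loci)
common_loci
```

If the overlap is empty even though the data look comparable, the two runs may not have been built against the same catalog.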

Best
Thierry

Hi -

I'm using stackr version 0.2.9.2. I'm using the Stacks VCF output, am importing the sumstats file, and I do have a shared catalog between my ingroup and outgroup. To get the VCF and sumstats files, I run the populations program from Stacks.

Thank you for your help and quick reply!

OK, so try stackr v.3.0.1. You should be able to run the example without an outgroup in the vignette. Download the html, it's easier to read...

As for a run with an outgroup, I'll modify the current approach, which requires 2 separate runs of stacks to get the fasta and sumstats... complicated for most users.

The next version will be ready tomorrow or Saturday.
I see 2 ways to do it:

  1. If the path to the catalog and sumstats is given, the function will use that to generate the info needed. No more additional stacks run and way faster.

  2. For non-stacks users: supply a data frame with 3 columns: MARKERS, SNP_POSITION (on the read) and SEQUENCE.
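To make option 2 concrete, here is a rough sketch of what such a data frame could look like in base R. The values are toy data. The check function `check_outgroup_info` is hypothetical and not a stackr function, and treating SNP_POSITION as 1-based is an assumption made only for this example.

```r
# Toy values only; the three column names follow the proposal above.
outgroup_info <- data.frame(
  MARKERS      = c("locus_1", "locus_2"),
  SNP_POSITION = c(12L, 45L),
  SEQUENCE     = c("ACGTACGTACGTAC", paste(rep("ACGT", 12), collapse = "")),
  stringsAsFactors = FALSE
)

# Hypothetical sanity check (not a stackr function): the three columns must be
# present, and each SNP position must fall inside its sequence.
# Assumption: positions are 1-based.
check_outgroup_info <- function(df) {
  required <- c("MARKERS", "SNP_POSITION", "SEQUENCE")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  all(df$SNP_POSITION >= 1 & df$SNP_POSITION <= nchar(df$SEQUENCE))
}

check_outgroup_info(outgroup_info)
```

A small check like this would catch a missing column or an out-of-range position before the conversion starts.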

Got a better idea ?

Best
Thierry