thierrygosselin/grur

Error in seq.default with missing_visualization

BrennaF opened this issue · 6 comments

Hi Thierry,

I'm running missing_visualization on these data sets:

dat1
/// GENIND OBJECT /////////

// 264 individuals; 12,091 loci; 24,182 alleles; size: 30 Mb

// Basic content
@tab: 264 x 24182 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 2-2)
@loc.fac: locus factor for the 24182 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: .local(x = x, i = i, j = j, loc = ..1, drop = drop)

// Optional content
@pop: population of each individual (group size range: 21-31)

head(strata)
INDIVIDUALS STRATA library
1 GM1 GM Pw1
2 GM26 GM Pw1
3 GM40 GM Pw1
4 GM31 GM Pw1
5 GM15 GM Pw1
6 GM24 GM Pw1

My strata file has multiple "STRATA" (populations) and libraries (>10 for each).

Here is my call & output:

miss.dat1 <- missing_visualization(dat1, strata=strata)
#######################################################################
#################### grur::missing_visualization ######################
#######################################################################
Folder created:
missing_visualization_20180424@1454

Importing data
Alleles names for each markers will be converted to factors and padded with 0
Scanning for monomorphic markers...
Number of markers before = 12091
Number of monomorphic markers removed = 0

Tidy genomic data:
Number of markers: 12091
Number of chromosome/contig/scaffold: no chromosome info
Number of individuals: 264
Number of populations: 1

Informations:
Number of populations: 1
Number of individuals: 264
Number of ind/pop:
NA

Number of duplicate id: 0
Number of SNPs: 12091

Proportion of missing genotypes (overall): 0.298188

Identity-by-missingness (IBM) analysis using
Principal Coordinate Analysis (PCoA)...
Generating Identity by missingness plot
Error in seq.default(h[1], h[2], length.out = n) :
'to' must be a finite number
In addition: There were 42 warnings (use warnings() to see them)

Any ideas what might be causing the seq.default error?
Thanks!
Brenna
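
For context, base R's `seq()` raises this message whenever its `to` endpoint is `NA`, `NaN`, or infinite. The sketch below only reproduces the message in isolation. The vector `h` is a made-up stand-in for a plotting range and is not taken from grur's code, so it shows one way the error can arise, not necessarily the path taken here.

```r
# Hypothetical range whose upper end is missing, e.g. computed from
# empty or all-NA input. Not grur's actual internals.
h <- c(0, NA_real_)

# Capture the error message instead of stopping the session.
msg <- tryCatch(
  {
    seq(h[1], h[2], length.out = 5)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Seeing this message from a plotting step usually means the values being plotted contained no usable numbers, so the inputs are worth checking before the plotting code itself.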

Hi Brenna,

  • Are you sure you have more than 1 value in the STRATA column?
  • I ask because the output says "Number of populations: 1".
  • The STRATA column might be filled with GM all the way down, or have just 1 individual with a different value; that would explain the error.
  • I'll raise an error earlier in the script when this is detected.
  • The underlying problem is that I always envisioned the function for population genomics and never intended it to work with just 1 large grouping. I could add that if it's of interest.
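
A check along these lines can be run by hand before calling the function. This is a minimal sketch in base R: `check_strata_groups` is a hypothetical helper, not part of grur, and the toy strata values are made up. It assumes the grouping column is named `STRATA`, as in the strata shown above.

```r
# Hypothetical helper (not part of grur): stop early when a strata column
# cannot define at least `min.groups` groups.
check_strata_groups <- function(strata, column = "STRATA", min.groups = 2) {
  # Count individuals per group, ignoring unused factor levels.
  groups <- table(droplevels(factor(strata[[column]])))
  if (length(groups) < min.groups) {
    stop("Column '", column, "' has ", length(groups),
         " group(s); at least ", min.groups, " are needed.")
  }
  groups
}

# Toy strata with made-up values.
strata.ok <- data.frame(
  INDIVIDUALS = c("GM1", "GM26", "BC1", "BC2"),
  STRATA = c("GM", "GM", "BC", "BC"),
  stringsAsFactors = TRUE
)
counts <- check_strata_groups(strata.ok)
print(counts)
```

The returned counts also make it easy to spot a group that contains only 1 individual.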

Also, if you want to visualize missingness by the library column,
use strata.select = c("POP_ID", "library") as an argument in the function.
This will run the function on both columns.

Best
Thierry

Hi Thierry!

Yes, that is a little odd. I definitely have multiple groups in both columns of my strata file:

> table(strata$STRATA)
BC FT GM IM PM SM TT UH UL WT 
24 25 31 30 25 23 21 30 26 29 

> table(strata$library)
 Pc4  Pw1 Pw10  Pw2 Pw20  Pw5  Pw6  Pw7  Pw8  Pw9 
   2   31   30   31   23   30   29   27   32   29 

My input genind file (dat1) also has populations defined:

> table(dat1@pop)
BC FT GM IM PM SM TT UH UL WT 
24 25 31 30 25 23 21 30 26 29 

The strata file has $STRATA and $library as factors. Is that a problem? Not sure why grur would be reading all of that in as one population.

Brenna

Can you send me the data (.RData) by email ?

Hi Brenna, ok, now I see: the individuals in your strata are not the same as the individuals in your data.
The next radiator and grur releases will generate an error when this is found.

Re-open the issue if, after fixing this, there is still something wrong with missing_visualization.
Best
Thierry
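
Until such a check exists in a release, the mismatch can be detected by comparing the two sets of IDs directly. The following is a rough sketch in base R: `compare_ids` is a hypothetical helper, not part of grur or radiator, and the IDs are made up for illustration.

```r
# Hypothetical helper (not part of grur or radiator): report IDs that
# appear on one side only.
compare_ids <- function(data.ids, strata.ids) {
  data.ids <- as.character(data.ids)
  strata.ids <- as.character(strata.ids)
  list(
    missing.in.strata = setdiff(data.ids, strata.ids),
    missing.in.data = setdiff(strata.ids, data.ids)
  )
}

# Made-up IDs. With a genind object, the data IDs could be taken from
# adegenet::indNames(dat1) or rownames(dat1@tab).
data.ids <- c("GM1", "GM26", "GM40")
strata.ids <- c("GM1", "GM26", "GM_40")

res <- compare_ids(data.ids, strata.ids)
print(res)
```

Two empty vectors mean every individual in the data has a row in the strata and vice versa.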

Thanks Thierry - so the rownames in the dat@tab need to have a corresponding column in the strata file? I didn't realize that! I just reran it with matching individual names and it worked. Sorry to bother you with that, but glad it was an easy fix!

The strata object/file requires a minimum of 2 columns; check the function documentation with ??grur::missing_visualization. You had a column named INDIVIDUALS in your strata object,
but it did not contain the same individuals as dat@tab, so there was no way to match the STRATA and library columns with the data.
Cheers
Thierry
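
As an illustration of the matching described here, the sketch below lines up strata rows with the data IDs using base R's `match()`. `align_strata` is a hypothetical helper, not a grur function, and the toy values are made up. It assumes the ID column is named `INDIVIDUALS`, as in the strata shown earlier in the thread.

```r
# Hypothetical helper (not part of grur): return the strata rows in the
# same order as the data IDs, and stop if any ID has no matching row.
align_strata <- function(data.ids, strata) {
  idx <- match(as.character(data.ids), as.character(strata$INDIVIDUALS))
  if (anyNA(idx)) {
    stop("No strata row for: ",
         paste(data.ids[is.na(idx)], collapse = ", "))
  }
  strata[idx, , drop = FALSE]
}

# Toy example with made-up values; the strata rows are in a different
# order than the data.
strata.toy <- data.frame(
  INDIVIDUALS = c("GM26", "BC1", "GM1"),
  STRATA = c("GM", "BC", "GM"),
  library = c("Pw1", "Pw2", "Pw1")
)
aligned <- align_strata(c("GM1", "GM26", "BC1"), strata.toy)
print(aligned)
```

When the IDs do not match, the error lists the individuals that could not be found in the strata.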