knausb/vcfR

vcfR not recognizing all chromosomes

Closed this issue · 2 comments

When I try to read in my filtered vcf file, it only recognizes 16 of the 18 chromosomes. I unzipped and checked with head(filtered.vcf) and it says it has 18. Any idea what the reason for this would be? Size related? Happy to provide any additional information.

> vcf <- read.vcfR("filtered.vcf.gz", verbose = FALSE)
> vcf
***** Object of Class vcfR *****
96 samples
16 CHROMs
365,842 variants
Object size: 464.2 Mb
0 percent missing data
*****        *****         *****
$ head -50 filtered.vcf

##contig=<ID=1,length=4109373>
##contig=<ID=2,length=3341473>
##contig=<ID=3,length=3226611>
##contig=<ID=4,length=2468882>
##contig=<ID=5,length=2959378>
##contig=<ID=6,length=2725906>
##contig=<ID=7,length=2652353>
##contig=<ID=8,length=2617329>
##contig=<ID=9,length=2547566>
##contig=<ID=10,length=2419276>
##contig=<ID=11,length=2359939>
##contig=<ID=12,length=2352958>
##contig=<ID=13,length=2257609>
##contig=<ID=14,length=2138025>
##contig=<ID=15,length=2027721>
##contig=<ID=16,length=1969743>
##contig=<ID=17,length=247158>
##contig=<ID=18,length=208766>

If I ignore and create a chrom object:

chrom <- create.chromR(name="Supercontig", vcf=vcf, seq=dna, ann=gff, verbose=TRUE)
vcfR object includes more than one chromosome (CHROM).
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
Subsetting to the first chromosome
DNAbin object includes more than one chromosome.
1 dna:chromosome chromosome:ASM83294v1:1:1:4109373:1 REF, 2 dna:chromosome chromosome:ASM83294v1:2:1:3341473:1 REF, 3 dna:chromosome chromosome:ASM83294v1:3:1:3226611:1 REF, 4 dna:chromosome chromosome:ASM83294v1:4:1:2468882:1 REF, 5 dna:chromosome chromosome:ASM83294v1:5:1:2959378:1 REF, 6 dna:chromosome chromosome:ASM83294v1:6:1:2725906:1 REF, 7 dna:chromosome chromosome:ASM83294v1:7:1:2652353:1 REF, 8 dna:chromosome chromosome:ASM83294v1:8:1:2617329:1 REF, 9 dna:chromosome chromosome:ASM83294v1:9:1:2547566:1 REF, 10 dna:chromosome chromosome:ASM83294v1:10:1:2419276:1 REF, 11 dna:chromosome chromosome:ASM83294v1:11:1:2359939:1 REF, 12 dna:chromosome chromosome:ASM83294v1:12:1:2352958:1 REF, 13 dna:chromosome chromosome:ASM83294v1:13:1:2257609:1 REF, 14 dna:chromosome chromosome:ASM83294v1:14:1:2138025:1 REF, 15 dna:chromosome chromosome:ASM83294v1:15:1:2027721:1 REF, 16 dna:chromosome chromosome:ASM83294v1:16:1:1969743:1 REF, 17 dna:chromosome chromosome:ASM83294v1:17:1:247158:1 REF, 18 dna:chromosome chromosome:ASM83294v1:18:1:208766:1 REF
Subsetting to the first chromosome
Annotations include more than one chromosome.
1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 2, 3, 4, 5, 6, 7, 8, 9
Subsetting to the first chromosome
Names in vcf:
  1
Names of sequences:
  1 dna:chromosome chromosome:ASM83294v1:1:1:4109373:1 REF
Names in annotation:
  1
Initializing var.info slot.
var.info slot initialized.
Warning message:
In create.chromR(name = "Supercontig", vcf = vcf, seq = dna, ann = gff,  : 
        Names in variant data and sequence data do not match perfectly.
        If you choose to proceed, we'll do our best to match the data.
        But prepare yourself for unexpected results.

Is it possible that you do not have variants for some of your chromosomes? If your reference had 18 chromosomes you should see that in the meta region. But that does not mean you have variants for all of those chromosomes. If no variants were called for two of your chromosomes, or your filtering removed them, you'll only have variants for 16 chromosomes. Something like the following may help.

library(vcfR)
#> 
#>    *****       ***   vcfR   ***       *****
#>    This is vcfR 1.12.0.9999 
#>      browseVignettes('vcfR') # Documentation
#>      citation('vcfR') # Citation
#>    *****       *****      *****       *****
data("vcfR_test")

vcfR_test@meta <- c(
  vcfR_test@meta[1:5],
  c("##contig=<ID=1,length=4109373>",
  "##contig=<ID=2,length=3341473>",
  "##contig=<ID=3,length=3226611>",
  "##contig=<ID=4,length=2468882>",
  "##contig=<ID=5,length=2959378>",
  "##contig=<ID=6,length=2725906>",
  "##contig=<ID=7,length=2652353>",
  "##contig=<ID=8,length=2617329>",
  "##contig=<ID=9,length=2547566>",
  "##contig=<ID=10,length=2419276>",
  "##contig=<ID=11,length=2359939>",
  "##contig=<ID=12,length=2352958>",
  "##contig=<ID=13,length=2257609>",
  "##contig=<ID=14,length=2138025>",
  "##contig=<ID=15,length=2027721>",
  "##contig=<ID=16,length=1969743>",
  "##contig=<ID=17,length=247158>",
  "##contig=<ID=18,length=208766>"),
  vcfR_test@meta[6:18]
  )

# Your 18 plus one.
length(grep("contig", vcfR_test@meta))
#> [1] 19
length(unique(getCHROM(vcfR_test)))
#> [1] 1

Created on 2022-03-08 by the reprex package (v2.0.1)

Does this address your issue?

I should have checked the meta region, thank you! That was exactly it.