Bioconductor/GenomicRanges

Problems with AtomicList in mcols in R-devel

Closed this issue · 3 comments

plger commented

Happens only with R-devel:

> library(GenomicRanges)
> library(S4Vectors)
> library(IRanges)
> gr <- GRanges("chr1", IRanges(1:5, width=10))
> fl <- FactorList(lapply(1:5, FUN=function(x) sample(LETTERS,x)))
> fl
FactorList of length 5
[[1]] W
[[2]] P Q
[[3]] B V Y
[[4]] V M N Y
[[5]] T E K B O
> gr$fl <- fl
> gr
GRanges object with 5 ranges and 1 metadata column:
      seqnames    ranges strand |           fl
         <Rle> <IRanges>  <Rle> | <FactorList>
  [1]     chr1      1-10      * |             
  [2]     chr1      2-11      * |             
  [3]     chr1      3-12      * |             
  [4]     chr1      4-13      * |             
  [5]     chr1      5-14      * |             
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> gr$fl
FactorList of length 5
Error in RangeNSBS(x, start = start, end = end, width = width) : 
  the specified range is out-of-bounds

This works fine:

> DataFrame(fl=fl)
DataFrame with 5 rows and 1 column
            fl
  <FactorList>
1            X
2          I,E
3        T,F,R
4    I,V,J,...
5    Y,N,F,...

This also works:

> fl <- FactorList(lapply(1:5, FUN=function(x) sample(LETTERS,x)), compress=FALSE)
> mcols(gr) <- NULL
> gr$fl <- fl
> gr
GRanges object with 5 ranges and 1 metadata column:
      seqnames    ranges strand |           fl
         <Rle> <IRanges>  <Rle> | <FactorList>
  [1]     chr1      1-10      * |            M
  [2]     chr1      2-11      * |          G,K
  [3]     chr1      3-12      * |        B,Y,Z
  [4]     chr1      4-13      * |    G,A,L,...
  [5]     chr1      5-14      * |    C,O,X,...

This suggests that the problem is related to compression. However, further downstream the list appears to be compressed again automatically, and I get errors like:

Error in validObject(result) : 
  invalid class "CompressedFactorList" object: 
    improper partitioning
> sessionInfo()
R Under development (unstable) (2021-04-08 r80148)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /home/pigerm/applications/R-devel/lib/libRblas.so
LAPACK: /home/pigerm/applications/R-devel/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_CH.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_CH.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_CH.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.43.4 GenomeInfoDb_1.27.10 IRanges_2.25.7      
[4] S4Vectors_0.29.15    BiocGenerics_0.37.1 

loaded via a namespace (and not attached):
[1] zlibbioc_1.37.0        compiler_4.1.0         tools_4.1.0           
[4] XVector_0.31.1         GenomeInfoDbData_1.2.4 RCurl_1.98-1.3        
[7] bitops_1.0-6          
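
For anyone who wants to check their own setup, the steps above can be condensed into a small script. This is only a sketch: the helper name `check_factorlist_mcols` is made up for illustration, and it assumes that forcing the column through `validObject()` and `show()` is enough to trigger the errors reported above.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical helper (not part of any package): returns TRUE if a compressed
# FactorList stored in mcols() can be validated and printed, FALSE otherwise.
check_factorlist_mcols <- function() {
  gr <- GRanges("chr1", IRanges(1:5, width = 10))
  fl <- FactorList(lapply(1:5, function(x) sample(LETTERS, x)))
  gr$fl <- fl
  tryCatch({
    validObject(gr$fl)                       # may fail with "improper partitioning"
    invisible(capture.output(show(gr$fl)))   # may fail with "out-of-bounds"
    TRUE
  }, error = function(e) {
    message("Reproduced an error: ", conditionMessage(e))
    FALSE
  })
}

check_factorlist_mcols()
```

A result of `FALSE` would indicate that the setup is affected; a result of `TRUE` only shows that these two particular checks pass.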

plger commented

Bug was reproduced (by csoneson) on the following setup:

R Under development (unstable) (2021-03-29 r80130)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     
other attached packages:
[1] GenomicRanges_1.43.4 GenomeInfoDb_1.27.11 IRanges_2.25.7       S4Vectors_0.29.15   
[5] BiocGenerics_0.37.1 
loaded via a namespace (and not attached):
[1] zlibbioc_1.37.0        compiler_4.1.0         XVector_0.31.1        
[4] tools_4.1.0            GenomeInfoDbData_1.2.4 RCurl_1.98-1.3        
[7] yaml_2.2.1             bitops_1.0-6          

In contrast, the bug does not occur on this setup:

R Under development (unstable) (2021-04-05 r80145)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS:   /home/stephany/r-devel/R-devel/lib/libRblas.so
LAPACK: /home/stephany/r-devel/R-devel/lib/libRlapack.so
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_CH.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_CH.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_CH.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C       
attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     
other attached packages:
[1] GenomicRanges_1.43.4 GenomeInfoDb_1.27.8  IRanges_2.25.6      
[4] S4Vectors_0.29.12    BiocGenerics_0.37.1 
loaded via a namespace (and not attached):
[1] zlibbioc_1.37.0        compiler_4.1.0         tools_4.1.0           
[4] XVector_0.31.1         GenomeInfoDbData_1.2.4 RCurl_1.98-1.3        
[7] bitops_1.0-6

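The GenomicRanges version (1.43.4) is the same on all three setups, while the unaffected one has older S4Vectors (0.29.12 vs 0.29.15) and IRanges (2.25.6 vs 2.25.7). This suggests that the problem is not strictly GenomicRanges-related, but perhaps comes from S4Vectors?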

Thanks @plger for the report and sorry for the delay. We're going to take a look at this ASAP.

Fixed in BiocGenerics 0.37.5. See Bioconductor/IRanges#38 for the details.

H.
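
A quick way to tell whether an installation already includes the fix is to compare the BiocGenerics version against 0.37.5, the version named above. The sketch below uses only base R; the helper name `has_fix` is hypothetical and introduced here for illustration.

```r
# Hypothetical helper: TRUE if the given version is at or above the version
# in which the fix was reported (BiocGenerics 0.37.5, per the comment above).
has_fix <- function(installed, fixed_in = "0.37.5") {
  package_version(installed) >= package_version(fixed_in)
}

# Check the locally installed BiocGenerics, if there is one.
if (requireNamespace("BiocGenerics", quietly = TRUE)) {
  v <- as.character(utils::packageVersion("BiocGenerics"))
  if (has_fix(v)) {
    message("BiocGenerics ", v, " should include the fix.")
  } else {
    message("BiocGenerics ", v, " predates the fix; updating is probably needed.")
  }
} else {
  message("BiocGenerics is not installed.")
}
```

With the versions from the session info above, `has_fix("0.37.1")` evaluates to `FALSE`, which is consistent with the bug being present on those setups.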