kgori/sigfit

build_catalogues uses character matrix, not data frame

Closed this issue · 4 comments

dg13 commented

Hi,
A minor point that would help first-time users: when trying out extract_signatures I ran into problems because I initially created the variant count object (i.e. the structure named "variants_21breast" in the vignette example) as a data frame rather than a character matrix. This seems to cause problems when build_catalogues is then used to create the mutation matrix required by extract_signatures: the row names are missing from the resulting output matrix. This happens even when you explicitly set the columns as character when creating the data frame, or set stringsAsFactors to FALSE (to avoid the usual automatic conversion to factors). For example:
(newm1 = character matrix)
(newm2 = data.frame with values as characters)

mat1 <- build_catalogues(newm1)
mat1[,1:5]
                 ACA>AAA ACC>AAC ACG>AAG ACT>AAT CCA>CAA
HPSI0114i-bezi_1      66      17       7      27      42
HPSI0114i-bezi_3      46      13       3      26      30

mat2 <- build_catalogues(newm2)
mat2[,1:5]
     ACA>AAA ACC>AAC ACG>AAG ACT>AAT CCA>CAA
[1,]      66      17       7      27      42
[2,]      46      13       3      26      30

To avoid this, maybe just make it clearer in the vignette that the method will fail if the input to build_catalogues is not a character matrix (or alternatively modify build_catalogues to handle data frames). Either way, it's not a big issue, but I thought I would post here in case others run into the same problem.
D

kgori commented

Adrian, I think wrapping the input in a call to as.matrix somewhere at the beginning of build_catalogues should solve this. It would make a copy of the input data, so as long as this is not too much of a performance hit, I think we should take the easy fix.

Kevin
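
The coercion suggested above can be tried outside sigfit. The following is a minimal base R sketch, assuming a made-up variant table whose column names (sample, ref, alt, trinuc) and values are purely illustrative and not taken from the package. It shows that as.matrix returns a character matrix even when the data frame columns are factors. It does not show how build_catalogues itself was changed.

```r
# Hypothetical variant table; column names and values are illustrative only.
variants_df <- data.frame(
  sample = c("S1", "S1", "S2"),
  ref    = c("C", "T", "C"),
  alt    = c("A", "G", "T"),
  trinuc = c("ACA", "ATG", "CCT"),
  stringsAsFactors = TRUE  # force factor columns for the demonstration
)

# as.matrix() copies the data and converts factor columns to their labels,
# so the result is a plain character matrix.
variants_mat <- as.matrix(variants_df)

is.matrix(variants_mat)     # TRUE
is.character(variants_mat)  # TRUE
```

As noted in the comment, this makes a copy of the input, so the cost grows with the size of the variant table.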

That's true, there seems to be no input conversion in build_catalogues.

I've fixed this already, so it works with both character matrices and data frames (it only failed when the columns in the data frame were factors instead of character vectors).
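
For anyone on a version without this fix, a possible user-side check is sketched below in base R. It assumes a hypothetical data frame (column names and values are made up, and nothing here is sigfit code) and converts any factor columns to character vectors before the table is passed on. Factor columns were the default for strings in data.frame before R 4.0.0.

```r
# Hypothetical variant table with factor columns; values are illustrative only.
variants_df <- data.frame(
  sample = c("S1", "S2"),
  ref    = c("C", "T"),
  alt    = c("A", "G"),
  trinuc = c("ACA", "ATG"),
  stringsAsFactors = TRUE
)

# Identify which columns are factors.
factor_cols <- vapply(variants_df, is.factor, logical(1))

# Convert only those columns to character vectors, keeping the data frame.
variants_df[factor_cols] <- lapply(variants_df[factor_cols], as.character)

# Confirm that every column is now a character vector.
all(vapply(variants_df, is.character, logical(1)))
```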