AntonioDeFalco/SCEVAN

Remove filtering of duplicate gene symbols when rownames of counts matrix are Ensembl IDs

allyhawkins opened this issue · 1 comment

I ran into an error when running SCEVAN on an object with Ensembl IDs as row names instead of gene symbols. In the annotateGenes() function, I got the following error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 7104, 7107

I traced the error to the line below, which removes every gene whose symbol is duplicated in the reference annotation edb. However, the same filtering is not applied to the mtx variable.

edb <- edb[!duplicated(edb$gene_name),]
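The mismatch can be reproduced with a minimal R sketch. The values below are hypothetical toy data, and the names edb and mtx only mirror the issue; this is not SCEVAN's internal code. Two distinct Ensembl IDs share one symbol, so filtering on gene_name shortens edb while mtx keeps all of its rows. Combining them then fails with the same kind of message as the reported error.

```r
# Hypothetical toy annotation: ENSG01 and ENSG02 share the symbol "GENE_A"
edb <- data.frame(
  gene_id   = c("ENSG01", "ENSG02", "ENSG03"),
  gene_name = c("GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# Hypothetical counts matrix (genes x cells) with Ensembl IDs as rownames
mtx <- matrix(1:6, nrow = 3,
              dimnames = list(edb$gene_id, c("cell1", "cell2")))

# Filtering on gene_name drops ENSG02, even though it is a unique row of mtx
edb_dedup <- edb[!duplicated(edb$gene_name), ]

# Combining 2 annotation rows with 3 count rows cannot be recycled,
# so data.frame() raises "arguments imply differing number of rows"
res <- tryCatch(
  data.frame(edb_dedup, mtx, check.names = FALSE),
  error = function(e) conditionMessage(e)
)
print(res)
```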

This step is probably necessary with gene symbols, since the dimensions of edb and mtx may not match if edb contains duplicated symbols. With Ensembl IDs, however, the IDs themselves are not duplicated, so the step is unnecessary. Worse, several Ensembl IDs can share one gene symbol, so filtering on gene_name can drop rows from edb that still correspond to rows of mtx. Also, duplicates should only be removed from the column indicated by use_geneID, although if that column is gene_id, I would skip this step altogether.
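The suggestion above can be sketched in R as follows, under stated assumptions. The helper align_counts_to_annotation and its id_col argument are hypothetical and not part of SCEVAN. Here id_col plays the role of the column selected by use_geneID. The helper deduplicates edb only on that column, which removes nothing when the IDs are unique Ensembl IDs. It then subsets mtx and edb to the shared IDs in the same order, so their row counts always agree.

```r
# Hypothetical helper (not SCEVAN's API): deduplicate only on the matching
# column, then align the counts matrix and the annotation row by row.
align_counts_to_annotation <- function(mtx, edb, id_col) {
  # For unique Ensembl IDs this drops nothing
  edb <- edb[!duplicated(edb[[id_col]]), , drop = FALSE]
  # IDs present in both, in the order of rownames(mtx)
  shared <- intersect(rownames(mtx), edb[[id_col]])
  list(
    mtx = mtx[shared, , drop = FALSE],
    edb = edb[match(shared, edb[[id_col]]), , drop = FALSE]
  )
}

# Hypothetical toy data: two Ensembl IDs share the symbol "GENE_A"
edb <- data.frame(
  gene_id   = c("ENSG01", "ENSG02", "ENSG03"),
  gene_name = c("GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)
mtx <- matrix(1:6, nrow = 3,
              dimnames = list(edb$gene_id, c("cell1", "cell2")))

out <- align_counts_to_annotation(mtx, edb, "gene_id")
combined <- data.frame(out$edb, out$mtx, check.names = FALSE)
```

Because both outputs are indexed by the same shared vector, the later column bind cannot see differing row counts, whichever ID column is used.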

Thanks @allyhawkins,
I fixed it in the last commit f1394b3.

Regards