dpeerlab/seqc

Duplicate gene names in sparse count matrix


Since multiple ENSEMBL IDs sometimes correspond to a single gene name, the sparse count matrix can contain several columns with the same gene name (i.e., the entries in _sparse_counts_genes.csv are not unique). I'm not sure how this is handled in the filtered dense matrix. It might be good to add a suffix to duplicated gene names that match different ENSEMBL IDs, something like WDFY4 (1), WDFY4 (2).
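
For illustration, here is a minimal sketch of the suffixing idea in plain Python. The function name `suffix_duplicate_gene_names` is hypothetical and not part of seqc, and it assumes the gene names are available as a list in column order (for example, read from _sparse_counts_genes.csv). Names that occur only once are left unchanged.

```python
from collections import Counter


def suffix_duplicate_gene_names(gene_names):
    """Append ' (1)', ' (2)', ... to names that occur more than once.

    Hypothetical helper for illustration only; not part of seqc.
    """
    totals = Counter(gene_names)  # how often each name occurs overall
    seen = Counter()              # how many times each name has been emitted so far
    result = []
    for name in gene_names:
        if totals[name] > 1:
            seen[name] += 1
            result.append(f"{name} ({seen[name]})")
        else:
            result.append(name)
    return result


names = ["WDFY4", "ACTB", "WDFY4"]
print(suffix_duplicate_gene_names(names))
# ['WDFY4 (1)', 'ACTB', 'WDFY4 (2)']
```

The suffix follows column order, so the mapping back to ENSEMBL IDs only stays meaningful if the ID list is kept in the same order as the gene names.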