Error in CellTagMatrixCount when generating the count matrix for v2

Question

Closed this issue 3 years ago · 19 comments

Hi, I am getting this error:

Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE) :
invalid character indexing
Calls: CellTagMatrixCount ... callGeneric -> eval -> eval -> [ -> [ -> subCsp_cols -> intI

This appears in addition to the expected warning message at this stage.

As far as I can tell it appears to be related to the sparseMatrix function or perhaps SetCellTagCurrentVersionWorkingMatrix, although these steps work when isolated from CellTagMatrixCount.

The function works if I start from a new bam.test.obj rather than the one generated from v1.
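For readers unfamiliar with this message: it comes from the Matrix package and is raised when a sparse matrix is subset by names it cannot resolve. The following is a minimal sketch on a toy matrix, not the failing line inside CellTagMatrixCount (the thread does not show it). All row and column names are made up, and the exact error wording may differ between Matrix versions.

```r
library(Matrix)

# Toy cell-by-CellTag sparse matrix; all names here are made up for illustration.
m <- sparseMatrix(
  i = c(1, 2), j = c(1, 2), x = c(1, 1),
  dims = c(2, 2),
  dimnames = list(c("cellA", "cellB"), c("v1.AAAA", "v1.CCCC"))
)

# Subsetting by a column name the matrix does not have fails with an error.
res <- tryCatch(m[, "v2.GGGG", drop = FALSE], error = function(e) e)
print(inherits(res, "error"))

# Guard: list the requested names that are absent before subsetting.
wanted <- c("v1.AAAA", "v2.GGGG")
missing.cols <- setdiff(wanted, colnames(m))
print(missing.cols)
```

Comparing the requested names against colnames() or rownames() before subsetting is one way to see which axis the mismatch is on.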

Can you help?

Thanks!!

Answer 1 · 2021-11-10T16:58:13.000Z

Hello,

Sorry for the error! To get a little more information: are you running this with the bam.test.obj generated from v1, in order to build another matrix for a second CellTag version such as v2? And did the CellTagExtraction step run without problems for v2 on that object?

Let us know!
Best,
Wenjun

Answer 2 · 2021-11-10T17:57:46.000Z

Hi Wenjun, Thanks for your quick reply!

Yes - this is run with the bam.test.obj generated from v1 after the clone calling step. I am then running CellTagExtraction for v2, followed by CellTagMatrixCount as follows:

bam.test.obj <- readRDS("bam.test.obj.complete_v1.rds")
bam.test.obj <- CellTagExtraction(bam.test.obj, celltag.version = "v2")
bam.test.obj <- CellTagMatrixCount(celltag.obj = bam.test.obj, barcodes.file = "./barcodes_all.tsv")

The extraction function appears to work fine (although after extraction bam.test.obj@curr.version is still "v1" - not sure if it should be). That said, the bam.test.obj is too large to load locally (I have 17 samples) and so I am submitting to our cluster and may have missed something.

CellTagMatrixCount also worked fine when running v1.

Thanks so much for your help!

Answer 3 · 2021-11-10T18:38:38.000Z

Hi,

The curr.version should change to v2 after running the extraction. If it did not, that might have caused a conflict when setting the matrix with the v2 prefix: the setter may then have tried to overwrite the matrix counts already generated for v1. Could I have you check the following parameters in the object whenever you get a chance?

Check bam.test.obj@celltag.version and bam.test.obj@curr.version after you load the bam.test.obj from the RDS file (maybe load into another variable to not overwrite the one with v2 extracted).
Check bam.test.obj@celltag.version after the v2 extraction. It should contain both "v1" and "v2".
Check head(bam.test.obj@bam.parse.rslt[["v2"]]) after the v2 extraction. It should show a data frame.
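The three checks above can be bundled into one helper, sketched below. The slot names (celltag.version, curr.version, bam.parse.rslt) are the ones named in this thread; the MockCellTag class and the check_versions function are hypothetical and exist only so the example runs without CellTagR or the real data.

```r
library(methods)

# Hypothetical stand-in for the CellTag object, holding only the three slots
# named in the thread; the real class has more.
setClass("MockCellTag", representation(
  celltag.version = "character",
  curr.version    = "character",
  bam.parse.rslt  = "list"
))

# Summarise the three things asked for above (hypothetical helper).
check_versions <- function(obj, version = "v2") {
  parsed <- obj@bam.parse.rslt[[version]]
  list(
    versions.present = obj@celltag.version,
    current.version  = obj@curr.version,
    has.parse.result = is.data.frame(parsed) && nrow(parsed) > 0
  )
}

# Mock object filled with one row in the style of the extraction output.
mock <- new("MockCellTag",
  celltag.version = c("v1", "v2"),
  curr.version    = "v2",
  bam.parse.rslt  = list(v2 = data.frame(
    Cell.BC  = "Sample-1_NA",
    UMI      = "AACTAAAACATA",
    Cell.Tag = "GTCACCTA"
  ))
)
str(check_versions(mock))
```

Because the helper only reads slots with @, it should also accept a real object that has those slots, which is convenient when the object is too large to inspect interactively and the job runs on a cluster.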

Hopefully this can help us pinpoint where things are going off track!
Best,
Wenjun

Answer 4 · 2021-11-10T20:02:36.000Z

Hi Wenjun,

I thought that might be the problem, but I could not work out how to address it.

After loading bam.test.obj from the RDS file: bam.test.obj@celltag.version = "v1" and bam.test.obj@curr.version = "v1"
After the v2 extraction: bam.test.obj@celltag.version = "v1" "v2"
After the v2 extraction: head(bam.test.obj@bam.parse.rslt[["v2"]]) =

       Cell.BC          UMI Cell.Tag
1: Sample-1_NA AACTAAAACATA GTCACCTA
2: Sample-1_NA ATTTCTAGTGTT GGGGACCA
3: Sample-1_NA ATTTCTAGTGTT GGGGACCA
4: Sample-1_NA CAGTAGGCAATT TCATAACA
5: Sample-1_NA TTCTCAGTTCAT ATCTACTA
6: Sample-2_NA AAAACATATATT TAATTTTA

Note: not all of the Cell.BC column entries end with "NA"

Does this help?

Thanks

Answer 5 · 2021-11-10T21:19:16.000Z

Thanks for these outputs!

Seems like the extraction went okay! Are the NAs coming from the barcodes? I think it might be the version problem. I am running through the pipeline right now to see what happened, and I will update shortly if anything changes. In the meantime, you could try manually setting the curr.version to v2 with bam.test.obj@curr.version <- "v2" before the matrix counts and see if that helps!

Best,
Wenjun

Answer 6 · 2021-11-11T03:21:44.000Z

Hi Wenjun,

I gave it a try but unfortunately setting the curr.version to v2 in the way you describe does not fix the error. I double checked and actually I was wrong - bam.test.obj@curr.version is "v2" after extraction.

I think it is something to do with SetCellTagCurrentVersionWorkingMatrix. I had an earlier error that looked like this:

Error in GetCellTagCurrentVersionWorkingMatrix(celltag.obj, "raw.count") :
could not find function "GetCellTagCurrentVersionWorkingMatrix"
Calls: CellTagDataForCollapsing

I fixed this by manually loading in the GetCellTagCurrentVersionWorkingMatrix function from the list of Auxiliary Functions found here: https://github.com/morris-lab/CellTagR/blob/39db641cb5a28fffe0138f434acbb03c92f48f12/R/AuxiliaryFunctions.R

Could this be the cause of the problem? Perhaps these are out of date? Not sure why I got the initial "could not find function" error.
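A "could not find function" error generally means the name is not visible on the search path, for example because the function is internal to the package or because the installed version does not contain it. The sketch below shows one way to tell these cases apart. The helper names is_exported and in_namespace are hypothetical, and the demonstration uses base R's stats package so that it runs anywhere. Whether CellTagR exports GetCellTagCurrentVersionWorkingMatrix is not stated in this thread.

```r
# TRUE if `fn` is exported by installed package `pkg`; FALSE if it is
# internal, absent, or the package is not installed. (Hypothetical helper.)
is_exported <- function(fn, pkg) {
  requireNamespace(pkg, quietly = TRUE) && fn %in% getNamespaceExports(pkg)
}

# TRUE if `fn` exists in the package namespace at all, exported or not.
# (Hypothetical helper.)
in_namespace <- function(fn, pkg) {
  requireNamespace(pkg, quietly = TRUE) &&
    exists(fn, envir = asNamespace(pkg), inherits = FALSE)
}

# Demonstrated on the stats package so the example is self-contained.
print(is_exported("median", "stats"))
print(in_namespace("median", "stats"))

# The same calls could be pointed at CellTagR, assuming it is installed:
# is_exported("GetCellTagCurrentVersionWorkingMatrix", "CellTagR")
# in_namespace("GetCellTagCurrentVersionWorkingMatrix", "CellTagR")
# packageVersion("CellTagR")
```

If the function is in the namespace but not exported, a copy sourced from GitHub can differ from the installed one, so checking the installed version first may be safer than loading the file by hand.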

Thanks again for your help with this.

(P.S. No, the barcodes file does not have any NAs in it.)

Answer 7 · 2021-11-11T05:49:28.000Z

Hi,

Thanks for the feedback! This is helpful! And no problem at all.

I ran through the code and it seemed to work okay. This could be an issue from the update, where the function was not loaded correctly. Maybe you have an older version installed? If you don't mind, try uninstalling the package and reinstalling it; that might help. After reinstalling, you should still be able to use the same object.

Let us know if that works!
Wenjun

Answer 8 · 2021-11-11T15:52:44.000Z

Hi Wenjun,

I tried reinstalling and still got the intI error.

When was the update? I am currently running this with R version 4.0.4. Could this be the problem?

Thanks

Answer 9 · 2021-11-11T16:38:45.000Z

Hi,

I finally found the bug! It is in the part of the setter that merges the different CellTag matrices. It should be fixed now. Please reinstall and try again, and hopefully that will resolve the issue!

Thanks for your patience!
Best,
Wenjun

Answer 10 · 2021-11-11T18:53:16.000Z

Hi Wenjun,

It worked! Thank you so much for finding it! Would you mind sharing the fix with me? I have been looking through this a lot and am curious to see what the problem was.

I also thought you might like to know about another bug I found, but was able to fix myself. In both CellTagDataForCollapsing and CellTagDataPostCollapsing the function "startsWith" is used to subset the data (according to "sample-1", "sample-2" etc.) from the column containing the concatenated sample name/barcode. The problem is that "startsWith" erroneously recognises any samples between 10-19 as "Sample-1" (and would label those 20-29 as sample-2 etc.). To fix this I instead used "endsWith" after splitting the sample name away from the concatenated barcode using "strsplit".
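The prefix problem described above is easy to reproduce in base R. The sketch below uses the "<sample>_<barcode>" layout seen earlier in the thread, with made-up barcode sequences, and shows two possible alternatives. It is not the code used in CellTagR or in James's fix, and it assumes the sample name itself contains no underscore.

```r
# Cell barcodes in the "<sample>_<barcode>" layout seen in the thread;
# the barcode sequences themselves are made up.
cell.bc <- c("Sample-1_AAAC", "Sample-10_CCGT", "Sample-12_GGTA", "Sample-2_TTAG")

# Prefix matching also picks up Sample-10 and Sample-12.
prefix.hits <- cell.bc[startsWith(cell.bc, "Sample-1")]

# Alternative 1: split off the sample name and compare it exactly.
sample.name <- vapply(strsplit(cell.bc, "_", fixed = TRUE), `[`, character(1), 1)
exact.hits <- cell.bc[sample.name == "Sample-1"]

# Alternative 2: keep startsWith but include the delimiter in the prefix.
delim.hits <- cell.bc[startsWith(cell.bc, "Sample-1_")]

print(prefix.hits)
print(exact.hits)
print(delim.hits)
```

Both alternatives select only the Sample-1 entry here; the exact comparison is the stricter of the two because it does not depend on which character follows the sample name.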

I hope that is helpful! Thanks again for all of your help with this. You have been great!

James

Answer 11 · 2021-11-11T22:44:56.000Z

Hi James,

Glad it worked out! The fix was in the last couple of lines in the setter when merging the matrices for the two different versions of the CellTag. It should be merging by row and instead it was incorrectly merging columns. This was probably introduced when I was trying to update for using sparse matrices.
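To illustrate what merging by row can look like for sparse matrices, here is a minimal sketch under one reading of the description above: the two version matrices are aligned on their cell rows, cells missing from either one are padded with zero rows, and the tag columns are then placed side by side. This is not the CellTagR setter. The function names pad_rows and merge_by_cell and all cell and tag names are hypothetical.

```r
library(Matrix)

# Add all-zero rows for any cell in `all.cells` that `m` lacks, then reorder
# the rows to match `all.cells`. (Hypothetical helper.)
pad_rows <- function(m, all.cells) {
  absent <- setdiff(all.cells, rownames(m))
  if (length(absent) > 0) {
    filler <- sparseMatrix(
      i = integer(0), j = integer(0), x = numeric(0),
      dims = c(length(absent), ncol(m)),
      dimnames = list(absent, colnames(m))
    )
    m <- rbind(m, filler)
  }
  m[all.cells, , drop = FALSE]
}

# Combine two cell-by-CellTag matrices: align on cells (rows), then put the
# tag columns of each version side by side. (Hypothetical helper.)
merge_by_cell <- function(a, b) {
  all.cells <- union(rownames(a), rownames(b))
  cbind(pad_rows(a, all.cells), pad_rows(b, all.cells))
}

# Toy inputs: v1 covers cellA and cellB, v2 covers cellB and cellC.
v1 <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(1, 1), dims = c(2, 2),
                   dimnames = list(c("cellA", "cellB"), c("v1.AAAA", "v1.CCCC")))
v2 <- sparseMatrix(i = c(1, 2), j = c(1, 1), x = c(1, 1), dims = c(2, 1),
                   dimnames = list(c("cellB", "cellC"), "v2.GGGG"))

merged <- merge_by_cell(v1, v2)
print(dim(merged))
print(colnames(merged))
```

Aligning on row names before binding avoids any later lookup by a name that exists in only one of the two matrices.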

Thank you for pointing out the sample-number issue! We haven't tested a situation with more than 10 samples. This is great feedback! I have just fixed it and it should be working properly now!

Thanks for using CellTagR! Hope all goes well! Let us know anytime if there is anything else we can help with!

Best,
Wenjun

Answer 12 · 2021-11-12T15:19:51.000Z

Hi Wenjun,

Could the same bug be in GetCellTagCurrentVersionWorkingMatrix? I am now getting the following error when running the whitelist filtering (the "SingleCellDataWhitelist" function):

Error in dimnamesGets(x, value) :
invalid dimnames given for “dgCMatrix” object
Calls: SingleCellDataWhitelist ... colnames<- -> dimnames<- -> dimnames<- -> dimnamesGets

Thanks

James

Answer 13 · 2021-11-12T16:19:47.000Z

Hi James,

I was looking at the getter function and I don't think it is the same issue, since you have probably run through the collapsing again with v2 (which also uses the getter).

Just to double check -

Did the collapsing/binarization work fine for v2?
Could I maybe ask you to check colnames(bam.test.obj@binary.mtx)[c(1:10)] for me?
This might be a dumb question, but are you using the correct whitelist/allowlist for v2?

Wenjun

Answer 14 · 2021-11-12T16:39:54.000Z

Hi Wenjun,

I thought the collapsing/binarization worked for v2 because I have the collapsed result files generated, but the colnames for the binary.mtx slot are all "v1" as follows:

[1] "v1.AAAAAAAA" "v1.AAAAAAAG" "v1.AAAAAAAT" "v1.AAAAAACA" "v1.AAAAAACT"
[6] "v1.AAAAAAGA" "v1.AAAAAAGT" "v1.AAAAAATG" "v1.AAAAAATT" "v1.AAAAACAA"

Perhaps this means it failed? Note: v2 is not present in sample-1 so there is no collapsed result file for that sample. Could this have anything to do with it?

Yes I am using the whitelist generated for v2.

James

Answer 15 · 2021-11-12T17:12:51.000Z

James,

Yes, this seems like the collapsing or the binarization didn't work. The missing v2 in sample-1 shouldn't cause an issue since the setting function is adding the missing rows.

A few things we could check for the collapsing -

Check str(bam.test.obj@pre.starcode[["v2"]]) to see if the pre.starcode data frame is okay.
For the collapsing result files, I think the file names need to look like _Sample-1.txt etc. (sorry we didn't specify this in the tutorial!). Also consider putting them in a different folder from the v1 collapsing results.
Then check colnames(bam.test.obj@collapsed.count)[1:10] (it should have v2 columns if the post-collapsing function worked!).
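Looking at only the first ten column names can miss v2 columns that sort later, so a tally over all columns may be more informative. A minimal sketch, assuming column names carry a version prefix followed by a dot, as in the names shown earlier in the thread; count_version_prefixes is a hypothetical helper and the example names are illustrative.

```r
# Count matrix columns per CellTag version, based on the text before the
# first dot in each column name. (Hypothetical helper.)
count_version_prefixes <- function(col.names) {
  table(sub("\\..*$", "", col.names))
}

# Column names in the style reported in the thread, plus one illustrative v2 name.
example.cols <- c("v1.AAAAAAAA", "v1.AAAAAAAG", "v2.GGGGACCA")
print(count_version_prefixes(example.cols))

# On a real object this would be, for example:
# count_version_prefixes(colnames(bam.test.obj@collapsed.count))
```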

Sorry for all the checks! Just want to pinpoint the malfunctioning functions!
Best,
Wenjun

Answer 16 · 2021-11-12T19:26:57.000Z

Hi Wenjun,

For str(bam.test.obj@pre.starcode[["v2"]]) I am getting "NULL"
For the collapsing results the name is collapsing_Sample-2.txt etc. I think this was actually specified! And they are in a different folder from v1. I think this was also suggested somewhere.
colnames(bam.test.obj@collapsed.count)[1:10] looks like this:

[1] "v1.AAAAAAAA" "v1.AAAAAAAG" "v1.AAAAAAAT" "v1.AAAAAACA" "v1.AAAAAACT"
[6] "v1.AAAAAAGA" "v1.AAAAAAGT" "v1.AAAAAATG" "v1.AAAAAATT" "v1.AAAAACAA"

I am currently regenerating the bam.test.obj before starcode and before CellTagDataPostCollapsing to check bam.test.obj@pre.starcode slot at this stage.

James

Answer 17 · 2021-11-13T00:11:12.000Z

Hi James,

Thanks for sharing! There are probably some issues in generating the collapsing files, since pre.starcode should contain the data frame used for the collapsing. I am guessing the text files look okay? (For instance, collapsing_Sample-2.txt has a sequence and a count in each row.)

Let us know how it looks before v2 starcode collapsing generation. (Hopefully at least str(bam.test.obj@pre.starcode[["v1"]]) is not null!)

Best,
Wenjun

Answer 18 · 2021-11-15T15:30:00.000Z

Hi Wenjun,

For some reason the str() function is not working for me with this slot. When I used head(bam.test.obj@pre.starcode[["v2"]]) the data were there.

I have actually located the source of the error. It was introduced as part of my fix for the issue where sample-10 and higher were getting labelled as sample-1. I have it working now. I hope I did not waste too much of your time with this!

Thanks again for all of your help.

Best,
James

Answer 19 · 2021-11-15T16:45:18.000Z

James,

No problem at all! Glad you sorted it out! Good luck with the analysis!

All the best,
Wenjun