Error in FUN(X[[i]], ...) : subscript out of bounds

Question

ibulanov opened this issue 4 years ago · 5 comments

ibulanov commented 4 years ago

Hi @andreaskapou ,

I have my own data in the same format as the Melissa synthetic data, and when I call the melissa function I get:

Error in FUN(X[[i]], ...) : subscript out of bounds

My input data looks like:

and

and every cell contains the specific genomic region:

The output matrix for every region is:

Do you have any idea why I get this error? I tried to debug it, without success:

Thank you in advance!

Best,
Igor

Answer 1 · 2020-08-03T09:02:31.000Z

I found out that every cell must have the same number of genomic regions. But what if the number of regions differs across cells? Should I just cut off the extra regions, down to the minimum number of regions found in any cell?

Answer 2 · 2020-08-03T11:12:45.000Z

Hi Igor,

Yes, this error makes sense: internally the function only checks the total number of regions in the first cell and uses this as the total number of features M. I am not sure why you would want a different number of genomic regions across cells, though.
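A quick way to confirm this before calling melissa is to count the regions in each cell. The sketch below uses base R only and assumes the input is a list of cells, each holding a list of regions. The toy object `met` and its cell names are made up for illustration, and the last step only shows how indexing past the end of a shorter list produces the same message, which is presumably what happens internally.

```r
# Toy input (hypothetical): a list of cells, each a list of regions.
# Each region is a 2-column matrix: CpG location, methylation state.
region <- function() cbind(c(-0.5, 0.2), c(1, 0))
met <- list(
  cell_A = list(region(), region(), region()),
  cell_B = list(region(), region())   # one region short
)

# Number of regions per cell
n_regions <- vapply(met, length, integer(1))
print(n_regions)

# Cells whose region count differs from the first cell's
mismatched <- names(n_regions)[n_regions != n_regions[[1]]]
print(mismatched)

# Indexing past the end of the shorter cell raises an error; capture its message
res <- tryCatch(met$cell_B[[3]], error = function(e) conditionMessage(e))
print(res)
```

Any cell listed in `mismatched` would need to be fixed before running inference.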

For simplicity, you can think of the genomic regions as columns (features) of a matrix, so all rows (cells) must have the same number of features, and the features must also be in the same order. I am not sure how you created the object you feed into melissa, but the create_melissa_data_obj function will try to retain this structure (where columns of the matrix correspond to elements of the list).

As I mentioned above, you cannot have a differing number of regions across cells (this would imply a different number of columns for different rows of a matrix). However, instead of keeping the minimum number, you could include the maximum number of genomic regions you are interested in, and for the cells that you are not interested in (or that do not have enough data) put NA in the corresponding elements of the list (think of it as missing values in the matrix). This way, that genomic region for that specific cell will not contribute to the likelihood when performing inference, and Melissa will internally skip it.

Essentially this is what the create_melissa_data_obj function does for regions with low coverage. (The corresponding function is in the BPRMeth package: https://github.com/andreaskapou/BPRMeth/blob/master/R/process_data.R#L315)
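As a rough illustration of the NA padding described above, here is a minimal base R sketch. It assumes regions are stored as named list elements so they can be aligned across cells; the object `met`, the region names and the helper `pad_cell` are hypothetical and not part of Melissa or BPRMeth.

```r
# Hypothetical toy data: regions are named so they can be aligned across cells.
region <- function() cbind(c(-0.5, 0.2), c(1, 0))
met <- list(
  cell_A = list(r1 = region(), r2 = region(), r3 = region()),
  cell_B = list(r1 = region(), r3 = region())   # r2 not covered
)

# Union of all region names, in one fixed order shared by every cell
all_regions <- sort(unique(unlist(lapply(met, names))))

# Rebuild a cell so every region is present; missing ones become NA
pad_cell <- function(cell) {
  out <- lapply(all_regions, function(r) {
    if (r %in% names(cell)) cell[[r]] else NA
  })
  names(out) <- all_regions
  out
}
met_padded <- lapply(met, pad_cell)

print(vapply(met_padded, length, integer(1)))
print(is.na(met_padded$cell_B$r2))
```

After padding, every cell has the same number of regions in the same order, with NA standing in for the regions a cell does not cover.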

Hope this makes sense, but do let me know if I need to elaborate.

Best,
Andreas

Answer 3 · 2020-08-03T13:24:57.000Z

Thank you for the reply!

Now I use the create_melissa_data_obj function:

    input_melissa_do <- create_melissa_data_obj(
      met_dir = "/path/to/dir/",
      anno_file = "/path/to/file",
      chrom_size_file = NULL,
      chr_discarded = NULL,
      is_centre = TRUE,
      is_window = FALSE,
      upstream = -5000,
      downstream = 5000,
      cov = 5,
      sd_thresh = -1,
      no_cores = 5
    )

and every cell has the same number of genomic regions. It looks like:

and

But when I use melissa function I get the error:

Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)

Sorry if it's too much information. Maybe I have to read the specific info in the manual / publication?

Best,
Igor

Answer 4 · 2020-08-04T07:53:39.000Z

Hi Igor,

I haven't seen this error before when running Melissa. Would it be easy to send me a subset of the object (e.g. the first 100 regions and 50 cells) so I could have a look at it? My email is: kapouranis.andreas at gmail.com

Also, could you print the 9th and 12th regions of input_melissa_do$met$cell_100310.tsv? There seem to be too many (relative) CpG locations at 0, which shouldn't be the case...

Best,
Andreas

Answer 5 · 2020-08-05T07:57:35.000Z

The last error,
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
was due to not using binarised methylation data, i.e. not calling the binarise_files function.
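For anyone hitting the same message, a sanity check along these lines might catch non-binarised input early. This is only a sketch in base R: it assumes each region is a matrix whose second column holds the methylation state and that NA marks a skipped region. The object `met` and the helper `is_binary_region` are hypothetical.

```r
# Hypothetical toy data. Assumed layout: each region is a matrix whose
# second column holds the methylation state; NA marks a skipped region.
met <- list(
  cell_A = list(cbind(c(-0.5, 0.2), c(1, 0)), NA),
  cell_B = list(cbind(c(-0.1, 0.7), c(0.35, 1)), cbind(0.3, 0))  # 0.35 is not binary
)

# Non-matrix entries (e.g. NA placeholders) pass; matrices pass only
# if every value in the second column is exactly 0 or 1.
is_binary_region <- function(region) {
  if (!is.matrix(region)) return(TRUE)
  all(region[, 2] %in% c(0, 1))
}

# For each cell, indices of regions with non-binary methylation values
bad <- lapply(met, function(cell) {
  which(!vapply(cell, is_binary_region, logical(1)))
})
print(bad)
```

Any non-empty entry in `bad` suggests the files were not binarised before the data object was created.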

Closing the issue.