Error in FUN(X[[i]], ...) : subscript out of bounds
ibulanov opened this issue · 5 comments
Hi @andreaskapou ,
I have my own data in the same format as the melissa synthetic data, and when I call the melissa function I get:
Error in FUN(X[[i]], ...) : subscript out of bounds
My input data looks like:
and
and every cell contains the specific genomic region:
The output matrix for every region is:
Do you have any idea why I get this error? I tried to debug it, without success:
Thank you in advance!
Best,
Igor
I found out that every cell must have the same number of genomic regions. But what if the number of regions differs across cells? Should I just cut off the extra regions, so that every cell keeps only as many regions as the cell with the fewest?
Hi Igor,
Yes, this error makes sense, since internally the function only checks the total number of regions in the 1st cell and uses this as the total number of features M. I am not sure why you would want to have a different number of genomic regions across cells, though.
For simplicity, you can think of the genomic regions as columns (features) of a matrix, so in addition to all rows (cells) having the same number of features, the order of the features should also be the same. I am not sure how you created the object you feed into melissa, but the create_melissa_data_obj function will try to retain this structure (where columns of the matrix correspond to elements of the list).
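A quick way to check this structure before calling melissa is sketched below. It is only an illustration in base R: the toy `met` list, the helper `toy_region`, and the region names are all hypothetical, and it assumes the methylation data is a list of cells where each cell is a list of regions.

```r
# Hypothetical toy data: a list of cells, each a list of region matrices.
# Column 1 = CpG location, column 2 = methylation state (assumed layout).
toy_region <- function(n) {
  cbind(seq(-1, 1, length.out = n), rep(c(0, 1), length.out = n))
}

met <- list(
  cell_a = list(r1 = toy_region(4), r2 = toy_region(3), r3 = toy_region(5)),
  cell_b = list(r1 = toy_region(3), r2 = toy_region(4))
)

# Number of regions per cell; all entries should be identical.
n_regions <- lengths(met)
print(n_regions)

# TRUE only if every cell has the same number of regions.
same_length <- length(unique(n_regions)) == 1

# If regions are named, the names should also appear in the same order.
same_order <- all(vapply(
  met,
  function(cell) identical(names(cell), names(met[[1]])),
  logical(1)
))
```

In this toy example both checks come out FALSE, which is the situation that would lead to the subscript out of bounds error described above.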
As I mentioned above, you cannot have a differing number of regions across cells (this would imply a different number of columns in a matrix for different rows). However, instead of keeping the minimum number, you could include the full set of genomic regions you are interested in, and for cells where a region is of no interest (or does not have enough data) put NA in the corresponding element of the list (think of it as a missing value in the matrix). This way, that genomic region for that specific cell will not contribute to the likelihood when performing inference, and Melissa will internally skip it.
Essentially this is what the create_melissa_data_obj function does for regions with low coverage. (The corresponding function is in the BPRMeth package: https://github.com/andreaskapou/BPRMeth/blob/master/R/process_data.R#L315)
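The padding idea can be sketched in base R as follows. This is not the BPRMeth implementation; `pad_regions`, `toy_region`, and the toy data are hypothetical, and the sketch assumes each cell is a named list so that regions can be matched by name.

```r
# Hypothetical helper: align every cell to one common, ordered set of regions,
# inserting NA where a cell has no data for a region.
pad_regions <- function(met, all_regions = unique(unlist(lapply(met, names)))) {
  lapply(met, function(cell) {
    padded <- lapply(all_regions, function(r) {
      if (r %in% names(cell)) cell[[r]] else NA
    })
    names(padded) <- all_regions
    padded
  })
}

# Toy data (assumed layout: column 1 = location, column 2 = methylation state).
toy_region <- function(n) {
  cbind(seq(-1, 1, length.out = n), rep(c(0, 1), length.out = n))
}
met <- list(
  cell_a = list(r1 = toy_region(4), r2 = toy_region(3), r3 = toy_region(5)),
  cell_b = list(r1 = toy_region(3), r2 = toy_region(4))
)

padded <- pad_regions(met)
print(lengths(padded))   # same number of regions in every cell
print(padded$cell_b$r3)  # NA placeholder for the missing region
```

Matching by name rather than by position keeps the region order identical across cells, which is the second requirement mentioned above.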
Hope this makes sense, but do let me know if I need to elaborate.
Best,
Andreas
Thank you for the reply!
Now I use the create_melissa_data_obj function:
input_melissa_do <- create_melissa_data_obj(
    met_dir = "/path/to/dir/",
    anno_file = "/path/to/file",
    chrom_size_file = NULL,
    chr_discarded = NULL,
    is_centre = TRUE,
    is_window = FALSE,
    upstream = -5000,
    downstream = 5000,
    cov = 5,
    sd_thresh = -1,
    no_cores = 5
)
and every cell has the same number of genomic regions. It looks like:
and
and
But when I use the melissa function I get the error:
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
Sorry if it's too much information. Maybe I have to read the specific info in the manual / publication?
Best,
Igor
Hi Igor,
I haven't seen this error before when running Melissa. Would it be easy to send me a subset of the object (e.g. the first 100 regions and 50 cells) so I can have a look at the created object? My email is: kapouranis.andreas at gmail.com
Also, could you print the 9th and 12th regions of input_melissa_do$met$cell_100310.tsv? There seem to be too many (relative) CpG locations at 0, which shouldn't be the case...
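For reference, taking such a subset and printing individual regions could look roughly like the sketch below. The helper `subset_met` and the toy data are hypothetical, and only the `met` list is handled; any annotation stored elsewhere in the object would need the same subsetting.

```r
# Hypothetical helper: keep the first n_cells cells and, within each,
# the first n_regions regions.
subset_met <- function(met, n_cells = 50, n_regions = 100) {
  cells <- head(met, n_cells)
  lapply(cells, function(cell) head(cell, n_regions))
}

# Toy stand-in for input_melissa_do$met: 3 cells with 12 regions each.
toy_region <- function(n) {
  cbind(seq(-1, 1, length.out = n), rep(c(0, 1), length.out = n))
}
met <- setNames(
  lapply(1:3, function(i) lapply(1:12, function(j) toy_region(3))),
  c("cell_1.tsv", "cell_2.tsv", "cell_3.tsv")
)

small <- subset_met(met, n_cells = 2, n_regions = 10)
print(lengths(small))

# Print the 9th and 12th regions of one cell.
print(met[["cell_1.tsv"]][c(9, 12)])
```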
Best,
Andreas
The last error
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
was due to not using binarised methylation data, i.e. not calling the binarise_files function before creating the data object.
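A simple sanity check that would have caught this is sketched below. It is an illustration only: `is_binarised` is a hypothetical helper, and it assumes each region is a matrix whose second column holds the methylation state, with NA used for regions without data.

```r
# Hypothetical check: TRUE if every covered region contains only 0/1 states.
is_binarised <- function(met) {
  all(vapply(met, function(cell) {
    all(vapply(cell, function(region) {
      # NA placeholder (region without data): nothing to check.
      if (!is.matrix(region)) return(TRUE)
      all(region[, 2] %in% c(0, 1))
    }, logical(1)))
  }, logical(1)))
}

# Toy data: one binarised object and one holding methylation rates instead.
binary_met <- list(
  cell_a = list(cbind(c(-0.5, 0.2), c(0, 1)), NA),
  cell_b = list(cbind(c(-0.1, 0.9), c(1, 1)), cbind(c(0.3), c(0)))
)
rate_met <- list(
  cell_a = list(cbind(c(-0.5, 0.2), c(0.25, 0.8)), NA)
)

print(is_binarised(binary_met))
print(is_binarised(rate_met))
```

If the check fails, the input files most likely still contain non-binary methylation values and need to be binarised first.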
Closing the issue.