Error when Using ROC test for Marker Genes
Opened this issue · 1 comments
Hi @romanhaa !
Thanks again for the great app! It's truly useful.
I'm having a issue when trying to use ROC test to get marker genes:
Error: Can't subset columns that don't exist.
x The column `p_val` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
WIth rlang::last_error():
<error/vctrs_error_subscript_oob>
Can't subset columns that don't exist.
x The column `p_val` doesn't exist.
Backtrace:
1. cerebroApp::getMarkerGenes(...)
32. vctrs:::stop_subscript_oob(...)
33. vctrs:::stop_subscript(...)
Run `rlang::last_trace()` to see the full context.
With rlang::last_trace():
1. ├─cerebroApp::getMarkerGenes(...)
2. │ └─`%>%`(...)
3. │ ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
4. │ └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
5. │ └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
6. │ └─cerebroApp:::`_fseq`(`_lhs`)
7. │ └─magrittr::freduce(value, `_function_list`)
8. │ └─function_list[[i]](value)
9. │ ├─dplyr::select(...)
10. │ └─dplyr:::select.data.frame(...)
11. │ └─tidyselect::vars_select(tbl_vars(.data), !!!enquos(...))
12. │ └─tidyselect:::eval_select_impl(...)
13. │ ├─tidyselect:::with_subscript_errors(...)
14. │ │ ├─base::tryCatch(...)
15. │ │ │ └─base:::tryCatchList(expr, classes, parentenv, handlers)
16. │ │ │ └─base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
17. │ │ │ └─base:::doTryCatch(return(expr), name, parentenv, handler)
18. │ │ └─tidyselect:::instrument_base_errors(expr)
19. │ │ └─base::withCallingHandlers(...)
20. │ └─tidyselect:::vars_select_eval(...)
21. │ └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
22. │ └─tidyselect:::eval_c(expr, data_mask, context_mask)
23. │ └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
24. │ └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
25. │ └─tidyselect:::eval_c(expr, data_mask, context_mask)
26. │ └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
27. │ └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
28. │ └─tidyselect:::as_indices_sel_impl(...)
29. │ └─tidyselect:::as_indices_impl(x, vars, strict = strict)
30. │ └─tidyselect:::chr_as_locations(x, vars)
31. │ └─vctrs::vec_as_location(x, n = length(vars), names = vars)
32. └─vctrs:::stop_subscript_oob(...)
33. └─vctrs:::stop_subscript(...)
I think this arises as new names are given for the p_value column, and getMarkerGenes
still tries to make a table with the original names Seurat outputs when using Wilcox.
I want to use the ROC test because I'm actually looking for specific and exclusive cell markers, instead of differentially expressed genes, and ROC performs better for that purpose when compared to Wilcox. This would really make a difference in our analysis.
Any further insights? How can we fix this?
Hi @davisidarta!
You are bringing up another (current) limitation of Cerebro :) And you suspect correctly that a very rigid structure for tables (e.g. marker genes) is expected. I have been thinking about a generalised structure for these types of results for a while, and in fact also sent out an email to the Bioconductor developer mailing list to get their feedback. It looks like I'll have to implement my own method for processing the output of different DGE tools for visualization in Cerebro. Not sure when I'll be able to put it into releasable form though.
A short term workaround would be to rename columns so that they meet the current format (and add fake columns if there aren't enough. At the moment, the format looks like this:
# A tibble: 1,181 x 8
cluster gene p_val avg_logFC pct.1 pct.2 p_val_adj on_cell_surface
<fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 1 S100A12 1.22e-88 2.87 0.993 0.106 1.34e-84 FALSE
2 1 AC020656.1 2.04e-86 1.79 0.961 0.075 2.24e-82 FALSE
3 1 CD14 1.25e-85 1.94 0.948 0.06 1.37e-81 TRUE
4 1 VCAN 5.27e-85 2.47 0.987 0.121 5.77e-81 FALSE
5 1 FCN1 8.92e-83 2.27 1 0.161 9.77e-79 FALSE
6 1 CSTA 2.05e-82 1.64 0.974 0.089 2.25e-78 FALSE
7 1 MS4A6A 4.16e-82 1.86 0.954 0.089 4.56e-78 FALSE
8 1 MNDA 2.31e-81 2.15 1 0.19 2.53e-77 FALSE
9 1 FGL2 6.20e-76 1.69 0.974 0.141 6.78e-72 FALSE
10 1 CSF3R 2.87e-74 1.38 0.908 0.095 3.15e-70 FALSE
# … with 1,171 more rows
The on_cell_surface
column is optional.
Do you think that could work?
Sorry that I don't have an easier solution this time.
Best,
Roman