Using the Cerebro Docker image as an interactive web app on a VM

Hi, @romanhaa!

I really enjoy Cerebro (I guess you might have figured that out by now). I'm working on SARS-CoV-2 research, and I would like to add a link to the manuscript so that readers can interact with the data through the Cerebro interface.

I have a VM with Ubuntu 18.04 LTS installed, and I was able to get the Docker image running and successfully upload the .crb file. However, when the app was accessed from another computer, the 'upload file' field was still active, so an end user couldn't access the data unless they had the .crb file themselves (which, in this case, is quite large).

Do you have any suggestions on how to deploy Cerebro as a web application on a VM?

Hi @davisidarta! Good to hear from you :) Let me see if I understood correctly: you would like to integrate the .crb file into the VM so that users just have to launch Cerebro without having to load the data. Is that right?

If so, I actually had something like this in mind at the very beginning of development, before it was even called Cerebro. I thought Cerebro could exist in two modes. Let's call the first "boxed": it already comes with a pre-loaded data set and skips the data loading screen. The other is "open": it is empty except for the example data set, so users have to load their own data. I never got around to implementing it, but I guess it could be useful in some cases.

An easy workaround would be to replace the example data set that is loaded when you launch Cerebro (you can find out where it is located with system.file("extdata/v1.2/example.rds", package = "cerebroApp")). The "Load data" screen would still be there, and one could still upload a data set, but it wouldn't be necessary, and you could tell users to simply ignore the "Load data" tab.

The downside is that, depending on the size of your data set, it might take a while to load, potentially resulting in a white screen (because the Cerebro user interface is loaded after the data). This can be resolved by refreshing the interface after a while. It's not the most elegant solution, but it's the best I have for now.
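In code, that workaround would look roughly like the sketch below (a minimal sketch: the path corresponds to cerebroApp v1.2, "my_data.crb" is a placeholder for your exported file, and it assumes a .crb file is a serialized RDS object, which is how exportFromSeurat() saves it):

```r
library(cerebroApp)

# Locate the example data set that Cerebro loads at launch
example_path <- system.file("extdata/v1.2/example.rds", package = "cerebroApp")

# Overwrite it with your own exported .crb file ("my_data.crb" is a
# placeholder); this requires write access to the package library,
# which is straightforward when building a Docker image
file.copy("my_data.crb", example_path, overwrite = TRUE)
```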

Does this help in any way?

Hi @romanhaa

I followed your advice: I uploaded our data to the VM and renamed it to match the example data set Cerebro loads. We had to subsample it to 10,000 cells (out of 130,000), however, because of memory problems with Docker. In particular, the full Cerebro object is loaded into RAM (20 GB for 130,000 cells), and this happens again every time the webpage is accessed. By using 10,000 cells (2 GB), we managed to make it reasonably scalable (it can sustain up to 20 simultaneous sessions and has an auto-cleaning script), although it still takes a while to load the dimensionality reduction plots. Please take a look at the current version of the human lung integrated cell atlas. The pre-print is currently under screening at medRxiv and has been submitted to a relevant journal.
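In case it helps others, our subsampling step looked roughly like the sketch below (a sketch assuming a Seurat object named seurat_object and cerebroApp v1.2's exportFromSeurat(); the file name, experiment name, and organism are illustrative placeholders):

```r
library(Seurat)
library(cerebroApp)

# Randomly keep 10,000 of the ~130,000 cells so the exported object
# stays small enough to be loaded into RAM for every new session
set.seed(42)
keep_cells <- sample(colnames(seurat_object), size = 10000)
seurat_small <- subset(seurat_object, cells = keep_cells)

# Export the subsampled object to a .crb file that Cerebro can load;
# this relies on the default metadata column names (e.g. 'sample', 'cluster')
exportFromSeurat(
  seurat_small,
  file = "lung_atlas_10k.crb",
  experiment_name = "human_lung_atlas",
  organism = "hg"
)
```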

I really appreciate your help with this, and also your work on Cerebro. Although I think Cerebro is great (I use it on a daily basis, and several of my collaborators have it on their laptops), I also acknowledge that it has some limitations regarding scalability. I'm interested in addressing this issue, as well as extending Cerebro to AnnData objects. Let me know if I have your permission to do so and whether you would be interested in working together; you can reach me at davisidarta@gmail.com :)