4i dataset
Closed this issue · 5 comments
After reading the provided preprocessed version of the 4i data, I have not figured out the source/target distribution there.
I notice that the data are indexed by drug and cell original, but there are no source/target labels.
Please see attached screenshots.
Screenshot1: my code snippet
Screenshot2, Screenshot3: the data obs and var; as you can see, it is indexed by drug as row and cell original as column.
Screenshot4: a UMAP of the data filtered by Trametinib; I could not filter it further by source vs. target.
I also found that in lines 71 to 93 of https://github.com/bunnech/cellot/blob/main/cellot/data/cell.py
you were labeling the data as source and target. I am not sure how you do that; I thought the data were already labeled.
I really appreciate any explanation.
Thank you
The source/target labels get added during the model's data loading. Typically, the "source" corresponds to the control condition. You need to run the model in order to induce the pairing across conditions. We have more details on how to run the model in the repo's README. Hope this helps!
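For anyone who wants to inspect source vs. target cells before running the model, here is a minimal sketch of the labeling idea using plain pandas. It is not the code from `cell.py`: the function name `label_source_target`, the column names `drug` and `transport`, and the condition values `control` and `trametinib` are all assumptions for illustration, so check them against your own `adata.obs`.

```python
import pandas as pd


def label_source_target(obs, condition_col="drug", source="control", target="trametinib"):
    """Keep only source/target cells and tag each one in a new 'transport' column.

    All names here are hypothetical; adapt them to the columns in your obs table.
    """
    # Drop cells that belong to any other condition.
    mask = obs[condition_col].isin([source, target])
    out = obs.loc[mask].copy()
    # Map the condition value to a source/target label.
    out["transport"] = out[condition_col].map({source: "source", target: "target"})
    return out


# Toy obs table standing in for adata.obs (values are made up).
obs = pd.DataFrame(
    {"drug": ["control", "trametinib", "dasatinib", "control", "trametinib"]},
    index=[f"cell{i}" for i in range(5)],
)
labeled = label_source_target(obs)
print(labeled)
```

With an AnnData object you could pass `adata.obs` in the same way and then color a UMAP by the resulting column, assuming the control cells are stored under a recognizable value in the drug column.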
Where do you find the 4i data?
You can find it on the paper's repository page: https://github.com/bunnech/cellot
The README file has a section about datasets. It says (You can download the preprocessed data ... )
There is a link that takes you to the polybox website, and you can download it from there.
Very weird. From this link (https://polybox.ethz.ch/index.php/s/RAykIMfDl0qCJaM), I obtained many other datasets, but none of them is named 4i. Maybe the file has been modified at some point.
Here is a screenshot of the dataset I have obtained by unzipping the preprocessed data:
You can download everything from:
https://www.research-collection.ethz.ch/handle/20.500.11850/609681
From the CELLOT paper: see the screenshot, where it says (The processed datasets of all tasks can be accessed at)