error about prepare_split
Not sure why you see that error. I just ran this piece of code and everything seemed to work fine:
```python
import sys
sys.path.append('../')  # add the parent directory to the import path so the local gears package is found
from gears import PertData

pert_data = PertData('./data')  # folder where datasets are saved
pert_data.load(data_name = 'adamson')  # dataset name
pert_data.prepare_split(split = 'simulation', seed = 1)  # get data split with seed
pert_data.get_dataloader(batch_size = 32, test_batch_size = 128)  # prepare data loader
```
Thanks for your kindness and patience. I think I found the reason.
You probably wrote the code on macOS or Linux, but I ran it on a Windows laptop. On Windows, this line (line 166 of your pertdata.py)
```python
data_path = os.path.join(self.data_path, data_name)
```
produces './data\norman' instead of './data/norman', because os.path.join uses the backslash separator on Windows. As a result,
```python
self.dataset_name = data_path.split('/')[-1]
```
sets dataset_name to 'data\norman' instead of 'norman'.
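I fixed the problem by setting the dataset name manually before preparing the split and the data loader:
```python
pert_data.dataset_name = 'norman'  # override the mis-parsed name
pert_data.prepare_split(split = 'simulation', seed = 1)
pert_data.get_dataloader(batch_size = 32, test_batch_size = 128)
```
However, I'm not sure whether the same path handling causes other errors elsewhere.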
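A minimal sketch of the failure and a portable alternative, assuming the dataset name is meant to be the last component of the data path. It uses the standard-library ntpath and posixpath modules so Windows and POSIX path behavior can be compared on any OS. The helper names are hypothetical and this is not the GEARS code itself.

```python
import ntpath     # Windows path semantics, importable on every OS
import posixpath  # POSIX (macOS/Linux) path semantics

def dataset_name_split(data_path):
    # Mirrors the quoted line: only splits on "/" separators
    return data_path.split('/')[-1]

def dataset_name_portable(pathmod, data_path):
    # normpath() drops a trailing separator; basename() then understands
    # the separators of the given path module ("\\" and "/" for ntpath)
    return pathmod.basename(pathmod.normpath(data_path))

for pathmod in (posixpath, ntpath):
    data_path = pathmod.join('./data', 'norman')
    print(pathmod.__name__, repr(data_path),
          dataset_name_split(data_path),
          dataset_name_portable(pathmod, data_path))
```

In library code one would call os.path.basename(os.path.normpath(data_path)), since os.path is already the platform's path module.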
Also, I'm curious about GEARS's predictions. As you mentioned in issue #47, the output has shape [perturbation_categories, genes]. But in your code, correct me if I'm wrong, you apply the new perturbation to every unperturbed cell (self.ctrl_adata) and then take the average over all the resulting perturbed cells. Does that mean you could technically provide predictions at single-cell resolution? Could you explain why you give up that higher resolution and use the average instead?
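The pattern described in the question (predict once per control cell, then average) can be sketched with NumPy. This is a toy illustration under stated assumptions: the additive predict_cells function and the effect vectors are hypothetical stand-ins for the trained model, not GEARS's actual prediction code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ctrl, n_genes = 100, 5
# Stand-in for the control (unperturbed) expression matrix, cells x genes
ctrl_expr = rng.normal(size=(n_ctrl, n_genes))

# Hypothetical perturbation effects; a trained model would learn these
# (and would generally be nonlinear in the control expression)
effects = {
    "A+ctrl": np.array([1.0, 0.0, 0.0, 0.0, 0.0]),
    "A+B": np.array([1.0, -2.0, 0.0, 0.0, 0.0]),
}

def predict_cells(ctrl, effect):
    # Toy stand-in for the model: one predicted profile per control cell
    return ctrl + effect  # shape [n_ctrl, n_genes]

# Single-cell-level outputs: one [n_ctrl, n_genes] matrix per perturbation
per_cell = {p: predict_cells(ctrl_expr, e) for p, e in effects.items()}

# Population-level output: average over cells, then stack perturbations
pred = np.stack([per_cell[p].mean(axis=0) for p in effects])
print(pred.shape)  # (2, 5)
```

The intermediate per_cell matrices are the "single-cell" predictions the question refers to; the averaged pred array has the [perturbation_categories, genes] shape.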
Yes, we haven't tested our code on Windows machines, so there may be some unexpected behavior in how paths are handled.
Thanks for your question regarding single-cell prediction. We avoid making predictions at the single-cell level because the mapping between control cells and perturbed cells is random: we don't have ground-truth information on how each individual cell responds to the perturbation. Thus, both the metric computation and the model predictions are at the level of population averages.
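A toy simulation can illustrate why a random control-to-perturbed pairing makes per-cell evaluation unreliable. Everything here is assumed for illustration (Gaussian expression, an additive true_shift, and a "perfect" model that recovers it exactly); it is not GEARS's metric code. Under an arbitrary pairing, even the perfect model shows a large per-cell error, while the error between population averages is small and does not depend on the pairing.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_genes = 200, 10
true_shift = rng.normal(size=n_genes)  # hypothetical true perturbation effect

ctrl = rng.normal(size=(n_cells, n_genes))                    # control cells
perturbed = rng.normal(size=(n_cells, n_genes)) + true_shift  # different cells, measured after perturbation
pred = ctrl + true_shift  # toy model that recovers the effect exactly

def per_cell_mse(pred, obs):
    # Compares row i of pred with row i of obs, so it depends on the pairing
    return float(np.mean((pred - obs) ** 2))

def mean_level_mse(pred, obs):
    # Compares population averages, so row order does not matter
    return float(np.mean((pred.mean(axis=0) - obs.mean(axis=0)) ** 2))

results = []
for _ in range(3):
    pairing = rng.permutation(n_cells)  # one arbitrary control-to-perturbed matching
    obs = perturbed[pairing]
    results.append((per_cell_mse(pred, obs), mean_level_mse(pred, obs)))

for cell_err, mean_err in results:
    print(f"per-cell MSE {cell_err:.3f}   mean-level MSE {mean_err:.4f}")
```

The per-cell error here reflects cell-to-cell variability and the arbitrary matching rather than model quality, which is the reason given above for working with population averages.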