dpeerlab/Harmony

Ordering of the timepoints

Marius1311 opened this issue · 2 comments

In my AnnData object, I have a field adata.obs['day'], which is categorical. Calling adata.obs['day'].cat.categories yields

Index(['0', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13',
       '15', '21', '36'],
      dtype='object')

So the values are strings in the right order. However, when I call Harmony using the scanpy interface, the timepoint connections are created using

    timepoints = adata.obs[tp].unique().tolist()
    timepoint_connections = pd.DataFrame(np.array([timepoints[:-1], timepoints[1:]]).T)

which returns my timepoints in order of first appearance in adata.obs rather than in category order, so they end up permuted. To keep the order, I need to change this to

    timepoints = list(adata.obs[tp].cat.categories)
    timepoint_connections = pd.DataFrame(np.array([timepoints[:-1], timepoints[1:]]).T)
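To make the difference concrete, here is a minimal sketch on a toy categorical Series standing in for adata.obs['day'] (the data and the `connections` helper are made up for illustration and are not part of Harmony or Scanpy). It shows that `unique()` follows the order in which values first appear, while `.cat.categories` keeps the declared order:

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.obs['day']: cells are not sorted by timepoint.
day = pd.Series(
    pd.Categorical(["3", "0", "2", "0", "3"], categories=["0", "2", "3"])
)

# unique() reports values in order of first appearance in the data.
by_appearance = day.unique().tolist()

# .cat.categories reports the declared category order.
by_category = list(day.cat.categories)


def connections(timepoints):
    """Illustrative helper: pair each timepoint with the next one in the list."""
    return pd.DataFrame(np.array([timepoints[:-1], timepoints[1:]]).T)


print(by_appearance)             # ['3', '0', '2']
print(by_category)               # ['0', '2', '3']
print(connections(by_category))  # rows: ('0', '2') and ('2', '3')
```

With the appearance order, the connections would link '3' to '0' and '0' to '2', which is not the intended time course.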

In my opinion, it is important that the docstring describe the format the timepoint annotation must have to produce the expected results. It would also be good to check the dtype of the passed .obs annotation and create the timepoint connections accordingly, as this is really critical.
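A rough sketch of what such a dtype check could look like, assuming a hypothetical helper named `ordered_timepoints` (this is not an existing Harmony or Scanpy function, and the fallback behaviour is just one possible choice):

```python
import pandas as pd


def ordered_timepoints(obs: pd.DataFrame, tp: str) -> list:
    """Hypothetical helper: return the timepoints of obs[tp] in a defined order."""
    col = obs[tp]
    if isinstance(col.dtype, pd.CategoricalDtype):
        # Keep the declared category order, skipping categories with no cells.
        present = set(col.dropna().unique())
        return [c for c in col.cat.categories if c in present]
    if pd.api.types.is_numeric_dtype(col):
        # Numeric timepoints have a natural order, so sorting is safe.
        return sorted(col.dropna().unique().tolist())
    # Plain strings would sort lexicographically ('10' before '2'), so refuse.
    raise TypeError(
        f"obs['{tp}'] must be categorical or numeric, got dtype {col.dtype}"
    )


obs = pd.DataFrame(
    {"day": pd.Categorical(["3", "0", "10", "0"], categories=["0", "3", "10", "21"])}
)
print(ordered_timepoints(obs, "day"))  # ['0', '3', '10']
```

Raising on plain string columns is deliberate here: it forces the caller to declare the order explicitly through the categories instead of relying on an implicit one.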

Hey @Marius1311
Thank you for reporting this issue. We will fix this in Scanpy: timepoints will be forced to a categorical dtype, and we will add the corresponding checks and update the code and docstring.

Great, thanks!