The dataset used in the regression model

Question

zguo235 opened this issue 3 years ago · 3 comments

Hello,

I checked the dataset used in the regression model. It seems that simply dropping duplicate TCRs does not reproduce it. Could you tell me where I can find the preprocessing details needed to obtain a dataset for the regression model?

Thanks!

Answer 1 · 2021-12-09T05:12:01.000Z

Scripts to train the regression models can be found under ancillary_analysis/supervised/supervised_reg/, in the following files: mart1_train.py, flu_train.py, and ebv_train.py.

The CSV file at Data/10x_Data/Data_Regression.csv already contains no duplicate alpha/beta pairs.
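
To check this on your own copy of a file, a minimal sketch along these lines may help. The column names `alpha` and `beta` are assumptions made for illustration; the actual headers in Data_Regression.csv may differ, so adjust `pair_cols` accordingly.

```python
import pandas as pd

def count_duplicate_pairs(df, pair_cols):
    """Count rows that repeat an earlier alpha/beta combination."""
    # duplicated() marks every occurrence after the first as True.
    return int(df.duplicated(subset=pair_cols).sum())

# Toy table standing in for a real file; with a real file you would use
# something like: df = pd.read_csv("Data/10x_Data/Data_Regression.csv")
toy = pd.DataFrame({
    "alpha": ["CAV", "CAV", "CAG"],  # hypothetical column name
    "beta": ["CAS", "CAS", "CSA"],   # hypothetical column name
})

n_dup = count_duplicate_pairs(toy, ["alpha", "beta"])
print(n_dup)
```

A result of zero means every alpha/beta pair appears only once in the table.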

Answer 2 · 2021-12-09T05:31:13.000Z

Thank you for your prompt response. I have an in-house dataset and want to train the regression model on it. My dataset is like the count matrix in the original 10x dataset, in that each row holds the UMI counts for one cell. I checked the ancillary_analysis/supervised/supervised_reg/*_train.py files, but they do not describe the data preprocessing. How should I clean my dataset to get a file like Data/10x_Data/Data_Regression.csv for training the regression model?

Answer 3 · 2021-12-09T09:35:07.000Z

Unfortunately, I am not able to find the scripts I wrote to convert the 10x outputs to that CSV file at this time. But it should be rather simple to do with basic pandas functions.
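
Since the original conversion scripts are not available, the following is only a rough sketch of one way such a conversion might look in pandas, not the method actually used to build Data_Regression.csv. It assumes a per-cell table that already has the TCR chains joined to the UMI counts. The column names (`alpha`, `beta`, `antigen_A`) are hypothetical, and averaging the counts across cells that share a pair is an assumption.

```python
import pandas as pd

def collapse_to_unique_tcrs(cells, pair_cols, count_cols):
    """Collapse a per-cell table to one row per unique alpha/beta pair."""
    # Cells missing either chain cannot form a pair, so drop them.
    paired = cells.dropna(subset=pair_cols)
    # Combine cells sharing a pair; using the mean is an assumption,
    # and another aggregate (sum, median) may suit your data better.
    return paired.groupby(pair_cols, as_index=False)[count_cols].mean()

# Toy per-cell table; all column names here are hypothetical.
cells = pd.DataFrame({
    "alpha": ["CAV", "CAV", "CAG", None],
    "beta": ["CAS", "CAS", "CSA", "CAS"],
    "antigen_A": [2, 4, 10, 7],
})

result = collapse_to_unique_tcrs(cells, ["alpha", "beta"], ["antigen_A"])
print(result)
# result.to_csv("my_regression_data.csv", index=False)  # hypothetical output path
```

Any further steps, such as normalizing or log-transforming the counts, would need to be checked against what the *_train.py scripts expect as input.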