/CHO_N-glycosylation_prediction

Code and datasets for the publication "Linear and Neural Network Models for Predicting N-glycosylation in Chinese Hamster Ovary Cells Based on B4GALT Levels"

Primary language: Jupyter Notebook. License: MIT.

Linear and Neural Network Models for Predicting N-glycosylation in Chinese Hamster Ovary Cells Based on B4GALT Levels

This repository contains the datasets and model files associated with the publication "Linear and Neural Network Models for Predicting N-glycosylation in Chinese Hamster Ovary Cells Based on B4GALT Levels". The work uses linear models and artificial neural networks (ANNs) to predict the distribution of glycans at potential N-glycosylation sites. The models were trained on data containing normalized CHO cell B4GALT levels.

Reproducing the models and plots

To recreate the cross-validation results, download the datasets folder and run SPA_glycosylation_model.py without any flags (python SPA_glycosylation_model.py). To recreate the nested validation results, run it with the --nested flag (python SPA_glycosylation_model.py --nested). To recreate the ANN results, download ANN_train.ipynb, change the first cell as needed, and run the notebook.
To recreate the plots, download the result_csv_files and result_csv_files_nested folders, then run make_results_plots.py. Most plots are generated in the first folder (result_csv_files); the nested validation PRE distribution is generated in the nested results folder (result_csv_files_nested).
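Before running any of the steps above, it can help to confirm that the required files and folders were downloaded into the working directory. The following is a minimal sketch, not part of the repository: the helper name missing_inputs and the REQUIRED table are hypothetical, and it assumes the datasets folder is literally named datasets.

```python
from pathlib import Path

# File and folder names come from the instructions above.
# Assumption: the datasets folder is literally called "datasets".
REQUIRED = {
    "cross-validation": ["datasets", "SPA_glycosylation_model.py"],
    "ANN": ["ANN_train.ipynb"],
    "plots": ["result_csv_files", "result_csv_files_nested", "make_results_plots.py"],
}


def missing_inputs(root, step):
    """Return the names needed for `step` that do not exist under `root`."""
    root = Path(root)
    return [name for name in REQUIRED[step] if not (root / name).exists()]
```

For example, missing_inputs(".", "plots") returns an empty list once both result folders and make_results_plots.py are present in the current directory.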

Using the models to predict glycan distributions

The Conda environment defining the specific packages and version numbers used in this work is available as ANN_environment.yaml. To use our trained models, run ANN_predict.py as python ANN_predict.py <location> <B4GALT levels>, where <B4GALT levels> are the four B4GALT levels (B4GALT1-B4GALT4). For example, run python ANN_predict.py Asn_24 1 1 1 1 to predict the wild-type glycan distribution at Asn 24, or python ANN_predict.py Asn_83 0.001 0.004 1.03 1.1 to predict the glycan distribution at Asn 83 of a double-knockout mutant.
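When predicting for several sites or conditions, the command line can be assembled programmatically. The sketch below is an illustration only: build_predict_command is a hypothetical helper, not something the repository provides, and it assumes ANN_predict.py takes exactly the positional arguments shown in the examples above.

```python
import sys

N_LEVELS = 4  # one value each for B4GALT1-B4GALT4


def build_predict_command(location, levels, script="ANN_predict.py"):
    """Build the argument list for one ANN_predict.py call (hypothetical helper)."""
    levels = list(levels)
    if len(levels) != N_LEVELS:
        raise ValueError(f"expected {N_LEVELS} B4GALT levels, got {len(levels)}")
    # sys.executable keeps the call inside the currently active environment.
    return [sys.executable, script, location, *(str(v) for v in levels)]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that contains ANN_predict.py, for example with build_predict_command("Asn_24", [1, 1, 1, 1]) for the wild-type case.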

Alternatively, create an (N+1)x5 .csv file, with the first column holding row names, the first row holding column names, and the remaining four columns holding the levels of B4GALT1-B4GALT4, then run ANN_predict.py as python ANN_predict.py <location> <path/to/file.csv>. The results will be saved as a new .csv file.
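A file with this layout can be written with Python's standard csv module. This is a minimal sketch under stated assumptions: the column names in HEADER and the sample names in the usage note are placeholders chosen for illustration, and the description above does not say whether ANN_predict.py requires specific header names.

```python
import csv

# Assumed column names; only the layout (row names + four levels) comes from the text.
HEADER = ["sample", "B4GALT1", "B4GALT2", "B4GALT3", "B4GALT4"]


def write_levels_csv(path, samples):
    """Write an (N+1)x5 CSV from a mapping of row name -> four B4GALT levels.

    Returns the number of rows written, header included.
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(HEADER)
        for name, levels in samples.items():
            levels = list(levels)
            if len(levels) != 4:
                raise ValueError(f"{name}: expected 4 levels, got {len(levels)}")
            writer.writerow([name, *levels])
    return len(samples) + 1
```

For instance, write_levels_csv("levels.csv", {"wild_type": [1, 1, 1, 1], "double_knockout": [0.001, 0.004, 1.03, 1.1]}) produces a 3x5 file that reuses the two conditions from the command-line examples.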