This repository contains the code in the study "Machine learning-based diagnostic prediction of minimal change disease: model development study." The study aims to develop predictive models to identify MCD among nephrotic syndrome patients based on demographic, blood test, and urine test variables. The original data is not shared, so please use the code with your data.
Age, albumin, eGFR, T-chol, IgG, C3, Urine RBC, and UP/day were selected as predictors based on clinical insights and literature.
Missing values were addressed using multivariate imputation by chained equations (MICE).
Nephrologists and renal pathologists assigned the definitive diagnoses. MCD was labeled as 1, and other diagnoses were labeled as 0.
Five models were developed: TabPFN, LightGBM, Random Forest, Artificial Neural Network, and Logistic Regression. Bayesian optimization with stratified 5-repeated 5-fold cross-validation was used for hyperparameter tuning. Performance metrics included AUROC, AUPRC, and Brier score. Model calibration was evaluated using calibration plots, and clinical utility was assessed by decision curve analysis.
The SHapley Additive exPlanations (SHAP) method was used to interpret the models, providing consistent and locally accurate attribution values for each predictor variable.
00_data_preprocessing.ipynb: Code for the data preprocessing steps, including handling missing values using multivariate imputation by chained equations.01_mode_development.ipynb: Code for hyperparameter tuning using Bayesian optimization with stratified 5-repeated 5-fold cross-validation for all models.02_model_evaluation.ipynb: Code for internal validation of the models, including performance metrics calculation like AUROC, AUPRC, and Brier score.03_model_interpretations.ipynb: Code for interpreting the models using the SHAP method to provide attribution values for each predictor variable.