Autoencoder and NCA based neural network model to estimate survival prognosis in multiple myeloma using arrayCGH data

Vidhi Malik 1, Shayoni Dutta 2, Navaneethan Radhakrishnan 1, Yogesh Kalakoti 1, Ritu Gupta 3,* and Durai Sundar 1,4,*

1 Department of Biochemical Engineering & Biotechnology, Indian Institute of Technology (IIT) Delhi, New Delhi - 110016, India.

2 Certara UK Ltd, Quantitative systems pharmacology division of SimCyp, Level 2-Acero, 1 Concourse Way, Sheffield S1 2BJ, United Kingdom.

3 Laboratory Oncology Unit, Dr. B.R.A.IRCH, All India Institute of Medical Sciences (AIIMS), Ansari Nagar, New Delhi, 110029, India.

4 Yardi School of Artificial Intelligence, Indian Institute of Technology (IIT) Delhi, New Delhi – 110016, India.

About The Project

Multiple myeloma (MM) is a malignancy of plasma cells, which reside in the bone marrow and help fight infections by synthesizing immunoglobulins. In MM, clonally proliferating abnormal plasma cells outgrow normal plasma cells and synthesize abnormal proteins. With advances in clinical research, the disease has become highly manageable but remains incurable. Medical practitioners consider various clinical factors when predicting prognosis and choosing treatment regimens. This project is an attempt to develop a tool that helps predict the survival and prognosis of MM patients, with the aim of supporting clinicians in designing suitable treatment regimens.

Built With

Usage

To use the proposed neural network-based survival prediction model for multiple myeloma patients, use the following commands:

cd TeamSundar/Multiple-myeloma_prognosis/NCA-Neuralnet

load mm_NCANN.mat
newoutput = mm_NCANN(newinput);

The pipeline requires two input files:

  1. Clinical features file. It should have seven columns in the format specified below:

| aCGH ID_1 | Age | Gender | OS_Time (days) | Chemotherapy Regimen | ISS Staging |
| --- | --- | --- | --- | --- | --- |
| 253058713873_1 | 73 | 1 | 52 | 1 | 2 |
| 253058713873_3 | 75 | 0 | 175 | 4 | 2 |
| 253058713877_1 | 54 | 1 | 182 | 2 | 3 |
| 253058713877_2 | 58 | 1 | 203 | 7 | 3 |

Refer to the following table for the codes used in the gender, chemotherapy regimen, ISS staging and response columns:

| Gender | Chemotherapy Regimen | Staging (International Staging System) |
| --- | --- | --- |
| 0 (Male) | 1: lenalidomide-dexamethasone (RD) | 1 (ISS 1) |
| 1 (Female) | 2: thalidomide-dexamethasone (TD) | 2 (ISS 2) |
| | 3: bortezomib-dexamethasone (VD) | 3 (ISS 3) |
| | 4: melphalan-prednisone-thalidomide (MPT) | |
| | 5: bortezomib-thalidomide-dexamethasone (VTD) | |
| | 6: bortezomib-lenalidomide-dexamethasone (VRD) | |
| | 7: bortezomib-cyclophosphamide-dexamethasone (VCD) | |
| | 8: cyclophosphamide-thalidomide-dexamethasone (CTD) | |

  2. CNV file. The required CNV input file should be in the format specified in the table below:

| Sample | Gene1 | Gene2 | .. | GeneN |
| --- | --- | --- | --- | --- |
| Sample1 | | | | |
| Sample2 | | | | |
| .. | | | | |
| SampleN | | | | |
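
The coded clinical columns can be decoded with small lookup tables. The following is a minimal Python sketch and is not part of this repository: the function name `decode_clinical_row` and the dictionary names are hypothetical, and it assumes a whitespace-separated row with the six fields visible in the example rows (ID, age, gender, OS time in days, regimen code, ISS stage).

```python
# Lookup tables transcribed from the coding table in this README.
GENDER = {0: "Male", 1: "Female"}
REGIMEN = {
    1: "RD", 2: "TD", 3: "VD", 4: "MPT",
    5: "VTD", 6: "VRD", 7: "VCD", 8: "CTD",
}
ISS_STAGES = {1, 2, 3}


def decode_clinical_row(line):
    """Split one whitespace-separated clinical row and decode its codes.

    Hypothetical helper: assumes the six fields shown in the example rows
    (ID, age, gender, OS time in days, regimen code, ISS stage).
    """
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    sample_id = fields[0]
    age, gender, os_days, regimen, iss = (int(x) for x in fields[1:])
    if gender not in GENDER or regimen not in REGIMEN or iss not in ISS_STAGES:
        raise ValueError(f"unknown code in row: {line!r}")
    return {
        "id": sample_id,
        "age": age,
        "gender": GENDER[gender],
        "os_days": os_days,
        "regimen": REGIMEN[regimen],
        "iss": iss,
    }


row = decode_clinical_row("253058713873_3 75 0 175 4 2")
print(row["gender"], row["regimen"], row["iss"])  # prints: Male MPT 2
```

Rejecting unknown codes early is a design choice for this sketch; it keeps a mistyped regimen or stage from silently reaching the model input.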

The neighbourhood component analysis (NCA) algorithm was used to reduce the dimensionality of the input dataset. This yielded a gene signature of 211 genes that was able to classify patients into three classes based on the participant's progression event and death event. The input file should contain CNV values for these 211 genes. It can be formatted using the script ./Input_Data/Input_prep.py
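
To illustrate what restricting a CNV table to the signature genes involves, here is a minimal Python sketch. It is independent of Input_prep.py, whose contents are not shown here, and rests on assumptions: the function name `select_signature_columns` is hypothetical, the table is assumed to be tab-separated with the sample ID in the first column, and the gene names are placeholders rather than the real 211-gene signature.

```python
import csv
import io


def select_signature_columns(cnv_text, signature_genes):
    """Keep the sample column plus the signature genes, in signature order.

    Hypothetical helper: assumes a tab-separated table whose first column is
    the sample ID and whose remaining header cells are gene names.
    """
    reader = csv.reader(io.StringIO(cnv_text), delimiter="\t")
    header = next(reader)
    # Map each gene name to its column position in the input table.
    index = {gene: i for i, gene in enumerate(header[1:], start=1)}
    missing = [g for g in signature_genes if g not in index]
    if missing:
        raise KeyError(f"missing signature genes: {missing}")
    rows = [[header[0]] + list(signature_genes)]
    for rec in reader:
        if not rec:  # skip blank lines
            continue
        rows.append([rec[0]] + [rec[index[g]] for g in signature_genes])
    return rows


# Toy data with placeholder gene names; the real signature has 211 genes.
toy = "Sample\tGeneA\tGeneB\tGeneC\nS1\t0.1\t-0.4\t0.0\nS2\t0.3\t0.2\t-0.1\n"
subset = select_signature_columns(toy, ["GeneC", "GeneA"])
print(subset)
```

Failing on missing genes, rather than filling them with a default, reflects the requirement that the input contain CNV values for every signature gene.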

The model classifies each patient into one of three classes based on the predicted progression and death events:

  1. Class 1: 11 (dead with relapse, i.e., progression event: 1 and death event: 1)
  2. Class 2: 10 (alive with relapse, i.e., progression event: 1 and death event: 0)
  3. Class 3: 0 (alive with no relapse, i.e., progression event: 0 and death event: 0)
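
The class definitions above amount to a lookup on the two binary events. A minimal Python sketch follows; the function name `survival_class` is hypothetical and is not taken from the repository's MATLAB code.

```python
def survival_class(progression_event, death_event):
    """Map the two binary events to the class numbers listed above.

    Hypothetical helper. The combination (0, 1), death without a recorded
    progression, is not one of the three listed classes, so it is rejected.
    """
    classes = {(1, 1): 1, (1, 0): 2, (0, 0): 3}
    key = (progression_event, death_event)
    if key not in classes:
        raise ValueError(f"no class defined for events {key}")
    return classes[key]


print(survival_class(1, 0))  # prints 2
```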

The MATLAB live script for the proposed NCA-neural network-based model is located at ./NCA-Neuralnet/ArrayCGH_NCA_Neural_net_92_7percent_accuracy_final_model.mlx

The MATLAB live script for the autoencoder-based prediction models, DNN1 and DNN2, is located at ./DNN1_and_DNN2/ArrayCGH_DNN1_52_6_andDNN2_68_4percent_SVM_41_2_RUS_33percent.mlx

License

Distributed under the MIT License. See LICENSE for more information.