This is the project for my bachelor's thesis. The goal is to apply different Machine Learning algorithms to patient data in order to build a prediction model that could, in the future, give psychiatrists a second opinion on whether a patient is tending towards a depressive or manic episode or remaining in a euthymic state.
This project consists of:
- A Jupyter Notebook that contains the whole process of data cleaning and Exploratory Data Analysis, as well as the application of different Machine Learning algorithms to predict the state of the patient.
- A Python script that uses a Random Forest classifier dumped from the notebook and predicts the state a patient could be tending towards, given certain user input.
This project springs from the Bip4cast project, which studies the onset of crises in patients with Bipolar Disorder in order to predict them. The goal of the Bip4cast project is to be able to react in time and prevent the symptoms before patients start to suffer from them.
The following technologies and frameworks are used in the project:
The following code snippet shows the application of a Random Forest algorithm to create a classifier:
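from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Features are every column except "episode"; "episode" is the target label
X_train, X_test, y_train, y_test = train_test_split(
    interviews_episodes.loc[:, interviews_episodes.columns != "episode"],
    interviews_episodes["episode"], test_size=0.3
)
# Train a Random Forest on the training split, using all CPU cores
clf = RandomForestClassifier(n_jobs=-1)
clf.fit(X_train, y_train)
# cross_val_score clones clf and refits it on folds of the held-out split
scores = cross_val_score(clf, X_test, y_test)
print("Model accuracy: %s" % scores.mean())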
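The script relies on a classifier dumped from the notebook. A minimal sketch of that hand-off is shown below, assuming the model is serialized with pickle (cPickle on Python 2.7). The synthetic data from `make_classification`, the hyperparameters, and the names `blob`, `restored`, and `sample` are illustrative assumptions, not the project's actual data or code.

```python
try:
    import cPickle as pickle  # Python 2.7, as used by this project
except ImportError:
    import pickle  # fallback so the sketch also runs on Python 3

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the interview data (assumption): 200 rows, 3 classes
X, y = make_classification(n_samples=200, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# Fit a Random Forest, as the notebook does on the real data
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Notebook side: serialize the fitted classifier to bytes
blob = pickle.dumps(clf, protocol=2)

# Script side: restore the classifier and predict for one new sample
restored = pickle.loads(blob)
sample = X[:1]
prediction = restored.predict(sample)[0]
print("Predicted class: %s" % prediction)
```

Writing to a file works the same way with `pickle.dump` and `pickle.load` on a file opened in binary mode. A pickled model should only be loaded from a trusted source and with a compatible scikit-learn version.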
If you are running a Debian-based Linux distribution (such as Ubuntu) you can install and configure pip for an easier installation of the libraries:
$ sudo apt-get install python-pip python-dev build-essential
$ sudo pip install --upgrade pip
In order to run the notebook you need to have Python 2.7 installed on your computer and install Jupyter Notebook, which you can do with pip:
$ pip install jupyter
The required Python libraries are:
- pandas
- numpy
- seaborn
- matplotlib
- IPython
- pydotplus
- scikit-learn
- cPickle (part of the Python 2.7 standard library, no installation needed)
In order to install them, run:
$ pip install pandas
$ pip install numpy
$ pip install seaborn
$ pip install matplotlib
$ pip install IPython
$ pip install pydotplus
$ pip install scikit-learn
For the notebook to work, the data that you import needs to have the same format as the files in the data folder.
In order to run the Python script you need to have Python 2.7 installed on your computer. The required Python libraries are:
- numpy
- scikit-learn (imported as sklearn)
- termcolor
In order to install them, run:
$ pip install numpy
$ pip install scikit-learn
$ pip install termcolor
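To confirm that the installation worked before running the script, the libraries can be imported programmatically. The following is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of this project.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:  # a broken install may raise more than ImportError
            missing.append(name)
    return missing


# Import names of the script's libraries (scikit-learn is imported as "sklearn")
required = ["numpy", "sklearn", "termcolor"]
missing = missing_modules(required)
if missing:
    print("Missing libraries: %s" % ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Note that the check uses import names, which can differ from the names passed to pip.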
To execute the Bipolar Disorder Crisis Prediction Python script, just run:
$ python state_prediction.py