Stroke Tissue Prediction is a project for UCLA's Spring '19 CS 168 class, taught by Professor Fabien Scalzo. The goal is to use various machine learning techniques to analyze a set of scans of a brain, both pre- and post-stroke, and determine the likelihood of developing lesions. Specifically, we want to compare the accuracy of various algorithmic approaches.
These are needed before running the steps in Getting Started.
- Python 3.7+ (macOS installation) (Linux installation)
These are installed via the bootstrapping process described in Getting Started.
- bz2file 0.98
- cycler 0.10.0
- kiwisolver 1.1.0
- matplotlib 2.2.4
- nibabel 2.4.0
- numpy 1.16.2
- pydicom 1.2.2
- pyparsing 2.4.0
- python-dateutil 2.8.0
- pytz 2019.1
- scikit-learn 0.20.3
- scipy 1.2.1
- six 1.12.0
- subprocess32 3.5.3
- Run `./bootstrap.sh` to install necessary dependencies.
  - NOTE: if your default python is managed by anaconda (i.e., `which python` returns something like `/usr/local/anaconda3/bin/python`) you MUST do your initial bootstrap using the anaconda option: `./bootstrap.sh --anaconda`.
- Run `source build/bin/activate` to get inside the virtualenv.
- Run `deactivate` to close the virtualenv.
Data for this project comes from a series of MRI scans for eighteen patients who had been treated for ischemic stroke at UCLA’s Ronald Reagan Medical Center. Each patient had two sets of MRI images, perfusion and FLAIR, stored in the international standard DICOM image format. A DICOM file is essentially a standard image bundled with metadata such as patient name, brain slice location (the vertical position of the slice relative to the patient’s eyes, moving up or down), and image capture time. The data will not be released, as it contains privileged patient information.
If you wish to do co-registration, please install the latest SimpleElastix library.
For your data: it should already be arranged in a folder called `Patients`, placed in the root of this repository, such that each patient is assigned an integer-labeled subfolder. Within each patient's subfolder, there ought to be a folder called `Perfusion` containing Perfusion DICOMs and a folder called `FLAIR` containing FLAIR DICOMs.
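The layout described above can be checked before running any of the scripts. The following is a minimal sketch, not part of this repository: the function name `find_layout_problems` is hypothetical, and it only checks that the folders exist, not that they contain valid DICOM files.

```python
from pathlib import Path


def find_layout_problems(root):
    """Return a list of human-readable problems found under `root`.

    Hypothetical helper: checks only the folder layout described in this
    README (integer-named patient folders, each with Perfusion and FLAIR).
    """
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue  # stray files at the top level are ignored
        if not entry.name.isdigit():
            problems.append(f"{entry.name}: patient folder name is not an integer")
            continue
        for modality in ("Perfusion", "FLAIR"):
            if not (entry / modality).is_dir():
                problems.append(f"{entry.name}: missing {modality} folder")
    return problems


if __name__ == "__main__":
    # Assumes the script is run from the root of the repository.
    for problem in find_layout_problems("Patients"):
        print(problem)
```

An empty result means every patient folder has both modality subfolders.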
From within the virtualenv, at the root of the repo, run `python3 src/register_images.py {path_to_flair_perfusion_mappings}`.
The `{path_to_flair_perfusion_mappings}` is a CSV file that contains rows of format:
PATIENTNUMBER,FIXEDDICOM,MOVINGDICOM
This will create a new directory called `RESULT` in the `src` directory, and store the co-registered DICOMs inside with file name:
result_DICOMFILENAME_AcquitionNumber_AcquitionTime_SliceLocation.dcm
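As an illustration of the mappings file format, here is a minimal sketch that parses `PATIENTNUMBER,FIXEDDICOM,MOVINGDICOM` rows with Python's standard `csv` module. It is not the parsing code used by `src/register_images.py`; the names `Mapping` and `read_mappings` are hypothetical, and the sketch assumes the file has no header row and that the patient number is an integer.

```python
import csv
from collections import namedtuple

# Hypothetical record type for one row of the mappings file.
Mapping = namedtuple("Mapping", ["patient", "fixed", "moving"])


def read_mappings(lines):
    """Parse PATIENTNUMBER,FIXEDDICOM,MOVINGDICOM rows from an iterable of lines."""
    mappings = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"row {row_number}: expected 3 fields, got {len(row)}")
        patient, fixed, moving = (field.strip() for field in row)
        # Assumption: patient numbers are integers, matching the folder names.
        mappings.append(Mapping(int(patient), fixed, moving))
    return mappings
```

Because `csv.reader` accepts any iterable of strings, the function works on an open file (opened with `newline=""`) as well as on a plain list of lines.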
- From within the virtualenv, at the root of the repo, run `python3 src/generate_csvs.py {directory_with_dicoms} {n_intensity_vals} {n_live} {n_die}`.
  - For a more verbose description of these options and their utilization, run `python3 src/generate_csvs.py -h`.
- This will generate a training and a testing CSV per patient, which can be put through the machine learning models.
  - NOTE: directories must be structured like so:

        Patients            <-- structured_directory_root
        |--- 1              <-- patient number... must be an integer
            |--- FLAIR
            |--- Perfusion
        |--- 2
            |--- FLAIR
            |--- Perfusion
        |--- ...
        |--- N
            |--- FLAIR
            |--- Perfusion
- Once the CSV files have been generated, you should run `verify-csvs.sh` on them.
  - To do this, all CSVs you wish to process must be in a single directory.
  - Recommended usage is as follows: `./verify-csvs.sh -n {col_count} -c -f col_label,col_label -i {lines_to_ignore} -br -d {directory_with_csvs}`
    - Essentially, this will combine all patients' CSV files into one, normalize column count to `col_count`, add `col_label,col_label` as the first line of the condensed CSV, and ignore any intensity rows which contained background values.
    - For further usage details, run `./verify-csvs.sh -u`.
- With the CSV files cleaned and condensed, you should run them through the machine learning process of your choice. See the Utilizing Machine Learning section for further details.
From within the virtualenv, at the root of the repo, run:
`python3 src/classify_voxels.py {path_to_training_csv} {path_to_input_csv} {which_model_to_run}`
- `All` - Put this in place of `{which_model_to_run}` to use all the models listed below, and output their individual F1-Score, Accuracy, Precision, and Recall. This will output to a single CSV, `model_analysis.csv`.
- Use any of the options below in place of `{which_model_to_run}` to train on a 100/0 training/test split. This will output to a single CSV, named `{which_model_to_run}.csv`.
  - BaggingClassifier
  - GradientBoostingClassifier
  - LogisticRegression
  - MLPClassifier_ActivationIdentity
  - MLPClassifier_ActivationLogistic
  - MLPClassifier_TanH
  - MLPClassifier_Relu
  - NearestCentroid
  - KNeighborsClassifier
  - DecisionTree
  - RandomForest
  - ExtraTreesClassifier
  - StochasticGradientDescent_Hinge
  - StochasticGradientDescent_Log
  - StochasticGradientDescent_Perceptron
  - StochasticGradientDescent_ModifiedHuber
  - StochasticGradientDescent_SquaredHinge
  - SupportVectorMachine_RBFKernel
  - SupportVectorMachine_SigmoidKernel

For a more verbose description of these options and their utilization, run `python3 src/classify_voxels.py -h`.
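For readers unfamiliar with the four scores reported by the `All` option, the sketch below shows how accuracy, precision, recall, and F1 are conventionally defined for binary labels. It is a plain-Python illustration, not the code in `src/classify_voxels.py`, which may compute these scores differently. The function name `binary_metrics` is hypothetical, and treating label `1` as the positive class is an assumption, since the label encoding is not stated here.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: `positive` marks the class of interest (for example, lesion).
    Undefined ratios (zero denominator) are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Precision and recall matter here because lesion voxels are typically a small fraction of a scan, so accuracy alone can look high for a model that rarely predicts the positive class.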
- Run `./bootstrap.sh -c` to blow away the virtualenv, and clean its dependent files.
Stroke is one of the leading causes of death and a major cause of long-term disability worldwide. While prevention research identifies factors and specific drugs that may lower the risk of a future stroke, the treatment of ischemic stroke patients aims at maximizing the recovery of brain tissue at risk. This is typically done by recanalization of the occluded blood vessel, either mechanically or using a clot-busting drug. Prediction of tissue outcome is essential to the clinical decision-making process in acute ischemic stroke and would help to refine therapeutic strategies.
Recent studies based on machine learning have outperformed standard approaches and demonstrated great promise in predicting lesion growth. This project aims at comparing the accuracy of state-of-the-art machine learning methods including Deep Learning architectures, LSTM, Decision Trees, and Kernel Spectral Regression. The input will be based on source perfusion-weighted MRI obtained after admission of the stroke patient, and the target output will correspond to the lesion annotated on FLAIR images by a neurologist 3 days after admission.