MovementOutcome

Framework for developing graph convolutional networks for the prediction of binary movement outcome.

How to set up on a Windows machine

GPU activation

We strongly advise using a workstation with an NVIDIA GPU to speed up training of models. To enable GPU use, follow these instructions (NB: Please skip this step if similar GPU activation has already been performed while setting up the Markerless framework):

  1. Download Visual Studio 2017 Free Community Edition and install the program by following the necessary steps.
  2. Download CUDA Toolkit 11.1 Update 1 and follow instructions to perform installation.
  3. Copy the file 'ptxas.exe' in the folder 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin' to 'Desktop'.
  4. Download CUDA Toolkit 11.0 Update 1 and follow instructions to perform installation.
  5. Copy the file 'ptxas.exe' from 'Desktop' to the folder 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin'.
  6. Create a user account at NVIDIA.com and download cuDNN 8.0.4.
  7. Open 'cudnn-11.0-windows-x64-v8.0.4.30.zip' in 'Downloads' and move the files in the 'bin', 'include', and 'lib' folders under 'cuda' to the corresponding folders ('bin', 'include', and 'lib') in 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0'.
  8. Restart the computer.

Set up the MovementOutcome framework

To set up the MovementOutcome framework, follow these instructions:

  1. Download Anaconda and perform the installation (if you have not previously downloaded and installed Anaconda).
  2. Open a command prompt and clone the MovementOutcome framework: git clone https://github.com/DeepInMotion/MovementOutcome.git
  3. Navigate to the MovementOutcome folder: cd MovementOutcome
  4. Create the virtual environment movementoutcome: conda env create -f environment.yml
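
To verify that the GPU activation above succeeded, you can run a short check inside the movementoutcome environment. The sketch below is minimal and assumes the environment provides either PyTorch or TensorFlow as its deep learning backend; adapt the check to whichever backend environment.yml actually installs.

    # Minimal GPU sanity check (assumes the environment provides PyTorch or
    # TensorFlow; adapt to the backend installed by environment.yml).
    try:
        import torch
        print('PyTorch CUDA available:', torch.cuda.is_available())
        if torch.cuda.is_available():
            print('GPU:', torch.cuda.get_device_name(0))
    except ImportError:
        import tensorflow as tf
        print('TensorFlow GPUs:', tf.config.list_physical_devices('GPU'))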

How to use on a Windows machine

Neural architecture search, cross-validation, and evaluation

This is a step-by-step procedure for using the MovementOutcome framework to search for, cross-validate, and evaluate graph convolutional networks (GCNs) suited to a particular dataset of individuals' movements and a specific movement outcome:

  1. Open a command prompt and activate the virtual environment: activate movementoutcome
  2. Navigate to the MovementOutcome folder: cd MovementOutcome
  3. Open the code library in a web browser: jupyter lab
  4. Create a new project folder under 'projects' with a specified name (e.g., 'im2021').
  5. Create a subfolder named 'searches' within your project folder (e.g., 'im2021/searches'). Your results from neural architecture search (NAS), cross-validation, and evaluation will be stored in this folder.
  6. Create a subfolder named 'data' within your project folder (e.g., 'im2021/data').
  7. Upload coordinate files and outcomes
  • Alternative a) If you have raw coordinate CSVs (e.g., generated by the Markerless framework) that are not yet sorted into cross-validation folds and a test set: Create a subfolder 'raw' within 'data', upload your raw coordinate files (i.e., with prefix 'orgcoords_') into a folder named 'coords' (e.g., 'im2021/data/raw/coords'), and upload the outcome file (i.e., 'outcomes.csv') into an 'outcomes' folder (e.g., 'im2021/data/raw/outcomes'). The procedure will randomize the coordinate files into folders for the cross-validation folds (e.g., 'val1') and the test set (i.e., 'test') and preprocess the coordinate files to generate Numpy array files for the datasets, which are stored in the 'processed' subfolder (e.g., 'im2021/data/processed/test_coords.npy').
  • Alternative b) If you have previously determined the dataset split and generated separate Numpy array files for coordinates (e.g., 'test_coords.npy'), individual IDs (e.g., 'test_ids.npy'), and outcomes (e.g., 'test_labels.npy') of each dataset: Create a subfolder 'processed' within the 'data' folder (e.g., 'im2021/data/processed') and directly upload the three Numpy array files of each dataset into this folder (a hedged sketch of how such files might be generated follows this procedure).
  8. Set choices for NAS, cross-validation, and/or evaluation in 'main.py' (an illustrative excerpt of such settings follows this procedure):
  • Line 10: Set name of your project folder.
  • Line 22: Set name of the search. Hyperparameters of the search and all data related to individual search experiments will be stored inside a folder with the given search name within the 'searches' subfolder.
  • Line 25: Set search = True if you want to run NAS to find a suitable GCN, otherwise set search = False if you have previously run NAS.
  • Line 26: Set crossval = True if you want to cross-validate the GCN with highest performance (i.e., Area Under ROC Curve) on the NAS, otherwise use crossval = False to skip cross-validation.
  • Line 27: Set evaluate = True if you want to evaluate on the test set the GCN instances obtained from cross-validation, otherwise use evaluate = False. The evaluation will use the GCN instances as an ensemble where the final classification is based on the aggregated prediction across the instances.
  • Line 30-31: Define computational device responsible for the analysis (i.e., output_device) and number of workers responsible for data handling.
  • Line 34: Set reference to model script for defining GCN (e.g., model_script = 'models.gcn_search_model' for script 'gcn_search_model.py' in 'models').
  • Line 35: Set number of dimensions in coordinate files (e.g., input_dimensions = 2 for 2D coordinates).
  • Line 36: Set number of body parts in coordinate files (e.g., input_spatial_resolution = 19).
  • Line 37: Set number of time steps in a movement window (e.g., input_temporal_resolution = 150).
  • Line 38-40: Set additional fixed hyperparameters of a GCN.
  • Line 43-48: Define the human skeleton (e.g., body parts, neighboring body parts, and center of skeleton).
  • Line 52-57: Set biomechanical properties to use as input for GCN.
  • Line 60-63: Set temporal resolution of coordinate files and skeleton sequences (i.e., raw_frames_per_second and processed_frames_per_second) and options for preprocessing skeleton sequences with Butterworth filter.
  • Line 66-67: Set batch size for training and validation (e.g., trainval_batch_size = 32) and number of epochs taken into account for computing smoothed validation loss (e.g., loss_filter_size = 5).
  • Line 70: Set number of positive samples required per negative sample (i.e., train_num_positive_samples_per_negative_sample) to compensate for unbalanced datasets. Should ideally be set to number of individuals with negative outcome divided by number of individuals with positive outcome.
  • Line 75-91: Adjust hyperparameters of the optimizer and data augmentation if desired. However, we suggest using the default values as a starting point as they have worked well across a wide variety of GCNs.
  • Line 94-101: Set preferences for the evaluation process, including portion of individuals in the test set (i.e., test_size), distance between subsequent movement windows (e.g., parts_distance = 75 for 50% overlapping windows with input_temporal_resolution = 150), scheme for aggregating predictions across movement windows, and prediction threshold for classifying an individual as positive outcome (e.g., prediction_threshold = 0.5).
  • Line 104-137: Specify the details of the NAS, including hyperparameters of the K-Best Search strategy (e.g., k and performance_threshold), choices and associated alternatives in the search space (i.e., search_space), training and validation sets of the search (i.e., search_train_dataset and search_val_dataset), number of epochs (search_num_epochs), and performance requirements per epoch (i.e., search_critical_epochs and search_critical_epoch_values).
  • Line 140-147: Specify the details for cross-validation, including number of validation folds (i.e., crossval_folds) and number of epochs for each cross-validation run (i.e., crossval_num_epochs).
  9. Save 'main.py' (with the chosen hyperparameter settings).
  10. Open a new terminal window from the jupyter lab tab in the web browser.
  11. Run NAS, cross-validation, and/or evaluation in the terminal window: python main.py
  12. The results of the NAS, cross-validation, and evaluation processes are stored in the folder of the current search within the 'searches' folder (e.g., 'im2021/searches/21092022 1522 IM2021').
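
If you follow alternative b) in step 7, the three processed Numpy array files per dataset can be produced with a few lines of NumPy. The sketch below is only illustrative: the file names and folder follow the examples above, but the array shapes, dtypes, and axis order are assumptions and must match how your own coordinates were extracted and what your version of the framework expects.

    import os
    import numpy as np

    # Example 'processed' folder for a project named 'im2021'.
    processed_dir = 'projects/im2021/data/processed'
    os.makedirs(processed_dir, exist_ok=True)

    # Illustrative placeholders only; in practice these arrays come from your own
    # coordinate extraction. Shapes/dtypes are assumptions, not a required format.
    num_individuals = 40
    num_frames = 1500  # frames per individual before windowing
    coords = np.random.rand(num_individuals, num_frames, 19, 2).astype(np.float32)
    ids = np.array(['individual{}'.format(i + 1) for i in range(num_individuals)])
    labels = np.random.randint(0, 2, size=num_individuals)  # binary movement outcome

    # Store the trio of files for the test set; repeat for each cross-validation fold.
    np.save(os.path.join(processed_dir, 'test_coords.npy'), coords)
    np.save(os.path.join(processed_dir, 'test_ids.npy'), ids)
    np.save(os.path.join(processed_dir, 'test_labels.npy'), labels)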
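
For reference, the settings described in step 8 amount to plain Python assignments near the top of 'main.py'. The excerpt below is a hedged illustration that reuses the variable names and example values listed above; the exact line numbers, value formats (e.g., how output_device is specified), and the full set of variables may differ in your copy of the script.

    # Illustrative excerpt of main.py settings (names and example values taken
    # from the list above; not a complete or verbatim copy of the script).
    search = True                    # run neural architecture search (NAS)
    crossval = True                  # cross-validate the best GCN from the search
    evaluate = True                  # evaluate the cross-validated ensemble on the test set

    output_device = 0                # computational device (exact format may differ)

    model_script = 'models.gcn_search_model'
    input_dimensions = 2             # 2D coordinates
    input_spatial_resolution = 19    # number of body parts
    input_temporal_resolution = 150  # time steps per movement window

    trainval_batch_size = 32
    loss_filter_size = 5             # epochs used for the smoothed validation loss

    parts_distance = 75              # 50% overlapping windows with 150-frame windows
    prediction_threshold = 0.5       # classify as positive outcome above this threshold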

Skeleton-based prediction of movement outcome

To employ a cross-validated GCN obtained by NAS for prediction of movement outcome from raw coordinate files, we suggest the following steps:

  1. Set the search details in the prediction script (i.e., 'predict/prediction.py'):
  • Line 104: Set name of your project folder (e.g., 'im2021').
  • Line 114: Set name of the search used to obtain the GCN (e.g., '21092022 1522 IM2021').
  • Line 117: Set save = True if you want to save the predicted risk of outcome, classification, and associated certainty in a CSV file, otherwise set save = False.
  • Line 118: Set visualize = True if you want to store class activation map visualization of body parts with highest contribution towards predicted risk of outcome, otherwise set visualize = False.
  • Line 121: Define computational device responsible for the analysis (i.e., output_device) and number of workers responsible for data handling.
  • Line 125: Set reference to model script for defining GCN (e.g., model_script = 'models.gcn_search_model' for script 'gcn_search_model.py' in 'models').
  • Line 126: Set number of dimensions in coordinate files (e.g., input_dimensions = 2 for 2D coordinates).
  • Line 127: Set number of body parts in coordinate files (e.g., input_spatial_resolution = 19).
  • Line 128: Set number of time steps in a movement window (e.g., input_temporal_resolution = 150).
  • Line 129-131: Set additional fixed hyperparameters of a GCN.
  • Line 134-139: Define the human skeleton (e.g., body parts, neighboring body parts, and center of skeleton).
  • Line 141: Set sample coordinates of human skeleton (i.e., sample_coords).
  • Line 144-149: Set biomechanical properties to use as input for GCN.
  • Line 152-155: Set temporal resolution of coordinate files and skeleton sequences (i.e., raw_frames_per_second and processed_frames_per_second) and options for preprocessing skeleton sequences with Butterworth filter.
  • Line 158-162: Set hyperparameters for the evaluation process, including batch size (i.e., evaluation_batch_size), distance between subsequent movement windows (e.g., parts_distance = 75 for 50% overlapping windows with input_temporal_resolution = 150), scheme for aggregating predictions across movement windows, and prediction threshold for classifying an individual as positive outcome (e.g., prediction_threshold = 0.5).
  • Line 165: Set the number of cross-validation folds (i.e., crossval_folds).
  2. Create a folder with the coordinate files that should be analyzed by the prediction model (e.g., 'coords').
  3. Run the prediction script on the coordinate files in the created folder. E.g.: python predict/prediction.py coords
  4. The results of the prediction model are stored in a folder with the same name as the folder of the coordinate files in the specific search folder within 'searches' (e.g., 'im2021/searches/21092022 1522 IM2021/coords').
  • The predicted risk of outcome, classification, and associated classification certainty of each individual are stored in a CSV file (e.g., 'im2021/searches/21092022 1522 IM2021/coords/individual1_results.csv').
  • Class activation map visualizations of body part contributions towards the risk of movement outcome for each individual are stored as a PNG file (e.g., 'im2021/searches/21092022 1522 IM2021/coords/individual1_cam.png').
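
If you analyze many individuals, the per-individual result files can be combined for an overview. The following is a minimal sketch that assumes pandas is available in the environment; the folder path follows the example above, while the column names 'risk' and 'classification' are hypothetical and should be replaced with the columns actually written by 'predict/prediction.py'.

    import glob
    import os
    import pandas as pd

    # Example output folder produced by the prediction step above.
    results_dir = 'projects/im2021/searches/21092022 1522 IM2021/coords'

    # Collect every per-individual results CSV into one table.
    frames = []
    for path in glob.glob(os.path.join(results_dir, '*_results.csv')):
        df = pd.read_csv(path)
        df['individual'] = os.path.basename(path).replace('_results.csv', '')
        frames.append(df)

    if frames:
        summary = pd.concat(frames, ignore_index=True)
        # 'risk' and 'classification' are hypothetical column names.
        print(summary.sort_values('risk', ascending=False)[['individual', 'risk', 'classification']])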

Tip: The ensemble script (i.e., 'predict/prediction_ensemble.py') performs prediction by combining the outputs of different GCNs obtained from separate searches.