We strongly advise using a workstation with an NVIDIA GPU to speed up model training. To enable GPU use, follow these instructions (NB: skip this step if the GPU was already activated this way while setting up the Markerless framework):
- Download Visual Studio 2017 Free Community Edition and install the program by following the necessary steps.
- Download CUDA Toolkit 11.1 Update 1 and follow instructions to perform installation.
- Copy the file 'ptxas.exe' in the folder 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin' to 'Desktop'.
- Download CUDA Toolkit 11.0 Update 1 and follow instructions to perform installation.
- Copy the file 'ptxas.exe' from 'Desktop' to the folder 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin'.
- Create a user account at NVIDIA.com and download cuDNN 8.0.4.
- Open 'cudnn-11.0-windows-x64-v8.0.4.30.zip' in 'Downloads' and move the files in the folders 'bin', 'include', and 'lib' under 'cuda' to associated folders ('bin', 'include', and 'lib') in 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0'.
- Restart the computer.
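After the restart, it can be useful to confirm that the copied files ended up where the steps above put them. The following is a minimal sketch, not part of the framework: the function name `missing_cuda_files` is hypothetical, and the presence of 'include/cudnn.h' in the cuDNN archive is an assumption.

```python
from pathlib import Path

# Files the steps above should leave in the CUDA v11.0 folder.
# 'include/cudnn.h' is an assumption about the cuDNN archive contents.
EXPECTED_FILES = ("bin/ptxas.exe", "include/cudnn.h")


def missing_cuda_files(cuda_root, expected=EXPECTED_FILES):
    """Return the expected files that are absent under cuda_root."""
    root = Path(cuda_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    root = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0"
    missing = missing_cuda_files(root)
    print("All expected files found." if not missing else f"Missing: {missing}")
```

An empty result only shows that the files are in place; it does not prove that the GPU is actually used during training.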
To set up the MovementOutcome framework, follow these instructions:
- Download Anaconda and perform the installation (if you have not previously downloaded and installed Anaconda).
- Open a command prompt and clone the MovementOutcome framework: git clone https://github.com/DeepInMotion/MovementOutcome.git
- Navigate to the MovementOutcome folder: cd MovementOutcome
- Create the virtual environment movementoutcome: conda env create -f environment.yml
This is a step-by-step procedure for using the MovementOutcome framework to search for, cross-validate, and evaluate graph convolutional networks (GCNs) suited to a particular dataset of individuals' movements related to a specific movement outcome:
- Open a command prompt and activate the virtual environment: activate movementoutcome
- Navigate to the MovementOutcome folder: cd MovementOutcome
- Open the code library in a web browser: jupyter lab
- Create a new project folder under 'projects' with a specified name (e.g., 'im2021').
- Create a subfolder within your project folder with name 'searches' (e.g., 'im2021/searches'). Your results from neural architecture search (NAS), cross-validation, and evaluation will be stored in this folder.
- Create a subfolder within your project folder with name 'data' (e.g., 'im2021/data').
- Upload coordinate files and outcomes:
- Alternative a) If you have raw coordinate CSVs (e.g., generated by the Markerless framework) not sorted into cross-validation folds and test set: Create a subfolder 'raw' within 'data', and upload your raw coordinate files (i.e., with prefix 'orgcoords_') into a folder named 'coords' (e.g., 'im2021/data/raw/coords') and outcome file (i.e., 'outcomes.csv') into 'outcomes' folder (e.g., 'im2021/data/raw/outcomes'). The procedure will randomize the coordinate files into folders for cross-validation folds (e.g., 'val1') and test set (i.e., 'test') and preprocess the coordinate files to generate Numpy array files for datasets that are stored in the 'processed' subfolder (e.g., 'im2021/data/processed/test_coords.npy').
- Alternative b) If you have previously determined the dataset split and generated separate Numpy array files for coordinate files (e.g., 'test_coords.npy'), individual IDs (e.g., 'test_ids.npy'), and outcomes (e.g., 'test_labels.npy') of each dataset: Create a subfolder 'processed' within the 'data' folder (e.g., 'im2021/data/processed') and directly upload the three Numpy array files of each dataset into this folder.
- Set choices for NAS, cross-validation and/or evaluation in 'main.py':
- Line 10: Set name of your project folder.
- Line 22: Set name of the search. Hyperparameters of the search and all data related to individual search experiments will be stored inside a folder with the given search name within the 'searches' subfolder.
- Line 25: Set `search = True` if you want to run NAS to find a suitable GCN, otherwise set `search = False` if you have previously run NAS.
- Line 26: Set `crossval = True` if you want to cross-validate the GCN with highest performance (i.e., Area Under ROC Curve) on the NAS, otherwise use `crossval = False` to skip cross-validation.
- Line 27: Set `evaluate = True` if you want to evaluate on the test set the GCN instances obtained from cross-validation, otherwise use `evaluate = False`. The evaluation will use the GCN instances as an ensemble where the final classification is based on the aggregated prediction across the instances.
- Line 30-31: Define computational device responsible for the analysis (i.e., `output_device`) and number of workers responsible for data handling.
- Line 34: Set reference to model script for defining GCN (e.g., `model_script = 'models.gcn_search_model'` for script 'gcn_search_model.py' in 'models').
- Line 35: Set number of dimensions in coordinate files (e.g., `input_dimensions = 2` for 2D coordinates).
- Line 36: Set number of body parts in coordinate files (e.g., `input_spatial_resolution = 19`).
- Line 37: Set number of time steps in a movement window (e.g., `input_temporal_resolution = 150`).
- Line 38-40: Set additional fixed hyperparameters of a GCN.
- Line 43-48: Define the human skeleton (e.g., body parts, neighboring body parts, and center of skeleton).
- Line 52-57: Set biomechanical properties to use as input for GCN.
- Line 60-63: Set temporal resolution of coordinate files and skeleton sequences (i.e., `raw_frames_per_second` and `processed_frames_per_second`) and options for preprocessing skeleton sequences with Butterworth filter.
- Line 66-67: Set batch size for training and validation (e.g., `trainval_batch_size = 32`) and number of epochs taken into account for computing smoothed validation loss (e.g., `loss_filter_size = 5`).
- Line 70: Set number of positive samples required per negative sample (i.e., `train_num_positive_samples_per_negative_sample`) to compensate for unbalanced datasets. Should ideally be set to number of individuals with negative outcome divided by number of individuals with positive outcome.
- Line 75-91: Adjust hyperparameters of the optimizer and data augmentation if desired. However, we suggest using the default values as a starting point as they have worked well across a wide variety of GCNs.
- Line 94-101: Set preferences for the evaluation process, including portion of individuals in the test set (i.e., `test_size`), distance between subsequent movement windows (e.g., `parts_distance = 75` for 50% overlapping windows with `input_temporal_resolution = 150`), scheme for aggregating predictions across movement windows, and prediction threshold for classifying an individual as positive outcome (e.g., `prediction_threshold = 0.5`).
- Line 104-137: Specify the details of the NAS, including hyperparameters of the K-Best Search strategy (e.g., `k` and `performance_threshold`), choices and associated alternatives in the search space (i.e., `search_space`), training and validation sets of the search (i.e., `search_train_dataset` and `search_val_dataset`), number of epochs (`search_num_epochs`), and performance requirements per epoch (i.e., `search_critical_epochs` and `search_critical_epoch_values`).
- Line 140-147: Specify the details for cross-validation, including number of validation folds (i.e., `crossval_folds`) and number of epochs for each cross-validation run (i.e., `crossval_num_epochs`).
- Save 'main.py' (with the chosen hyperparameter setting).
- Open a new terminal window from the jupyter lab tab in the web browser.
- Run NAS, cross-validation and/or evaluation in the terminal window: python main.py
- The results of the NAS, cross-validation, and evaluation processes are stored in the folder of the current search within the 'searches' folder (e.g., 'im2021/searches/21092022 1522 IM2021').
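Two of the settings in 'main.py' follow from simple arithmetic: the class-balance ratio for `train_num_positive_samples_per_negative_sample` and the window overlap implied by `parts_distance`. The sketch below illustrates both. The helper names are hypothetical, and keeping only full-length windows is an assumption; the framework may treat the end of a sequence differently or expect an integer ratio.

```python
def positives_per_negative(num_negative, num_positive):
    """Ratio suggested for train_num_positive_samples_per_negative_sample."""
    if num_positive <= 0:
        raise ValueError("need at least one individual with positive outcome")
    return num_negative / num_positive


def window_starts(num_frames, window_length, parts_distance):
    """First frame index of each full-length movement window (assumption:
    a trailing partial window is dropped)."""
    if num_frames < window_length:
        return []
    return list(range(0, num_frames - window_length + 1, parts_distance))


def overlap_fraction(window_length, parts_distance):
    """Fraction of frames shared by two subsequent windows."""
    return max(0.0, 1.0 - parts_distance / window_length)


if __name__ == "__main__":
    # 90 individuals with negative outcome, 10 with positive outcome.
    print(positives_per_negative(90, 10))
    # input_temporal_resolution = 150 and parts_distance = 75.
    print(overlap_fraction(150, 75))
    print(window_starts(450, 150, 75))
```

With `input_temporal_resolution = 150` and `parts_distance = 75`, each window shares half of its frames with the next one, which matches the 50% overlap mentioned in the settings.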
To employ a cross-validated GCN obtained by NAS for prediction of movement outcome from raw coordinate files, we suggest the following steps:
- Set search details in prediction script (i.e., 'predict/prediction.py'):
- Line 104: Set name of your project folder (e.g., 'im2021').
- Line 114: Set name of the search used to obtain the GCN (e.g., '21092022 1522 IM2021').
- Line 117: Set `save = True` if you want to save predicted risk of outcome, classification and associated certainty in CSV file, otherwise set `save = False`.
- Line 118: Set `visualize = True` if you want to store class activation map visualization of body parts with highest contribution towards predicted risk of outcome, otherwise set `visualize = False`.
- Line 121: Define computational device responsible for the analysis (i.e., `output_device`) and number of workers responsible for data handling.
- Line 125: Set reference to model script for defining GCN (e.g., `model_script = 'models.gcn_search_model'` for script 'gcn_search_model.py' in 'models').
- Line 126: Set number of dimensions in coordinate files (e.g., `input_dimensions = 2` for 2D coordinates).
- Line 127: Set number of body parts in coordinate files (e.g., `input_spatial_resolution = 19`).
- Line 128: Set number of time steps in a movement window (e.g., `input_temporal_resolution = 150`).
- Line 129-131: Set additional fixed hyperparameters of a GCN.
- Line 134-139: Define the human skeleton (e.g., body parts, neighboring body parts, and center of skeleton).
- Line 141: Set sample coordinates of human skeleton (i.e., `sample_coords`).
- Line 144-149: Set biomechanical properties to use as input for GCN.
- Line 152-155: Set temporal resolution of coordinate files and skeleton sequences (i.e., `raw_frames_per_second` and `processed_frames_per_second`) and options for preprocessing skeleton sequences with Butterworth filter.
- Line 158-162: Set hyperparameters for the evaluation process, including batch size (i.e., `evaluation_batch_size`), distance between subsequent movement windows (e.g., `parts_distance = 75` for 50% overlapping windows with `input_temporal_resolution = 150`), scheme for aggregating predictions across movement windows, and prediction threshold for classifying an individual as positive outcome (e.g., `prediction_threshold = 0.5`).
- Line 165: Set the number of cross-validation folds (i.e., `crossval_folds`).
- Create a folder with coordinate files that should be analyzed by the prediction model (e.g., 'coords').
- Run the prediction script on the coordinate files in the created folder. E.g.: python predict/prediction.py coords
- The results of the prediction model are stored in a folder with the same name as the folder of the coordinate files in the specific search folder within 'searches' (e.g., 'im2021/searches/21092022 1522 IM2021/coords').
- Predicted risk of outcome, classification, and associated classification certainty of each individual are stored in a CSV file (e.g., 'im2021/searches/21092022 1522 IM2021/coords/individual1_results.csv').
- Class activation map visualizations of body part contributions towards risk of movement outcome for the individual are stored as PNG file (e.g., 'im2021/searches/21092022 1522 IM2021/coords/individual1_cam.png').
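To make the roles of the aggregation scheme and `prediction_threshold` concrete, the sketch below turns per-window risks for one individual into a single risk and a classification. It is an illustration only: the scheme names 'mean' and 'median', the function `classify_individual`, and the use of `>=` at the threshold are assumptions, not the options actually implemented in 'predict/prediction.py'.

```python
from statistics import mean, median

# Hypothetical aggregation schemes; the framework's own options may differ.
AGGREGATORS = {"mean": mean, "median": median}


def classify_individual(window_risks, scheme="mean", prediction_threshold=0.5):
    """Aggregate per-window risks and compare the result to the threshold.

    Returns (aggregated_risk, is_positive_outcome).
    """
    if not window_risks:
        raise ValueError("at least one movement window is required")
    risk = AGGREGATORS[scheme](window_risks)
    # Assumption: a risk equal to the threshold counts as positive outcome.
    return risk, risk >= prediction_threshold


if __name__ == "__main__":
    risks = [0.2, 0.9, 0.7]  # made-up risks for three movement windows
    print(classify_individual(risks, scheme="mean"))
    print(classify_individual(risks, scheme="median", prediction_threshold=0.65))
```

The choice of scheme matters when a few windows are outliers: a median is less sensitive to a single extreme window than a mean.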
Tip: The ensemble script (i.e., 'predict/prediction_ensemble.py') performs prediction by combining the outputs of different GCNs obtained from separate searches.
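As an illustration of what combining outputs across searches can look like, the sketch below averages each individual's predicted risk over several searches. This is a minimal example under stated assumptions: the function `ensemble_risks`, the input layout (search name mapped to per-individual risks), and the unweighted average are all hypothetical, and the ensemble script may combine outputs differently.

```python
def ensemble_risks(risks_by_search):
    """Average each individual's predicted risk over all searches.

    risks_by_search maps a search name to a dict of
    {individual_id: predicted_risk} (hypothetical layout).
    """
    totals, counts = {}, {}
    for risks in risks_by_search.values():
        for individual, risk in risks.items():
            totals[individual] = totals.get(individual, 0.0) + risk
            counts[individual] = counts.get(individual, 0) + 1
    # Unweighted mean per individual (assumption).
    return {ind: totals[ind] / counts[ind] for ind in totals}


if __name__ == "__main__":
    # Made-up risks from two searches for two individuals.
    combined = ensemble_risks({
        "search_a": {"individual1": 0.2, "individual2": 0.8},
        "search_b": {"individual1": 0.6, "individual2": 0.4},
    })
    for individual, risk in combined.items():
        print(individual, round(risk, 3))
```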