GWU Data Science - Deep Learning - Group 10

Summary

Describe the project

I run the commands in project_instance_config.txt to update my environment and to download/unzip data files.
- Updates the environment (CUDA drivers, tensorflow-gpu, Keras)
- Unzips images and metadata to the local machine
Change Code/configuration.py to reflect the locations of the unzipped files, though it is probably easier to unzip the files so that they match the file structure already expected by Code/configuration.py.

The preprocessing steps below can be skipped if you already have Code/df_images_all.csv, Code/df_sequences_all.csv, and Code/df_sequences_missing_labels_completed.csv.

Code/preprocessing_helper_get_sequence_labels.py: creates CSV files of sequences and images with their labels
- this gets the labels for sequences that have labels.
- this also gets a list of all images, grouped by sequence.
Code/preprocessing_helper_missing_labels.py: this was used to aid our manual tagging effort
- I used this to create a csv of the peak image for each sequence that did not have a label.
- moves all peak images for untagged sequences into one folder
- creates a csv that can be manually updated with the assigned tag for each sequence
- I manually tagged peak images for 266 sequences.
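
The tagging sheet described above can be sketched roughly as follows. This is only an illustration, not the code in Code/preprocessing_helper_missing_labels.py: the function name, the column names (sequence, peak_image, tag), and the shape of the inputs are all assumptions.

```python
import csv
import io


def write_tagging_sheet(peak_images, labeled_sequences, out_file):
    """Write one row per unlabeled sequence with an empty tag column.

    peak_images: dict mapping a sequence id to its peak image path (assumed shape).
    labeled_sequences: set of sequence ids that already have a label.
    out_file: any writable text file object.
    Returns the number of rows written (excluding the header).
    """
    writer = csv.writer(out_file, lineterminator="\n")
    # Hypothetical column names; the real csv may use different ones.
    writer.writerow(["sequence", "peak_image", "tag"])
    count = 0
    for sequence in sorted(peak_images):
        if sequence in labeled_sequences:
            continue  # already labeled, nothing to tag by hand
        writer.writerow([sequence, peak_images[sequence], ""])
        count += 1
    return count


if __name__ == "__main__":
    sheet = io.StringIO()
    write_tagging_sheet({"S2/001": "S2/001/img9.png"}, set(), sheet)
    print(sheet.getvalue())
```

The tag column is left blank so that it can be filled in by hand while looking at the peak images.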

Code/training_split.py: creates csv for the training and testing sets
- this gets the csv of sequences (with tags).
- assigns the tag for the peak image to all images in the sequence.
- I use images that are close to the peak image.
- the number/percentage of nearby images is configurable.
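
One way to picture the "nearby images" selection is the sketch below. It is not the logic in Code/training_split.py: the function name, the fraction parameter, and the rule of keeping the images closest to the peak by position in the sequence are assumptions made for illustration.

```python
def label_near_peak(image_names, peak_index, label, fraction=0.5):
    """Return (image, label) pairs for the images closest to the peak.

    image_names: images of one sequence, in frame order.
    peak_index: position of the peak image within image_names.
    label: tag of the peak image, copied to every kept image.
    fraction: share of the sequence to keep (hypothetical parameter).
    """
    # Always keep at least the peak image itself.
    n_keep = max(1, round(len(image_names) * fraction))
    # Rank frame positions by distance from the peak.
    by_distance = sorted(range(len(image_names)), key=lambda i: abs(i - peak_index))
    # Restore frame order for the frames that are kept.
    kept = sorted(by_distance[:n_keep])
    return [(image_names[i], label) for i in kept]


if __name__ == "__main__":
    frames = ["a.png", "b.png", "c.png", "d.png", "e.png", "f.png"]
    for row in label_near_peak(frames, peak_index=5, label="happy"):
        print(row)
```

A smaller fraction keeps only frames very close to the peak, where the assigned tag is most likely to still be accurate; a larger one gives more training images at the cost of noisier labels.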
Code/training_model_building.py: trains neural network
- uses Keras ImageDataGenerator.flow_from_dataframe for the input pipeline
- When image filenames are read from dataframes, you will need to update filenames to match your file structure (as discussed above).
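
Updating the filenames can be as simple as swapping the leading directory of each stored path. The helper below is a minimal sketch under that assumption; remap_filename and the example directories are hypothetical and are not part of the project code.

```python
from pathlib import PurePosixPath


def remap_filename(original_path, old_root, new_root):
    """Replace the leading old_root of a stored path with new_root.

    Paths that are not under old_root are returned unchanged.
    """
    try:
        relative = PurePosixPath(original_path).relative_to(old_root)
    except ValueError:
        return original_path  # not under old_root, leave as is
    return str(PurePosixPath(new_root) / relative)


if __name__ == "__main__":
    # Example directories are placeholders, not the project's real layout.
    print(remap_filename("/old/data/images/S005/001/img.png", "/old/data", "/new/data"))
```

If the dataframe is a pandas DataFrame, a helper like this could be applied to the filename column before the dataframe is passed to flow_from_dataframe, for example with df["filename"].apply(...), assuming the column is named filename.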