Deep learning models for analysis and classification of image data for CTA (the Cherenkov Telescope Array).
The following plots were produced with ctalearn v0.1 using the "Basic" single-telescope classification model to classify gamma-ray and proton showers using CTA prod3b simulated data after training for ~4.5 hours.
Set up an Anaconda environment with:
conda config --add channels conda-forge
conda create -n [ENV_NAME] --file requirements.txt python=3.6
source activate [ENV_NAME]
Install the package into the conda environment with pip:
/path/to/anaconda/install/envs/[ENV_NAME]/bin/pip install .
where /path/to/anaconda/install is the path to your Anaconda installation directory and [ENV_NAME] is the name of your environment.
The path to the directory of the environment you wish to install into can be found quickly by running:
conda env list
Finally, install the CPU or GPU version of Tensorflow using the instructions here. Tensorflow with GPU support must be installed to train models on GPU.
NOTE: The current version of ctalearn uses Tensorflow 1.4.1, so use the following links to download (for Python 3.6):
CPU: https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.4.1-cp36-cp36m-linux_x86_64.whl
GPU: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.4.1-cp36-cp36m-linux_x86_64.whl
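For example, the CPU wheel can be installed into the environment with the same pip used above (a sketch; adjust the path to match your own installation and environment name):
/path/to/anaconda/install/envs/[ENV_NAME]/bin/pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.4.1-cp36-cp36m-linux_x86_64.whl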
NOTE for developers: If you wish to fork/clone the repository and make changes to any of the ctalearn modules, the package must be reinstalled for the changes to take effect.
Alternatively, to install without Anaconda, first install the other dependencies (besides Tensorflow) with:
pip install -r requirements.txt
Then install the package with pip:
pip install .
Finally, install the CPU or GPU version of Tensorflow 1.4.1 as described above (Tensorflow with GPU support must be installed to train models on GPU).
Dependencies:
- Python 3.6
- Tensorflow 1.4.1
- Pytables 3.4.2
- Numpy 1.14.2
- OpenCV 3.3.1
and others specified in requirements.txt
All options for training a model are set by a single configuration file. See example_config.ini for an explanation of all available options.
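As a purely illustrative sketch of the kind of settings such a file holds (the section and option names below are hypothetical placeholders, not necessarily the actual keys; example_config.ini is the authoritative reference):
[Data]
file_list = /path/to/file_list.txt

[Model]
model_type = single_tel_model

[Training]
learning_rate = 0.001
optimizer = Adam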
Data
The only currently accepted data format is HDF5/Pytables. A file list containing the paths to a set of HDF5 data files must be provided. The ImageExtractor package is available to process, calibrate, and write CTA simtel files into the HDF5 format required by the scripts here. HDF5 files should be in the standard format specified by ImageExtractor.
For instructions on how to download the full pre-processed Prod3b dataset in ImageExtractor HDF5 format, see the wiki page here. (NOTE: requires a CTA account).
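As a quick sanity check, a downloaded file can be opened with PyTables to inspect its layout. This is a minimal sketch; the file name events.h5 is a placeholder for one of your ImageExtractor HDF5 files:
import tables

# Open one HDF5 file read-only and print every node it contains
with tables.open_file("events.h5", mode="r") as f:
    for node in f.walk_nodes("/"):
        print(node)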
Data Processing
Because the size of the full dataset may be very large, only a set of event indices is held in memory. During each epoch of training, a specified number of event examples is randomly drawn from the training dataset. Until the total number is reached, batches of a specified size are loaded and used to train the model. Batch loading of data may be parallelized using a specified number of threads. After each training epoch, the model is evaluated on the validation set.
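The index-based batching scheme described above can be sketched roughly as follows. This is an illustration of the idea, not the actual ctalearn implementation; load_batch is a hypothetical stand-in for the (possibly multi-threaded) HDF5 reading step:
import numpy as np

def iterate_batches(event_indices, num_examples, batch_size, load_batch):
    # Randomly draw the requested number of example indices for this epoch
    drawn = np.random.choice(event_indices, size=num_examples, replace=False)
    # Load and yield one batch at a time until the total number is reached
    for start in range(0, num_examples, batch_size):
        yield load_batch(drawn[start:start + batch_size])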
Model
Several higher-level model types are provided to train networks for single-telescope classification (single_tel_model) and array-level (multiple-image) classification (variable_input_model, cnn_rnn_model).
Available CNN Blocks: Basic, AlexNet, MobileNet, ResNet, DenseNet
Available Network Heads: AlexNet (fully connected telescope combination), AlexNet (convolutional telescope combination), MobileNet, ResNet, Basic (fully connected telescope combination), Basic (convolutional telescope combination)
Training
Training hyperparameters including the learning rate and optimizer can be set in the configuration file.
Logging
Tensorflow checkpoints and summaries are saved to the specified model directory, as is a copy of the configuration file.
To train a model, run:
python train.py myconfig.ini
The following flags may be set: --debug to set the DEBUG logging level, and --log_to_file to save logger messages to a file in the model directory.
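For example, to train with DEBUG-level logging saved to a file (assuming the usual placement of flags after the configuration file):
python train.py myconfig.ini --debug --log_to_file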
The model's progress can be viewed in real time using Tensorboard:
tensorboard --logdir=/path/to/my/model_dir