Machine learning code for RFI
All of the scripts in this file must be run from the root of the repository.
Virtual Env installation
# Create venv and install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r scripts/requirements.txt
# Install pyvex for reading .vex files
cd pyvex/pyvex
python setup.py install
# Preprocess power line data
bash data/preprocess_india_txt.sh
# Preprocess vlba data
bash data/preprocess_vlba.sh
source venv/bin/activate
pdoc3 --html src --force
The documentation will be available in html
for browsing.
Additionally, documentation is written in numpydoc format
Default config file creation
source venv/bin/activate
python -m src.config
The config file will be created in the root of the repository
source venv/bin/activate
# Train using the config file in the root
python -m src.train gan_config.settings
Configuration Options
USE_CUDA
- True to train using the GPU, false to use the CPU (defaults to true).
FILENAME
- Path to HDF5 file to load data from, relative to the repository root.
MAX_EPOCHS
- Max number of epochs to train the GAN for (defaults to 60).
MAX_GENERATOR_AUTOENCODER_EPOCHS
- Max number of epochs to train the generator autoencoder for (defaults to 60).
MAX_SAMPLES
- Maximum number of inputs to train on. Set to 0 for unlimited (defaults to 0).
BATCH_SIZE
- Number of samples to train on per batch (defaults to 4096).
POLARISATIONS
- Which polarisations should be used? (comma separated list, defaults to 0, 1).
FREQUENCIES
- Which frequencies should be used? (comma separated list, defaults to 0, 1, 2, 3).
NORMALISE
- Set to true to normalise inputs (defaults to true).
ADD_DROPOUT
- If true, add dropout to the inputs before passing them into the network (defaults to true).
ADD_NOISE
- If true, add noise to the inputs before passing them into the network (defaults to false).
REQUEUE_EPOCHS
- If > 0, perform REQUEUE_EPOCHS of training, stop, then run the REQUEUE_SCRIPT (defaults to 0).
REQUEUE_SCRIPT
- If REQUEUE_EPOCHS > 0, this script will be called to requeue the training script.
CHECKPOINT_DIRECTORY
- The directory to write checkpoints to, relative to the repository root.
RESULT_DIRECTORY
- The directory to write results to, relative to the repository root.
Example
USE_CUDA = True
FILENAME = data/processed/C148700001_fft.hdf5
MAX_EPOCHS = 60
MAX_AUTOENCODER_EPOCHS = 60
MAX_SAMPLES = 0
BATCH_SIZE = 128
NORMALISE = True
ADD_DROPOUT = False
ADD_NOISE = False
REQUEUE_EPOCHS = 0
REQUEUE_SCRIPT = ""
CHECKPOINT_DIRECTORY = data/checkpoints
RESULT_DIRECTORY = data/results/