Not only a network of Gravitational Waves, Geophysics and Machine Learning experts, G2Net was also released as a Kaggle competition. G2Net's origins date back to the discovery of gravitational waves (GW) in 2015 (The Sound of Two Black Holes Colliding). The aim of the competition was to detect GW signals from the mergers of binary black holes. Specifically, participants were expected to create and train a model to analyse synthetic GW time-series data from a network of Earth-based detectors (LIGO Hanford, LIGO Livingston and Virgo). Under certain settings, the implementations in this repository reached the top 8% of the ranking (AUC score on the test set), which does not mean they cannot be improved further.
The model implemented for the competition (see the image below) follows an end-to-end philosophy, meaning that even the time-series pre-processing logic is included as part of the model and can be made trainable. For more details about the building blocks of the model, refer to any of the Colab guides provided by the project.
The major project source code files are listed below in a tree-like fashion:
```
G2Net
└───src
    │   config.py
    │   main.py
    ├───ingest
    │       DatasetGeneratorTF.py
    │       NPYDatasetCreator.py
    │       TFRDatasetCreator.py
    ├───models
    │       ImageBasedModels.py
    ├───preprocess
    │       Augmentation.py
    │       Preprocessing.py
    │       Spectrogram.py
    ├───train
    │       Acceleration.py
    │       Losses.py
    │       Schedulers.py
    └───utilities
            GeneralUtilities.py
            PlottingUtilities.py
```
The most important elements in the project are outlined and described as follows:
- `config.py`: Contains a configuration class with the parameters used by the model and the training process, as well as other data-ingestion options.
- `main.py`: Implements the functionality to train and predict with the model locally on GPU/CPU.
- Ingest module:
  - `NPYDatasetCreator.py`: Implements the logic to standardise the full dataset in a multiprocessing fashion.
  - `TFRDatasetCreator.py`: Implements the logic to standardise, encode, create and decode TensorFlow records.
  - `DatasetGeneratorTF.py`: Includes a class implementing functionality to create TensorFlow Dataset pipelines from both TensorFlow records and NumPy files.
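As a rough sketch of what the standardisation step amounts to (the function name and shapes here are illustrative, not the repository's actual API), per-detector statistics are broadcast over the batch and time axes:

```python
import numpy as np

def standardise(batch: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Standardise a batch of shape (n_samples, n_detectors, n_points)
    using per-detector statistics, broadcast over batch and time axes."""
    return (batch - mean[None, :, None]) / std[None, :, None]

# Toy example: 4 samples, 3 detectors (Hanford, Livingston, Virgo), 16 points
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=5.0, size=(4, 3, 16))
det_mean = data.mean(axis=(0, 2))  # one statistic per detector
det_std = data.std(axis=(0, 2))
standardised = standardise(data, det_mean, det_std)
```

After this step each detector channel has zero mean and unit variance, which is the form the TensorFlow record and NumPy creators persist to disk.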
- Models module:
  - `ImageBasedModels.py`: Includes a Keras model based on 2D convolutions, preceded by a pre-processing phase that culminates in the generation of a spectrogram or similar. The 2D convolutional model used here is an EfficientNet v2.
- Preprocess module:
  - `Augmentation.py`: Implements several augmentations in the form of Keras layers, including Gaussian noise, spectral masking (TPU-compatible and TPU-incompatible versions) and channel permutation.
  - `Preprocessing.py`: Implements several preprocessing layers in the form of trainable Keras layers, including time windows (TPU-incompatible Tukey window and generic TPU-compatible window), bandpass filtering and spectral whitening.
  - `Spectrogram.py`: Includes a TensorFlow version of the CQT1992v2 transform implemented in nnAudio with PyTorch. Being a Keras layer, it also adds functionality to adapt the output range to the one recommended for stability by 2D convolutional models.
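The windowing and whitening ideas can be sketched in plain NumPy (an illustrative simplification of the repository's trainable Keras layers; the function names are hypothetical):

```python
import numpy as np

def tukey(n: int, alpha: float = 0.5) -> np.ndarray:
    """Tukey (tapered-cosine) window in plain NumPy: cosine tapers of total
    fraction alpha at the edges, flat at 1.0 in the middle."""
    t = np.linspace(0.0, 1.0, n)
    w = np.ones(n)
    rising = t < alpha / 2.0
    falling = t >= 1.0 - alpha / 2.0
    w[rising] = 0.5 * (1.0 + np.cos(np.pi * (2.0 * t[rising] / alpha - 1.0)))
    w[falling] = 0.5 * (1.0 + np.cos(np.pi * (2.0 * t[falling] / alpha
                                              - 2.0 / alpha + 1.0)))
    return w

def whiten(signal: np.ndarray, window: np.ndarray) -> np.ndarray:
    """Divide the windowed signal's spectrum by its own amplitude so that
    every frequency bin contributes equally (illustrative sketch)."""
    spectrum = np.fft.rfft(signal * window)
    amplitude = np.abs(spectrum)
    amplitude[amplitude == 0.0] = 1.0  # guard against division by zero
    return np.fft.irfft(spectrum / amplitude, n=signal.size)

# Toy example on white noise
rng = np.random.default_rng(42)
signal = rng.normal(size=4096)
window = tukey(signal.size)
whitened = whiten(signal, window)
```

In the real model the window parameters and whitening are Keras layers acting on batched tensors, so they can be tuned by backpropagation; this sketch only shows the signal-processing content of one channel.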
- Train module:
  - `Acceleration.py`: Includes the logic to automatically configure the TPU, if any.
  - `Losses.py`: Implements a differentiable loss whose minimisation directly maximises the AUC score.
  - `Schedulers.py`: Implements a wrapper to make the CosineDecayRestarts learning rate scheduler compatible with ReduceLROnPlateau.
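The differentiable AUC loss is built on approximating the Wilcoxon-Mann-Whitney statistic (see the reference below). A minimal NumPy sketch of that idea (hypothetical names, not the repository's implementation) might look like:

```python
import numpy as np

def auc_surrogate_loss(scores: np.ndarray, labels: np.ndarray,
                       gamma: float = 0.3, p: float = 2.0) -> float:
    """Approximate 1 - AUC via the Wilcoxon-Mann-Whitney statistic:
    penalise every positive/negative pair whose score margin falls
    below gamma, so minimising the loss pushes the pairs apart."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]          # all positive-negative pairs
    deficit = np.clip(gamma - diff, 0.0, None)  # zero once margin >= gamma
    return float((deficit ** p).mean())

# Well-separated scores incur no loss; inverted ones are penalised
scores = np.array([0.9, 0.8, 0.2, 0.1])
labels = np.array([1, 1, 0, 0])
good = auc_surrogate_loss(scores, labels)
bad = auc_surrogate_loss(scores, 1 - labels)
```

Unlike accuracy or cross-entropy, this pairwise formulation is a smooth function of the score differences, which is why its minimisation tracks the (non-differentiable) AUC ranking metric.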
- Utilities module:
  - `GeneralUtilities.py`: General utilities used throughout the project, mainly to perform automatic tensor broadcasting and to compute the mean and standard deviation of a dataset with multiprocessing capabilities.
  - `PlottingUtilities.py`: Includes all the logic behind the plots.
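Automatic tensor broadcasting of per-channel statistics can be illustrated with a small NumPy sketch (a hypothetical helper, not the repository's actual utility):

```python
import numpy as np

def broadcast_to_rank(x: np.ndarray, rank: int) -> np.ndarray:
    """Append trailing singleton dimensions until x reaches the given
    rank, so per-detector statistics broadcast over time samples."""
    while x.ndim < rank:
        x = x[..., None]
    return x

# A per-detector mean of shape (3,) becomes (3, 1) and broadcasts
# against a (batch, detectors, time) array of shape (8, 3, 4096)
det_mean = np.array([0.1, 0.2, 0.3])
batch = np.zeros((8, 3, 4096))
centred = batch - broadcast_to_rank(det_mean, 2)
```

Aligning ranks this way lets the same statistics array be applied to single samples or whole batches without reshaping at every call site.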
Among others, the project has been built around the following major Python libraries (check `config/g2net.yml` for a full list of dependencies with tested versions):
In order to make use of the project locally (tested on Windows), one just needs to follow two steps:
- Clone the project:

  ```shell
  git clone https://github.com/salvaba94/G2Net.git
  ```
- Assuming that Anaconda Prompt is installed, run the following command to install the dependencies:

  ```shell
  conda env create --file g2net.yml
  ```
To experiment locally:
- First, you'll need to manually download the Competition Data, as the code will not do it for you (to avoid connectivity problems while downloading a heavy dataset). Paste the content into the `raw_data` folder.
- The controls of the code are in `src/config.py`. Make sure that, the first time you run the code, either the `GENERATE_TFR` or the `GENERATE_NPY` flag is set to `True`. This will generate standardised datasets as TensorFlow records or NumPy files, respectively.
- Then set these flags to `False` and make sure the `FROM_TFR` flag matches the data format you generated.
- You are ready to play with the rest of the options!
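The flag workflow above can be summarised with an illustrative sketch (the flag names come from `src/config.py`, but the `Config` class and dispatch logic here are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Flag names as in src/config.py; defaults here are only illustrative
    GENERATE_TFR: bool = False  # write standardised TensorFlow records
    GENERATE_NPY: bool = False  # write standardised NumPy files
    FROM_TFR: bool = True       # read TF records (False reads NumPy files)

def pipeline_steps(cfg: Config) -> list:
    """Return the ingestion steps implied by the flags (hypothetical)."""
    steps = []
    if cfg.GENERATE_TFR:
        steps.append("generate_tfrecords")
    if cfg.GENERATE_NPY:
        steps.append("generate_npy")
    steps.append("read_tfrecords" if cfg.FROM_TFR else "read_npy")
    return steps

first_run = pipeline_steps(Config(GENERATE_TFR=True))  # generate, then read
later_runs = pipeline_steps(Config())                  # only read
```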
If by any chance you experience a `NotImplementedError` (see below), it is an incompatibility between the installed TensorFlow and NumPy versions: a change in exception types causes the error to go uncaught.

```
NotImplementedError: Cannot convert a symbolic Tensor (gradient_tape/model/bandpass/irfft_2/add:0) to a numpy array.
This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported.
```
The error originates at line 867 of `tensorflow/python/framework/ops.py`. It is solved by replacing

```python
def __array__(self):
    raise NotImplementedError(
        "Cannot convert a symbolic Tensor ({}) to a numpy array."
        " This error may indicate that you're trying to pass a Tensor to"
        " a NumPy call, which is not supported".format(self.name))
```

with

```python
def __array__(self):
    raise TypeError(
        "Cannot convert a symbolic Tensor ({}) to a numpy array."
        " This error may indicate that you're trying to pass a Tensor to"
        " a NumPy call, which is not supported".format(self.name))
```
Alternatively, feel free to follow the ad-hoc guides in Colab:
Important note: as the notebooks connect to your Google Drive to save trained models, copy them to your Drive and run them from there, not from the link. In any case, Google will notify you when the notebooks have been loaded from GitHub rather than from your Drive.
Any contributions are greatly appreciated. If you have suggestions that would make the project better, fork the repository and create a pull request, or simply open an issue. If you decide on the first procedure, here is a reminder of the steps:
- Fork the project.
- Create your branch:

  ```shell
  git checkout -b branchname
  ```
- Commit your changes:

  ```shell
  git commit -m "Add some amazing feature"
  ```
- Push to the branch:

  ```shell
  git push origin branchname
  ```
- Open a pull request.
- EfficientNetV2: Smaller Models and Faster Training
- nnAudio: An on-the-fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks
- Optimizing Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney Statistic
- AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
- Darien Schettler (whose amazing notebooks helped solve issues I had while using EfficientNet v2 with pretrained weights)
If you like the project and/or any of its contents proves useful to you, don't forget to give it a star! It means a lot to me 😄