
SERENEt: SEgmentation and REcover NEtwork for SKA-Low Tomographic 21-cm images

Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

SERENEt

SEgmentation and REcover NEtwork for SKA-Low multi-frequency tomographic 21-cm data for the Epoch of Reionization (EoR). The SERENEt code consists of a pre-processing step for foreground mitigation and two U-shaped neural networks for segmentation and 21-cm signal recovery, respectively. A general overview is given in Figure 1.

Figure 1: A simplified view of the project pipeline. The data include the mock observation with foreground and instrumental noise contamination, I_obs; the residual image after the pre-processing step, I_res; the binary prior for neutral region identification; and the recovered 21-cm image. Data input and output are shown with an example image.

  • In the first part, before applying any machine learning method, we process the challenge input data, I_obs, with an algorithm that partially subtracts the foreground contamination. The resulting residual image, I_res, still contains some foreground residual and most of the systematic noise. However, this pre-processing step is essential to reduce the dynamic range of the contaminated image to a reasonable level for neural network training.

In the second step, we combine the inputs and outputs of two independently trained U-shaped 2D convolutional neural networks, SegU-Net and RecU-Net. We refer to this step as the SERENEt pipeline.

  • The former is a segmentation network for the identification of neutral hydrogen (HI) regions in 21-cm tomographic images. Results of SegU-Net can be found in our recent publication, Bianco et al. (2021). The resulting binary image, I_B, is employed as a shape and position prior for the regions of 21-cm emission in the second and final component of SERENEt, which aims to recover the 21-cm signal.
  • By combining the residual and the prior images, RecU-Net extracts meaningful information from both fields for an enhanced recovery of the 21-cm image. This is implemented by convolution blocks that take the prior image as an additional input and intercept the skip connections between the encoder and decoder layers. We show the architecture in Figure 2.
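As a rough illustration of what "intercepting a skip connection" with a prior could look like, the NumPy sketch below passes the binary prior through a small convolution block and merges the result with the encoder features before they would reach the decoder. It is not taken from the SERENEt code: the function names, the 1x1 convolution standing in for a full convolution block, and the choice of merging by channel concatenation are all assumptions made for this example.

```python
import numpy as np

def conv1x1_relu(x, weights):
    """Pointwise (1x1) convolution followed by ReLU.

    x has shape (H, W, C_in) and weights has shape (C_in, C_out).
    This is a stand-in for a real convolution block (assumption).
    """
    return np.maximum(x @ weights, 0.0)

def intercept_skip(encoder_features, prior, weights):
    """Merge encoder features with features derived from the binary prior.

    encoder_features: (H, W, C) array travelling along a skip connection.
    prior: (H, W) binary map at the same resolution (assumed already resized).
    The merge by concatenation is an assumption for illustration.
    """
    prior_features = conv1x1_relu(prior[..., None].astype(float), weights)
    return np.concatenate([encoder_features, prior_features], axis=-1)

rng = np.random.default_rng(0)
enc = rng.normal(size=(8, 8, 16))   # hypothetical encoder feature map
prior = rng.random((8, 8)) > 0.5    # hypothetical binary prior I_B
w = rng.normal(size=(1, 4))         # hypothetical 1x1 kernel: 1 -> 4 channels

merged = intercept_skip(enc, prior, w)
print(merged.shape)  # (8, 8, 20)
```

In this sketch the decoder would receive 20 channels instead of 16 at that level; the encoder features themselves pass through unchanged.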


Figure 2: An overview of the architecture of RecU-Net. Convolutional layers process the binary map, provided by SegU-Net, and intercept the skip connection between the decoder and encoder.

Configuration files:

The configuration file stores the initial conditions for the network training. Configuration files are located in config/ and have the extension .ini.
Some of the variables are self-explanatory, while the following need to be set for your setup:

  • AUGMENT: [string] the network architecture to use.
  • CHAN_SIZE: [int] channel dimension of the low-dimensional latent space.
  • PATH_IO: [string] the location of the training and validation sets.
  • SCRATCH_PATH: [string] the location where to store the network training outputs.
  • DATASET_PATH: [string or tuple string] the name of the training and validation sets.
  • LOSS: [string] the name of the loss function to be used (it must match one of the metrics available in metrics.py).
  • METRICS: [string or list string] the name(s) of the metrics to evaluate on the training and validation sets.
  • GPUS: [bool] if true, all the GPU devices available on the machine are used (deprecated, soon to be removed).
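The following is a minimal sketch of how a subset of these variables could be read with Python's standard configparser module. The section name [TRAINING] and every value shown are placeholders chosen for this example; only the key names come from the list above, and the actual layout of the files in config/ may differ.

```python
import configparser

# Hypothetical configuration text: the section name and all values are
# assumptions for illustration, only the key names come from the README.
EXAMPLE_INI = """
[TRAINING]
CHAN_SIZE = 128
PATH_IO = /path/to/inputs/
SCRATCH_PATH = /path/to/outputs/
DATASET_PATH = train_set, valid_set
LOSS = mse
GPUS = false
"""

def read_config(text):
    """Parse the example configuration into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    sec = parser["TRAINING"]
    return {
        "CHAN_SIZE": sec.getint("CHAN_SIZE"),
        "PATH_IO": sec["PATH_IO"],
        "SCRATCH_PATH": sec["SCRATCH_PATH"],
        # A comma-separated value is split into a list of names.
        "DATASET_PATH": [s.strip() for s in sec["DATASET_PATH"].split(",")],
        "LOSS": sec["LOSS"],
        "GPUS": sec.getboolean("GPUS"),
    }

cfg = read_config(EXAMPLE_INI)
print(cfg["CHAN_SIZE"], cfg["DATASET_PATH"], cfg["GPUS"])
```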

Networks Training:

To train the network on your dataset, set the directory path variable PATH_IO in the configuration file. The actual data should be stored at this location under DATASET_PATH/, in a sub-directory called data/.
To start the training, use the following command:

▶ python serenet.py config/net.ini

The code copies the .ini file to the output directory and updates it there. If you need to resume the training, change the configuration file argument to the corresponding file in the defined output location.
In segunet.daint you can find an example of a bash script for submitting training jobs on the Piz Daint machine at CSCS.
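Before launching a training run, it can help to check that the data sit where the code expects them. The helper below is a small sketch of the directory convention described in this section (PATH_IO, then DATASET_PATH, then data/). The function names are hypothetical and not part of SERENEt.

```python
from pathlib import Path

def dataset_data_dir(path_io, dataset_path):
    """Return PATH_IO/DATASET_PATH/data, following the layout in this README."""
    return Path(path_io) / dataset_path / "data"

def check_layout(path_io, dataset_paths):
    """Return the expected data directories that do not exist.

    dataset_paths may be a single name or a tuple/list of names,
    mirroring the "string or tuple string" type of DATASET_PATH.
    """
    if isinstance(dataset_paths, str):
        dataset_paths = (dataset_paths,)
    expected = [dataset_data_dir(path_io, name) for name in dataset_paths]
    return [d for d in expected if not d.is_dir()]
```

For example, `check_layout("/path/to/inputs", ("train_set", "valid_set"))` returns an empty list when both data/ sub-directories are present, and otherwise the list of missing paths.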

Network Predicting:

This section is still under development. You can have a look at the bash script predserenet.daint and the two python scripts pred_recUNet.py and pred_segUNet.py for reference.

Code Structure:

The SERENEt code is structured in folders that organize the different Python scripts for training, prediction, and post-processing plots. Here is a list of the most relevant files:

config/
├─ net_config.py
├─ net_RecUnet.ini
├─ net_SegUnet_lc.ini
├─ net_SERENEt_lc.ini
tests/
utils/
├─ 3Dto2D.py
├─ other_utils.py
utils_data/
utils_network/
├─ callbacks.py
├─ dataset.py
├─ data_generator.py
├─ metrics.py
├─ networks.py
utils_plot/
├─ other_utils.py
├─ plotting.py
├─ plot_optimisation.py
├─ plot_test.py
├─ postprops_plot.py
utils_pred/
├─ prediction.py
opt_talos.py
pred_serenet.py
serenet.py
predserenet.daint