tallamjr/astronet

[MAJOR] Implement snX: Supernova Xception

tallamjr opened this issue · 3 comments

snX, aka Supernova Xception, is an adaptation of the Xception: Deep Learning with Depthwise Separable Convolutions architecture by Francois Chollet.

Motivation

As of writing, the state of the art for time-series classification using deep learning can be found in the review paper by Fawaz et al., Deep learning for time series classification: a review, with InceptionTime leading the way for at least univariate time series [uvts] (it is still to be determined whether SOTA is also achieved in the multivariate setting).

With regard to InceptionTime, this architecture has been shown to be favourable for multivariate time series [mvts] classification:

... researchers started investigating these complex machine learning models for TSC (Wang et al.,
2017; Cui et al., 2016; Ismail Fawaz et al., 2019a). Precisely, Convolutional Neural Networks
(CNNs) have showed promising results for TSC

...
Given an input MTS, a convolutional layer consists of sliding one-dimensional filters over the time
series, thus enabling the network to extract non-linear discriminant features that are
time-invariant and useful for classification. By cascading multiple layers, the network is able to
further extract hierarchical features that should in theory improve the network’s prediction.

It is recommended to read in detail section 2.2, Deep learning for time series classification, from which the above quotes are taken, for further discussion of why a convolutional architecture is desirable for time series and of which previous architectures have been tried, such as Multi-scale Convolutional Neural Networks (MCNN) (Cui et al., 2016) and Time LeNet (Le Guennec et al., 2016), as well as Fully Convolutional Neural Networks (FCNs), which "were shown to achieve great performance without the need to add pooling layers to reduce the input data’s dimensionality (Wang et al., 2017)."

The above works laid the foundations for applying convolutional neural networks for UVTS and MVTS data classification.

Fawaz et al. naturally take this further with the application of the Inception module, inspired by Szegedy et al. (2015), with modifications specific to time series. Fawaz notes that this method has in fact already been applied to supernova classification:

Inception model was used for Supernovae classification using the light flux of a region in space as
an input MTS for the network (Brunel et al., 2019). However, the authors limited the conception
of their Inception architecture to the one proposed by Google for ImageNet (Szegedy et al., 2017).

Believed to be Inception-V3, as skip connections do not appear to exist in the work of Brunel et al., found here: https://github.com/Anzzy30/SupernovaeClassification . See https://astro-informatics.slack.com/archives/D1E1A4JJH/p1593101980010600 for more discussion

In our work, we explore much larger filters than any previously proposed network for TSC in order
to reach state-of-the-art performance on the UCR benchmark.

Xception Network, Francois Chollet

As can be seen from the progression above, a trend has emerged between developments in deep computer vision architectures for classification and their successful application, and adaptation, to time-series classification. It seems plausible that the successor to Inception-v4 should yield improved results for time-series classification as well.

The intuition behind this relates to the fact that images are simply 2D signals with 3 channels (RGB), while a time series is simply a 1D signal with M channels (or features/dimensions). As the translation between 1D and ND signals is straightforward in ordinary signal processing, it is natural to see how improvements in the 2D signal processing of images can be translated to 1D signals as well.
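This analogy can be made concrete with a naive numpy sketch (all shapes, channel counts, and filter sizes below are illustrative, not taken from the codebase): a 1D convolutional filter slides along the time axis while mixing all M input channels, exactly as a 2D filter slides over height and width while mixing RGB.

```python
import numpy as np

# An RGB image is a 2D signal with 3 channels; a multivariate time
# series (MTS) is a 1D signal with M channels -- here, 6 passbands.
image = np.zeros((224, 224, 3))   # (height, width, channels)
mts = np.zeros((100, 6))          # (timesteps, channels)

def conv1d_valid(x, kernels):
    """Naive 'valid' 1D convolution: x is (T, M), kernels is (k, M, F)."""
    k, m, f = kernels.shape
    t = x.shape[0] - k + 1
    out = np.empty((t, f))
    for i in range(t):
        # each filter slides along time, mixing all M input channels
        out[i] = np.tensordot(x[i:i + k], kernels, axes=([0, 1], [0, 1]))
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 6))
w = rng.normal(size=(5, 6, 16))   # 16 filters of width 5 over 6 channels
y = conv1d_valid(x, w)
print(y.shape)  # (96, 16): 100 - 5 + 1 timesteps, 16 feature maps
```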

Francois Chollet describes the Inception hypothesis in section 1.1 as:

A convolution layer attempts to learn filters in a 3D space, with 2 spatial dimensions (width and
height) and a channel dimension; thus a single convolution kernel is tasked with simultaneously
mapping cross-channel correlations and spatial correlations. This idea behind the Inception module
is to make this process easier and more efficient by explicitly factoring it into a series of
operations that would independently look at cross-channel correlations and at spatial correlations.
More precisely, the typical Inception module first looks at cross-channel correlations via a set of
1x1 convolutions, mapping the input data into 3 or 4 separate spaces that are smaller than the
original input space, and then maps all correlations in these smaller 3D spaces, via regular 3x3 or
5x5 convolutions

In effect, the fundamental hypothesis behind Inception is that cross-channel correlations
and spatial correlations are sufficiently decoupled that it is
preferable not to map them jointly

...
In effect, we make the following hypothesis: that the mapping of cross-channels correlations and spatial correlations
in the feature maps of convolutional neural networks can be
entirely decoupled.
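The appeal of fully decoupling the two mappings can be seen from a simple parameter count (the channel sizes below are illustrative, chosen only to make the arithmetic concrete):

```python
# Illustrative parameter count (bias terms omitted): a regular k x k
# convolution learns cross-channel and spatial correlations jointly,
# while a depthwise separable convolution decouples them entirely.
k, c_in, c_out = 3, 128, 256

regular = k * k * c_in * c_out            # joint mapping
separable = k * k * c_in + c_in * c_out   # depthwise + 1x1 pointwise

print(regular)    # 294912
print(separable)  # 33920
print(round(regular / separable, 1))      # 8.7, i.e. ~8.7x fewer parameters
```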

This form is felt to be advantageous for our photometric time-series data, since one can obtain feature maps of the signal in each passband independently, with cross-channel and temporal correlations decoupled.

Xception is attractive since it maps the spatial (or in our case temporal) correlations for each output channel separately, and then performs a 1x1 depthwise convolution to capture cross-channel correlation -- "An Intuitive Guide to Deep Network Architectures"
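In the 1D setting, the two decoupled steps would look something like the following numpy sketch (shapes again illustrative, e.g. 6 passbands): a depthwise step applying one temporal filter per channel with no mixing, followed by a pointwise 1x1 step that mixes channels only.

```python
import numpy as np

def depthwise_separable_conv1d(x, depth_kernels, point_kernels):
    """x: (T, M); depth_kernels: (k, M), one temporal filter per channel;
    point_kernels: (M, F), a 1x1 convolution mixing channels."""
    k, m = depth_kernels.shape
    t = x.shape[0] - k + 1
    # step 1: depthwise -- temporal correlations per channel, no mixing
    depth = np.empty((t, m))
    for i in range(t):
        depth[i] = np.sum(x[i:i + k] * depth_kernels, axis=0)
    # step 2: pointwise -- cross-channel correlations only
    return depth @ point_kernels

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 6))            # 100 timesteps, 6 passbands
y = depthwise_separable_conv1d(x, rng.normal(size=(5, 6)),
                               rng.normal(size=(6, 16)))
print(y.shape)  # (96, 16)
```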

Next Steps

Previously I have implemented the 2D Xception network here: https://github.com/tallamjr/dnn . It should be straightforward to copy this implementation and adapt it to the 1D setting.
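As a rough sketch of what one adapted block might look like (this is a hypothetical outline, not the final snX design; the input shape, filter count, and block layout are placeholders), Keras already provides SeparableConv1D, so an Xception-style entry-flow block with a strided residual shortcut translates directly:

```python
import tensorflow as tf

def snx_block(x, filters):
    """Hypothetical 1D analogue of one Xception entry-flow block:
    two separable convolutions plus a strided residual shortcut."""
    shortcut = tf.keras.layers.Conv1D(filters, 1, strides=2, padding="same")(x)
    x = tf.keras.layers.SeparableConv1D(filters, 3, padding="same",
                                        activation="relu")(x)
    x = tf.keras.layers.SeparableConv1D(filters, 3, padding="same")(x)
    x = tf.keras.layers.MaxPooling1D(3, strides=2, padding="same")(x)
    return tf.keras.layers.add([x, shortcut])

inputs = tf.keras.Input(shape=(100, 6))  # (timesteps, passbands) -- illustrative
outputs = snx_block(inputs, 32)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 50, 32)
```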

The implementation should live in a separate folder, snX, which would have the same structure as t2 but with several files "pulled out" into a layer above, following DRY principles, such as:

  • constants.py
  • evaluate.py
  • metrics.py
  • preprocess.py
  • utils.py
  • visualise_data.py
  • visualise_results.py

Also, it would be desirable at this stage to clean out unused files such as: opt/optimise.py and opt/somefile.py

Perhaps even an adaptation in the form of only depthwise convolutions (i.e. separable convolutions without the pointwise second step), following this work: tensorflow/tensorflow#36935 (comment)
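Since tf.keras exposes SeparableConv1D but no DepthwiseConv1D layer (the gap discussed in the linked TensorFlow issue), a depthwise-only 1D variant can be emulated by expanding to 2D, applying DepthwiseConv2D, and squeezing back. A hedged sketch, with illustrative shapes:

```python
import tensorflow as tf

def depthwise_conv1d(x, kernel_size):
    """Emulate a depthwise-only 1D convolution: one temporal filter per
    channel, with no pointwise cross-channel mixing afterwards."""
    x = tf.expand_dims(x, axis=2)  # (batch, T, 1, M)
    x = tf.keras.layers.DepthwiseConv2D((kernel_size, 1), padding="same")(x)
    return tf.squeeze(x, axis=2)   # back to (batch, T, M)

inputs = tf.keras.Input(shape=(100, 6))  # illustrative (timesteps, passbands)
outputs = depthwise_conv1d(inputs, 5)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 100, 6): channel count unchanged
```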

When complete, an analysis linked to #52 is to be run

Closed with #59