eem_analysis: A Python repository from CamDavidsonPilon

Autoencoding EEMs

Analysis of EEMs

EEMs (excitation emission matrices) are measurements of a sample's fluorescence intensity at varying excitation and emission wavelengths.

Traditionally, EEMs have been analyzed with multilinear (tensor) decomposition methods such as PARAFAC. To make the decomposition interpretable, PARAFAC relies on some strong chemical (not just statistical) assumptions, namely:

No inner filter effects occur
No quenching is present
The Beer-Lambert law holds
No additional scattering is present
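For reference, the standard PARAFAC model treats a stack of EEMs as a three-way array and writes each entry as a sum of F trilinear components. Here i indexes samples, j emission wavelengths and k excitation wavelengths; the symbols are our own notation, not the repository's. Under the assumptions above, component f can be read as one fluorophore: a_{if} scales with its concentration in sample i, while b_{jf} and c_{kf} are its emission and excitation profiles, and e_{ijk} is the residual.

```latex
x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}
```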

If we generalize to non-linear decompositions and give up on interpretation, we can broaden the class of models. In particular, we can use a convolutional autoencoder to project the 2D EEMs into a lower-dimensional space and perform the analysis there. The convolutional autoencoder compresses the EEMs much more accurately than alternatives like PARAFAC, which also means its reconstructions (decompressions) are closer to the original EEMs, as seen in the image below.
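One way to make "more accurate compression" concrete is to encode and decode each EEM with a model (CNN-AE or PARAFAC) and measure how far the reconstruction lies from the original. The sketch below uses relative Frobenius-norm error; both the metric and the function name are assumptions for illustration, not necessarily what was used to produce the image.

```python
import numpy as np


def relative_reconstruction_error(original, reconstructed):
    """||original - reconstructed|| / ||original||, using the Frobenius (flattened 2-)norm.

    Works for a single EEM (N x N) or a stack of EEMs (num_samples x N x N).
    Lower is better; 0 means a perfect reconstruction.
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    if original.shape != reconstructed.shape:
        raise ValueError(f"shape mismatch: {original.shape} vs {reconstructed.shape}")
    denom = np.linalg.norm(original)  # with axis=None and ord=None, the array is flattened
    if denom == 0:
        raise ValueError("original is all zeros; relative error is undefined")
    return float(np.linalg.norm(original - reconstructed) / denom)
```

Comparing this number for the CNN-AE and PARAFAC reconstructions of the same EEMs gives a single figure of merit alongside the visual comparison.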

PARAFAC does a better job when scattering is reduced, for example after applying a naive Rayleigh scattering filter to our EEMs.
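The sketch below shows one possible naive Rayleigh filter. First-order Rayleigh scatter appears where the emission wavelength equals the excitation wavelength, and second-order scatter where it equals twice the excitation wavelength; the filter simply overwrites a fixed-width band around both lines. The function name, the layout (rows = excitation, columns = emission) and the default 10 nm band width are assumptions for illustration, not taken from this repository.

```python
import numpy as np


def naive_rayleigh_filter(eem, excitation_nm, emission_nm, width_nm=10.0, fill=0.0):
    """Return a copy of `eem` with bands around 1st- and 2nd-order Rayleigh scatter set to `fill`.

    Assumes (hypothetically) that eem[i, j] is the intensity at excitation_nm[i] and emission_nm[j].
    """
    eem = np.array(eem, dtype=float)  # np.array copies, so the input is left untouched
    ex = np.asarray(excitation_nm, dtype=float)[:, None]  # column vector, one entry per row
    em = np.asarray(emission_nm, dtype=float)[None, :]    # row vector, one entry per column
    if eem.shape != (ex.shape[0], em.shape[1]):
        raise ValueError("eem must have shape (len(excitation_nm), len(emission_nm))")
    first_order = np.abs(em - ex) <= width_nm       # emission ~ excitation
    second_order = np.abs(em - 2 * ex) <= width_nm  # emission ~ 2 x excitation
    eem[first_order | second_order] = fill
    return eem
```

Passing fill=np.nan instead marks the band as missing rather than zero; whether that is usable depends on the downstream model.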

In the comparison above, the convolutional autoencoder (henceforth CNN-AE) compresses the 28x28 data into 12 dimensions. Further dimensionality reduction, such as PCA, can then be applied to these 12 dimensions. The following figure shows the PCA-reduced EEMs of four vegetables:

The vegetable clusters are almost perfectly separated, which suggests that the original EEMs contain enough information to distinguish the vegetables.
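A minimal sketch of the PCA step, assuming the CNN-AE's 12-dimensional codes have been stacked into an array with one row per EEM. PCA is computed directly from the SVD of the centred codes to avoid extra dependencies; scikit-learn's PCA should give the same projection up to the sign of each component. The function name and the random placeholder data are hypothetical; they are not the vegetable dataset.

```python
import numpy as np


def pca_project(codes, n_components=2):
    """Project rows of `codes` (num_samples x num_features) onto their top principal components.

    Returns (projected, explained_variance_ratio) for the first `n_components` components.
    """
    X = np.asarray(codes, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centred data
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)
    projected = X_centered @ vt[:n_components].T
    return projected, explained_variance_ratio[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codes = rng.standard_normal((40, 12))  # placeholder for 40 encoded EEMs
    projected, ratio = pca_project(codes)
    print(projected.shape, ratio)
```

Plotting the two columns of `projected`, coloured by each sample's label, gives a scatter plot of the kind shown in the figure.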

Existing CNN-AE network

Encoder -> Decoder.

Installation

Clone/download the repo to a local directory.
Optional: create a virtualenv for this.
From the command line:

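python setup.py install

(Recent versions of setuptools deprecate invoking setup.py directly; running pip install . from the repository root is the equivalent.)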

Configuration

Currently, EEMs must be square (NxN). Non-square EEMs can be resized to be square with image-processing or scientific software. Change the INPUTS variable in src/utils.py.
Data should be stored as CSV files (with a .csv extension) in the folder data/flat_files.
To add labeling information, use - delimiters in the filename and edit the Labels in src/utils.py.
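The configuration steps above could be scripted roughly as follows. This sketch assumes each CSV holds only the intensity grid (comma-separated, no header row) and that the filename is split on - into label fields, e.g. carrot-sample01.csv gives ["carrot", "sample01"]; how those fields map to the Labels in src/utils.py is up to that file. load_flat_files and resize_to_square are hypothetical helpers, not functions from this repository, and the resizing is plain bilinear interpolation over array indices.

```python
import glob
import os

import numpy as np


def resize_to_square(eem, n):
    """Resample a 2D EEM onto an n x n grid with bilinear interpolation (hypothetical helper)."""
    eem = np.asarray(eem, dtype=float)
    rows, cols = eem.shape
    # Target sample positions, expressed in the original row/column index coordinates.
    new_rows = np.linspace(0, rows - 1, n)
    new_cols = np.linspace(0, cols - 1, n)
    # Bilinear = linear interpolation along each row, then along each resulting column.
    along_cols = np.array([np.interp(new_cols, np.arange(cols), row) for row in eem])  # rows x n
    return np.array([np.interp(new_rows, np.arange(rows), col) for col in along_cols.T]).T  # n x n


def load_flat_files(folder="data/flat_files", n=None):
    """Load every .csv EEM in `folder` (hypothetical helper).

    Returns (eems, labels): eems has shape (num_files, N, N); labels[k] lists the
    '-'-separated fields of the k-th filename (sorted order) without its extension.
    If `n` is given, EEMs of any other shape are resized to n x n first.
    """
    eems, labels = [], []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        eem = np.loadtxt(path, delimiter=",", ndmin=2)  # assumes no header row
        if n is not None and eem.shape != (n, n):
            eem = resize_to_square(eem, n)
        if eem.shape[0] != eem.shape[1]:
            raise ValueError(f"{path}: EEM has shape {eem.shape}; expected a square N x N grid")
        stem = os.path.splitext(os.path.basename(path))[0]
        eems.append(eem)
        labels.append(stem.split("-"))
    if not eems:
        raise FileNotFoundError(f"no .csv files found in {folder!r}")
    return np.stack(eems), labels
```

For example, load_flat_files(n=28) would resample every EEM to the 28x28 size used in the comparison above.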

Running on an example dataset