/cosmicpca

Python and C++ implementation of Robust PCA methods

Primary LanguageJupyter NotebookGNU General Public License v3.0GPL-3.0

Cosmic PCA

Licence

The Transiting Exoplanet Survey Satellite (TESS) is expected to suffer from unusually bad cosmic ray contamination, thanks to its very deep pixels in comparison to Kepler. We want to see if we can solve this problem, and related problems in observational astronomy. Here, we use Robust PCA (RPCA), which decomposes data into sparse and low-rank components, to separate out cosmic rays from stellar variability. To my knowledge, this technique has not so far been used for this purpose in astronomy, though in adjacent fields it has been used for exoplanet direct imaging, baryons in haloes and geophysics.

This is a fork of https://github.com/tjof2/robustpca, a C++ implementation of Robust Orthonormal Subspace Learning, available under a GNU GPL v3.0 license, as accordingly is this code.

Contents

Description

We want to use Robust PCA to separate a stream of TESS images into a sparse component (which hopefully tracks cosmic rays) and a low-rank component (which hopefully recovers the star field).

The Robust PCA implementation is taken from Tom Furnival's C++ implementation of the Robust Orthonormal Subspace Learning (ROSL) algorithm.

The cosmic ray generator is from the TESS Science Team SPyFFi package.

Installation

Dependencies

This library makes use of the Armadillo C++ linear algebra library, which needs to be installed first.

Also install SPyFFI from source at https://github.com/TESScience/SPyFFI or with

pip install SPyFFI

Building from source

To build the library, unpack the source and cd into the unpacked directory, then type make.

This will generate a C++ library called librosl.so, which is called by the Python module pyrosl.

Usage

For a corrupted observation matrix X, one can run the ROSL algorithm with the following required parameters:

import pyrosl

example_rosl = pyrosl.ROSL( 
    method='full',
    rank = 5,
    reg = 0.1
)
loadings, components, E = full_rosl._fit(X)
loadings = loadings[:, :full_rosl.rank_]

model = np.dot(loadings, full_rosl.components_)
sparse = E.reshape(np.shape(dat)) # dat is input data in (ncad, nx, ny) format
lowrank = model.reshape(np.shape(dat))

A simple Python script is included with examples of both ROSL and the fast ROSL+ algorithms, as well as further optional parameters. It can be run on the command line with:

$ python example.py

Further documentation can be found in the file pyrosl.py.

Todo

  • Write tests