Probabilistic PCA in Python

This repository contains a simple implementation of probabilistic PCA as introduced in [1].

[1] Michael E. Tipping and Christopher M. Bishop.
    Probabilistic Principal Component Analysis
    Journal of the Royal Statistical Society. Series B (Statistical Methodology)
    Vol. 61, No. 3 (1999), pp. 611-622.

Also consider citing this master thesis for which this version of probabilistic PCA was implemented:

@misc{Stutz2017,
    author = {David Stutz},
    title = {Learning Shape Completion from Bounding Boxes with CAD Shape Priors},
    month = {September},
    year = {2017},
    institution = {RWTH Aachen University},
    address = {Aachen, Germany},
    howpublished = {http://davidstutz.de/},
}

For theoretical background, consider reading [1], or see the discussion in Section B.1 of the master thesis.

Requirements

Python packages:

NumPy
SciPy (specifically scipy.sparse.linalg.svds)
HDF5, i.e. h5py

For visualization:

Matplotlib

Usage

To compute a probabilistic PCA, use ppca_train.py:

usage: ppca_train.py [-h] [--input INPUT] [--code CODE]
                     [--approximate_k APPROXIMATE_K] [--mean_file MEAN_FILE]
                     [--V_file V_FILE] [--var_file VAR_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         path input HDF5 file
  --code CODE           size of latent space
  --approximate_k APPROXIMATE_K
                        approximate the variance using approximate_k singular
                        values
  --mean_file MEAN_FILE
                        path to HDF5 mean file
  --V_file V_FILE       path to HDF5 matrix file
  --var_file VAR_FILE   path to HDF5 variance file

The main parameter is the input, which has to be a HDF5 file where the first dimension is the number of samples, the remaining dimensions do not matter as they are reshaped. Then, --code determines the number of principal components to use.

As probabilistic PCA requires to compute the variance (see [1]) for which all eigenvalues are required, the computation can become infeasible for high dimensionality. Therefore, the variance can be approximated using the first k eigenvalues instead which can be set using --approximate_k and should be significantly larger than --code but can also be smaller than the total dimensionality.

The output is stored separately in --mean_file, V_file and var_file -- all HDF5 files.

Using ppca_test.py, the computed probabilistic PCA can be tested; for example on a test or validation set:

usage: ppca_test.py [-h] [--input INPUT] [--mean_file MEAN_FILE]
                    [--V_file V_FILE] [--var_file VAR_FILE] [--output OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         path input HDF5 file
  --mean_file MEAN_FILE
                        path to HDF5 mean file
  --V_file V_FILE       path to HDF5 matrix file
  --var_file VAR_FILE   path to HDF5 variance file
  --output OUTPUT       path to output HDF5 file

Here, the input is a HDF5 file containing the test/validation data.

Example

As example, we provide a simple dataset of rotated and slightly translated binary rectangles in 32 x 32 resolution. Probabilistic PCA can be applied as follows:

python ppca_train.py --input=/BS/dstutz/work/data/2d/outputs_training_prior_moderate.h5 --code=10

In order to test the decomposition:

python ppca_test.py --input=/BS/dstutz/work/data/2d/outputs_validation_moderate.h5 --output=predictions.h5

The results can be viewed using:

python view_hdf5.py --predictions=predictions.h5 --target=/BS/dstutz/work/data/2d/outputs_validation_moderate.h5

License

License for source code corresponding to:

D. Stutz. Learning Shape Completion from Bounding Boxes with CAD Shape Priors. Master Thesis, RWTH Aachen University, 2017.

Please read carefully the following terms and conditions and any accompanying documentation before you download and/or use this software and associated documentation files (the "Software").

The authors hereby grant you a non-exclusive, non-transferable, free of charge right to copy, modify, merge, publish, distribute, and sublicense the Software for the sole purpose of performing non-commercial scientific research, non-commercial education, or non-commercial artistic projects.

Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

You understand and agree that the authors are under no obligation to provide either maintenance services, update services, notices of latent defects, or corrections of defects with regard to the Software. The authors nevertheless reserve the right to update, modify, or discontinue the Software at any time.

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. You agree to cite the corresponding papers (see above) in documents and papers that report on research using the Software.

davidstutz/probabilistic-pca

Probabilistic PCA in Python

Requirements

Usage

Example

License