rom-comma

Gaussian Process Regression, Global Sensitivity Analysis and Reduced Order Modelling by COMMA Research at The University of Sheffield

installation

Simply place the romcomma package in a folder included in PYTHONPATH (e.g. site-packages). Test the installation by running the installation_test module, from anywhere.
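
As a quick sanity check, something like the following can be run from any directory (a minimal sketch only — the exact test-module path may differ, so adjust it to match the package layout):

```python
# Minimal check that romcomma is importable, assuming it sits on PYTHONPATH.
# The installation_test module mentioned above can then be run from anywhere,
# e.g. python -m romcomma.installation_test (adjust the module path if needed).
import romcomma

print(romcomma.__file__)   # shows which copy of the package Python has found
```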

documentation

Dependencies are documented in pyproject.toml.

Full documentation for the romcomma package is published on readthedocs.

getting started

The following is not intended as a substitute for the full package documentation, but sketches the most essential, salient and practically important architectural features of the romcomma package. These are introduced by module (or package) name, in order of workflow priority, which presumably reflects a new user's first steps. Familiarity with Gaussian Processes (GPs), Global Sensitivity Analysis (GSA) and Reduction of Order by Marginalization (ROM) is largely assumed.

data

The data module contains classes for importing and storing the data being analyzed.

Import is from a csv file or a pandas DataFrame, in either case tabulated with precisely two header rows, as follows:

|                                   | Input                  | ... | Input                  | Output                 | ... | Output                 |
|                                   | X1                     | ... | XM                     | Y1                     | ... | YL                     |
| optional column of N row indices  | N rows of numeric data | ... | N rows of numeric data | N rows of numeric data | ... | N rows of numeric data |

Any first-line header may be used instead of "Input", so long as it is the same for every column to be treated as input. Any first-line header may be used instead of "Output", so long as it is the same for every column to be treated as output, and is different to the first-line header for inputs.

Any second-line headers may be used, without restriction. But internally, the romcomma package sees

  • An (N, M) design matrix of inputs called X.
  • An (N, L) design matrix of outputs called Y.
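
For example, a table in this shape can be built with a two-level pandas column MultiIndex; the column names and data below are purely illustrative:

```python
import numpy as np
import pandas as pd

# Illustrative only: N = 4 rows, M = 2 input columns and L = 1 output column,
# laid out with the two header rows described above.
N = 4
columns = pd.MultiIndex.from_tuples(
    [("Input", "X1"), ("Input", "X2"), ("Output", "Y1")])
df = pd.DataFrame(np.random.rand(N, 3), columns=columns)

df.to_csv("sample.csv")                                     # two header rows plus a column of row indices
df = pd.read_csv("sample.csv", header=[0, 1], index_col=0)  # reads the same layout back
```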

The key assumption is that each input column is sampled from a uniform distribution Xi ~ U[min_i, max_i]. There is no claim that the methods used by this software have any validity at all if this assumption is violated.

If instead Xi follows some other distribution with cumulative distribution function CDF, the user should apply the probability integral transform, replacing input column i by CDF(Xi) ~ U[0, 1], prior to any data import.
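
As a sketch of that pre-processing step (assuming, purely for illustration, that the raw column happens to be standard normal):

```python
import numpy as np
from scipy import stats

# Probability integral transform: mapping a column through its own CDF yields
# a U[0, 1] column, which can then be tabulated and imported as usual.
x = np.random.standard_normal(1000)   # a raw input column that is N[0, 1], not uniform
x_uniform = stats.norm.cdf(x)         # CDF(Xi) ~ U[0, 1]
```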

Repository

Data is initially imported into a Repository object, which handles storage, retrieval and metadata for repo.data. Every Repository object writes to and reads from its own repo.folder.

Crucially, every Repository object exposes a parameter K which triggers K-fold cross-validation of the repo's data. Setting repo.K=K generates K Fold objects.
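
A sketch of this workflow follows. The import call is an assumption — Repository.from_csv is a hypothetical constructor name, so consult the readthedocs documentation for the actual import signature; the repo.K setter is as described above.

```python
from romcomma import data

repo = data.Repository.from_csv(folder="my_repo", csv="sample.csv")  # hypothetical constructor and arguments
repo.K = 5   # triggers 5-fold cross-validation, generating 5 Fold objects under my_repo
```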

Fold

All data analysis is performed on Fold objects. A Fold is really a kind of Repository, with the addition of

  • fold.test_data, stored in a table (Frame) of N/K rows. The test_data does not overlap the (training) data in this Fold, except when the parent repo.K=1 and the ersatz fold.test_data=fold.data is applied.
  • Normalization of inputs: All training and test data inputs are transformed from Xi ~ U[min_i, max_i] to the standard normal distribution Xi ~ N[0, 1], as demanded by the analyses implemented by romcomma (a sketch of this transform follows the list). Outputs are simultaneously normalized to zero mean and unit variance. Normalization exposes an undo method to return to the original variables used in the parent Repository.
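
The input normalization amounts to the following (an illustrative sketch, not the package's internal code):

```python
import numpy as np
from scipy import stats

def normalize_inputs(X: np.ndarray) -> np.ndarray:
    """Map an (N, M) design matrix of U[min_i, max_i] columns to N[0, 1] columns."""
    U = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # rescale each column to U[0, 1]
    U = np.clip(U, 1e-9, 1.0 - 1e-9)                           # keep away from the endpoints
    return stats.norm.ppf(U)                                   # inverse standard-normal CDF: ~ N[0, 1]
```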

The repo.K Folds are stored under the parent repo, in fold.folder=repo.folder\fold.k for k in range(repo.K). For the purposes of model integration, an additional, unvalidated, ersatz fold.K is included, whose N datapoints of (training) data equal its test_data, just as in the ersatz case repo.K=1 described above.
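
For concreteness, the resulting folder names under this convention look as follows (the repo folder name is hypothetical):

```python
from pathlib import Path

repo_folder = Path("my_repo")                                 # hypothetical repo.folder
K = 3
fold_folders = [repo_folder / f"fold.{k}" for k in range(K)]  # the K cross-validation Folds
ersatz_folder = repo_folder / f"fold.{K}"                     # the ersatz fold.K, trained and tested on all N datapoints
```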