Gaussian Process Regression, Global Sensitivity Analysis and Reduced Order Modelling by COMMA Research at The University of Sheffield
Simply place the `romcomma` package in a folder included in `PYTHONPATH` (e.g. `site-packages`). Test the installation by running the `installation_test` module, from anywhere. Dependencies are documented in `pyproject.toml`.
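As a quick sanity check before running the test module, one can confirm that Python can locate the package on `PYTHONPATH`; this sketch uses only the standard library:

```python
import importlib.util

# Look up romcomma on the current PYTHONPATH without importing it.
spec = importlib.util.find_spec("romcomma")
if spec is None:
    print("romcomma not found: check that its folder is on PYTHONPATH")
else:
    print(f"romcomma found at {spec.origin}")
```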
Full documentation for the `romcomma` package is published on readthedocs. The following is not intended to substitute for the full package documentation, but to sketch the most essential, salient and practically important architectural features of the `romcomma` package. These are introduced by module (or package) name, in order of workflow priority, which will presumably mirror a package user's first steps. Familiarity with Gaussian Processes (GPs), Global Sensitivity Analysis (GSA) and Reduction of Order by Marginalization (ROM) is largely assumed.
The `data` module contains classes for importing and storing the data being analyzed. Import is from a csv file or pandas DataFrame, in either case tabulated with precisely two header rows, as follows:
| | Input X1 | ... | Input XM | Output Y1 | ... | Output YL |
|---|---|---|---|---|---|---|
| optional column of N row indices | N rows of numeric data | ... | N rows of numeric data | N rows of numeric data | ... | N rows of numeric data |
Any first-line header may be used instead of "Input", so long as it is the same for every column to be treated as input. Any first-line header may be used instead of "Output", so long as it is the same for every column to be treated as output, and is different to the first-line header for inputs.
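For illustration, a table with precisely this two-row header can be written from pandas using a `MultiIndex` on the columns (the file name, sizes and data here are hypothetical):

```python
import numpy as np
import pandas as pd

N, M, L = 8, 2, 1  # rows, inputs, outputs (hypothetical sizes)
rng = np.random.default_rng(0)

# The two header rows become a two-level MultiIndex on the columns.
columns = pd.MultiIndex.from_tuples(
    [("Input", f"X{i + 1}") for i in range(M)]
    + [("Output", f"Y{j + 1}") for j in range(L)]
)
frame = pd.DataFrame(rng.uniform(size=(N, M + L)), columns=columns)
frame.to_csv("data.csv")  # the csv now carries exactly two header rows

# Reading it back requires declaring both header rows and the index column.
frame_back = pd.read_csv("data.csv", header=[0, 1], index_col=0)
```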
Any second-line headers may be used, without restriction. But internally, the `romcomma` package sees
- An (N, M) design matrix of inputs called X.
- An (N, L) design matrix of outputs called Y.
The key assumption is that each input column is sampled from a uniform distribution Xi ~ U[mini, maxi]. There is no claim that the methods used by this software have any validity at all if this assumption is violated.
If instead Xi follows a known distribution with CDF[Xi], the user should apply the probability integral transform CDF(Xi) ~ U[0, 1] to input column i prior to any data import.
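As a standard-library sketch of this probability integral transform, suppose (hypothetically) that input column i is known to be normally distributed; mapping each value through its own CDF yields a U[0, 1] column ready for import:

```python
from statistics import NormalDist
import random

random.seed(1)
dist = NormalDist(mu=3.0, sigma=2.0)  # hypothetical known distribution of Xi

# Draw a non-uniform sample, then apply the probability integral transform.
x = [dist.inv_cdf(random.random()) for _ in range(1000)]
u = [dist.cdf(v) for v in x]  # CDF(Xi) ~ U[0, 1]
```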
Data is initially imported into a `Repository` object, which handles storage, retrieval and metadata for `repo.data`. Every `Repository` object writes to and reads from its own `repo.folder`.
Every `Repository` object crucially exposes a parameter K which triggers k-fold cross-validation for this `repo`: setting `repo.K=K` generates K `Fold` objects.
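The k-fold split itself can be sketched in a few lines; this is a generic illustration of the idea, not `romcomma`'s implementation:

```python
import numpy as np

def k_fold_test_indices(N: int, K: int, seed: int = 0) -> list:
    """Partition N row indices into K disjoint, near-equal test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(N)
    return [np.sort(perm[k::K]) for k in range(K)]

folds = k_fold_test_indices(N=12, K=3)
# Each fold's test set holds N/K = 4 rows; the K sets together cover all N rows.
```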
All data analysis is performed on `Fold` objects. A `Fold` is really a kind of `Repository`, with the addition of `fold.test_data`, stored in a table (`Frame`) of N/K rows. The `test_data` does not overlap the (training) `data` in this `Fold`, except when the parent `repo.K=1` and the ersatz `fold.test_data=fold.data` is applied.

Normalization of inputs: all training and test data inputs are transformed from Xi ~ U[mini, maxi] to the standard normal distribution Xi ~ N[0, 1], as demanded by the analyses implemented by `romcomma`. Outputs are simultaneously normalized to zero mean and unit variance. `Normalization` exposes an `undo` method to return to the original variables used in the parent `Repository`.
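Conceptually, the two transformations amount to the following standard-library sketch (not `romcomma`'s actual code): inputs pass through the inverse standard-normal CDF of their uniform quantile, while outputs are z-scored, with an undo step to recover the originals:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N[0, 1]

def normalize_input(x, lo, hi):
    """Map Xi ~ U[lo, hi] onto N[0, 1] via the inverse standard-normal CDF."""
    eps = 1e-12  # keep quantiles strictly inside (0, 1) for inv_cdf
    return [STD_NORMAL.inv_cdf(min(max((v - lo) / (hi - lo), eps), 1 - eps))
            for v in x]

def normalize_output(y):
    """Z-score outputs to zero mean, unit variance; also return an undo function."""
    mean = sum(y) / len(y)
    std = (sum((v - mean) ** 2 for v in y) / len(y)) ** 0.5
    undo = lambda z: [v * std + mean for v in z]
    return [(v - mean) / std for v in y], undo

z, undo = normalize_output([1.0, 2.0, 3.0, 4.0])
```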
The `repo.K` `Fold`s are stored under the parent, in `fold.folder=repo.folder\fold.k` for `k in range(repo.K)`.
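Under this convention, the fold folders for a hypothetical `repo.folder` would be laid out as follows (path shown Windows-style, matching the text):

```python
from pathlib import PureWindowsPath

repo_folder = PureWindowsPath(r"C:\repos\my_repo")  # hypothetical repo.folder
K = 4
fold_folders = [repo_folder / str(k) for k in range(K)]  # fold.k names each subfolder
```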
For the purposes of model integration, an unvalidated, ersatz fold `k=K` is included, whose N datapoints of (training) `data` also serve as its `test_data`, just like the would-be ersatz fold when `K=1=k+1`.