mmaelicke/scikit-gstat

3D Kriging Example.

JorjeousJorje opened this issue · 9 comments

Hello! Will examples for 3D Kriging be added in the future? Thanks in advance!

Hey @JorjeousJorje, thanks for the question.
I am not planning to add one at the moment, but if you have some interesting data, I could consider building an example based on it.
In the meantime, you might want to check out the 3D example from gstools (you need to scroll down a little). You can export any skgstat.Variogram to a gstools.CovModel; there is also a tutorial covering the interface: https://mmaelicke.github.io/scikit-gstat/auto_examples/tutorial_06_gstools.html
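For the export, something like this should work (a sketch; assumes a recent skgstat version, and coords/values stand in for your own sample):

import skgstat as skg

coords = ...  # your (n, 3) sample coordinates
values = ...  # your n observed values

# fit a variogram on the sample, then export the fitted model
V = skg.Variogram(coords, values, model='gaussian')
model = V.to_gstools()  # a gstools.CovModel, ready for gstools kriging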

Hope this helps!

Thank you for such a detailed answer. Unfortunately, I can't interpolate in 3D with either skgstat or gstools. In the case of skgstat, calling the OrdinaryKriging.transform() method fails because the array of distances for the new grid size (127x127x100) cannot be allocated in memory. And in the case of gstools, constructing a gs.krige.Ordinary instance takes about an hour for a single field.

Can I share my code with you to figure out where I'm going wrong? Only if you have the time, of course. Thank you in advance!

@JorjeousJorje Yeah, go ahead and post the code here. However, given your hardware (memory allocation is a hardware limitation), the sample size, and the fact that you are interpolating ~1.6M points, 60 min doesn't seem too crazy to me.

A workaround for skgstat (and possibly also gstools) would be to work in batches, i.e. interpolate 100 separate 127x127 grids and np.stack(batches, axis=2) them back together; see the sketch below. Be aware that you should use the full sample and the same Variogram/CovModel for all layers. The target cells can be calculated in parallel, as they only depend on the sample.
Both packages can run the 127x127 interpolation in parallel as well. For OrdinaryKriging, there is the n_jobs attribute, which can be set to something larger than 1 (4 or 8, possibly). I remember that there was an issue with it in the past, though; not sure...
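A sketch of that batching idea (untested; assumes a fitted skgstat.OrdinaryKriging instance ok and your sample's x, y, z extents):

import numpy as np

# target axes: 127 x 127 in-plane, 100 z levels
xs = np.linspace(x.min(), x.max(), 127)
ys = np.linspace(y.min(), y.max(), 127)
zs = np.linspace(z.min(), z.max(), 100)
xx, yy = np.meshgrid(xs, ys, indexing='ij')

# krige one 127x127 layer per z level, then stack the layers
layers = []
for zk in zs:
    field_k = ok.transform(xx.ravel(), yy.ravel(), np.full(xx.size, zk))
    layers.append(field_k.reshape(127, 127))

field = np.stack(layers, axis=2)  # shape (127, 127, 100)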

Hey there,
GSTools provides chunking out of the box, see the example below:

from gstools import Gaussian, krige

# conditions
cond_pos = ...  # your condition position tuple (x, y, z)
cond_val = ...  # your condition values
# resulting grid
grid = ...  # your structured output 3D grid tuple given by (x, y, z) axes
# ordinary kriging setup
model = Gaussian(dim=3, var=0.5, len_scale=2)
krig = krige.Ordinary(model, cond_pos=cond_pos, cond_val=cond_val)
krig.structured(grid, chunk_size=10000)  # you can play around with "chunk_size"

I don't know how many condition points you have, but it could also help to not use the pseudo-inverse when building the kriging matrix and to try the actual inverse, if possible. This could speed up the kriging setup:

krig = krige.Ordinary(model, cond_pos=cond_pos, cond_val=cond_val, pseudo_inv=False)

Hope that helps,
Sebastian

@mmaelicke
I have 16 GB of RAM, but it's trying to allocate 194 GB :D

Here is my code:

import numpy as np
import pandas as pd
import skgstat as skg

from ImagesDataset import *
from utility import *


mode = 0
rootdir=f"S{mode}_Uz"
dataset = ImagesDataset(rootdir=rootdir, transforms=None)
s = dataset[:]
# 's' is an array of shape (N, 127, 127), where N is the number of 2D fields.
mean_velocity = calculate_mean_velocity(s)
# 'mean_velocity' is the sum of all N fields divided by N

coords_path = "S0_FJ_10D_up0p33D_Re10000_3p5kHz_set1_inst_0001.txt"
df = pd.read_csv(coords_path, sep="\t")
x = df["x[mm]"].values
# 'x' array is: [-24.8482  -24.44322 -24.03825 ...  25.36891  25.77388  26.17886]
# 'x' shape is (16129,)
y = df["y[mm]"].values
# 'y' array is: [-24.69589 -24.69589 -24.69589 ...  23.94076  23.94076  23.94076]
# 'y' shape is (16129,)
z = generate_z_coordinate(mean_velocity, s, dt=1. / 3500.)
# 'z' is an array of shape (N, 127, 127), holding the irregular-grid z coordinate for each 2D field.

z_ = z[0].ravel()
# 'z_' shape is (16129,)
coords = np.column_stack((x, y, z_))
# 'coords' shape is (16129, 3)
values = s[0].ravel()
# 'values' shape is (16129,)

# initializing variogram
V = skg.Variogram(coords, values, maxlag="median", 
                  n_lags=21, model='gaussian', use_nugget=False)

ok = skg.OrdinaryKriging(V, min_points=5, max_points=10, mode='exact', n_jobs=1)

# defining of a new regular grid
xx, yy, zz = np.mgrid[x.min():x.max():127j, y.min():y.max():127j, z.min():z.max():100j]
field = ok.transform(xx.ravel(), yy.ravel(), zz.ravel())

As a result, I get a memory error:
[screenshot: MemoryError, unable to allocate the distance array]

And regarding the n_jobs attribute in OrdinaryKriging: when n_jobs > 1, I get this error:
[screenshot: pickle AttributeError traceback]

@MuellerSeb Thank you, I will try chunking!

Thanks for help!

Thanks @MuellerSeb, your insights are, as always, very welcome.

If you want to stick to skgstat, I guess the point here is the sample size. The Variogram has to calculate a distance matrix of about 16129**2 / 2 ≈ 130M pairs; that takes time. (And for the kriging step, a full distance array between the 1,612,900 target cells and the 16,129 samples in float64 comes to roughly 194 GB, which matches the allocation your MemoryError reports.) There are two approaches here:

  1. As already described, break down the xx, yy, zz into smaller pieces and calculate one at a time.
  2. Additionally, you can sub-sample your sample (see the sketch after this list). You can do that by hand using numpy, but we also have a replacement for MetricSpace that does this: the (largely undocumented) ProbabalisticMetricSpace class. Nevertheless, you can check out the MetricSpace docs.
  3. You can do both, of course.
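Something like this for the by-hand route (untested sketch; sample_size is just a hypothetical value to tune against memory and runtime):

import numpy as np
import skgstat as skg

# randomly sub-sample the 16129 observations before variography
rng = np.random.default_rng(42)
sample_size = 4000  # hypothetical; smaller = faster, but less accurate
idx = rng.choice(len(values), size=sample_size, replace=False)

V = skg.Variogram(coords[idx], values[idx], maxlag="median",
                  n_lags=21, model='gaussian', use_nugget=False)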

GSTools has that probabilistic approach implemented as well. As performance matters here, I would suggest using gstools anyway: while the Variogram in skgstat is of sufficient speed, the kriging is not, and GSTools is much faster there. You can also pass arguments down through the interface if you want to stick to skgstat for the variography part:

krige = V.to_gs_krige(pseudo_inv=False)
krige.structured(grid, chunk_size=10000)

Yeah, the pickle AttributeError is what I came across some time ago myself. I think we re-introduced this error when @redhog made some substantial changes to the kriging algorithm. I'll have to check that out again.

Got the same error for a 2D grid... maybe an easy way out is using cloudpickle or joblib's dump/load?

Hey @srggrs, @JorjeousJorje,
Yeah, the bug is independent of the target grid dimensionality.
We broke the pickling of the multiprocessing version of kriging during the last rounds of code rework. I pushed a quick and dirty fix by merging in an old version of the kriging code, which used joblib. It is available on the use-joblib branch. As far as I can tell, this should fix the bug. However, the code does not feel as fast as it should be; I can't really see an improvement in runtime. I am not sure if my sample here is just not large enough or if there is an implementation error. I guess the latter.
Maybe you want to check the changes. I need to systematically test this new version, as it could potentially break existing code. That may take a little while.

The work-in-progress branch can be checked out like this:

git clone git@github.com:mmaelicke/scikit-gstat
cd scikit-gstat
git checkout use-joblib
pip install -e .
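Then a parallel run can be tested on a single layer (a sketch, reusing V and the mgrid arrays from the code above):

import skgstat as skg

# with the use-joblib branch installed, n_jobs > 1 should no longer
# raise the pickle AttributeError
ok = skg.OrdinaryKriging(V, min_points=5, max_points=10, n_jobs=4)
field = ok.transform(xx[:, :, 0].ravel(), yy[:, :, 0].ravel(), zz[:, :, 0].ravel())  # one layer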

Still, the kriging code needs some rework, especially the multiprocessing part. I won't invest too much time in it, as gstools has a kriging implementation that is way cleaner and faster; hence, this issue is not a high priority. Contributions are always welcome.

Best,
Mirko

As a short addition:
Example code showing how gstools can be used for Kriging was posted above by @MuellerSeb.
If you want to stick to skgstat.Variogram, you can use the GSTools interface.

Something like:

vario = skg.Variogram(...)

krige = vario.to_gs_krige()  # This is a gstools.Krige instance!
krige.structured(grid, chunk_size=10000)  # 'grid' is the (x, y, z) axes tuple, as above

Please note that gstools is not a dependency of skgstat, so you need to install it first:

conda install -c conda-forge gstools