bgrimstad/splinter

Knot placement under knot multiplicity

dential opened this issue · 1 comments

I am trying to build splines with discontinuities by increasing the multiplicity of the knot at the discontinuity. My understanding is that this can be achieved through splinter's support for custom knots; namely, using the two-step procedure described in #76 and #72.

However, as soon as I move away from using the bspline_interpolator() builder to building the spline from parameters using BSpline.from_param(), I get oscillations (as illustrated below).

Oscillations occur, even when I don't introduce knot multiplicity in the interior of the knot vector, i.e. when just trying to replicate bspline_interpolator() using BSpline.from_param():

import numpy as np
import matplotlib.pyplot as plt
import sys, os
import splinterpy
# splinterpy.load( PATH_HERE )

def g(x):
    if x <= 0.5:
        return 1
    else:
        return np.exp(-10*(x - 0.5))

gridsize = 51
grid = np.linspace(0, 1, gridsize)

# Define knots
knots = np.sort(np.append(grid, [0, 0, 0, 1, 1, 1]))

# Sample from function
y_data = np.zeros(gridsize)
for x_index, x in enumerate(grid):
    y_data[x_index] = g(x)

# Build splines
# Using bspline_interpolator
builder_spline = splinterpy.bspline_interpolator(grid, y_data, degree=3)
# Using BSpline.from_param
degrees = 3
knots = knots.tolist()
dim_y = 1
spline = splinterpy.BSpline.from_param(degrees, knots, dim_y)
fitted_spline = spline.fit(grid, y_data)

# Evaluate splines at fine grid
fine_grid = np.linspace(0, 1, 1001)
builder_spline_eval = builder_spline.eval(fine_grid)
fitted_spline_eval = fitted_spline.eval(fine_grid)

fig_builder_vs_param

Examining the knots of the two splines shows that they differ in that bspline_interpolator() has "removed" the second and second-to-last elements (0.02 and 0.98, respectively):

print(builder_spline.get_knot_vectors())
# [[0.0, 0.0, 0.0, 0.0, 0.04, 0.06000000000000001, 0.08, 0.1, 0.12, 0.13999999999999999, 0.16, 0.18, 0.2, 0.22000000000000003, 0.24000000000000005, 0.26, 0.28, 0.30000000000000004, 0.32, 0.33999999999999997, 0.36, 0.38, 0.4, 0.42000000000000004, 0.44000000000000006, 0.45999999999999996, 0.48, 0.5, 0.52, 0.54, 0.56, 0.5800000000000001, 0.6000000000000001, 0.62, 0.64, 0.66, 0.68, 0.7, 0.72, 0.74, 0.76, 0.78, 0.8, 0.8200000000000001, 0.8400000000000001, 0.86, 0.8800000000000001, 0.9, 0.9199999999999999, 0.9400000000000001, 0.9600000000000002, 1.0, 1.0, 1.0, 1.0]]
print(fitted_spline.get_knot_vectors())
# [[0.0, 0.0, 0.0, 0.0, 0.02, 0.04, 0.06, 0.08, 0.1, 0.12, 0.14, 0.16, 0.18, 0.2, 0.22, 0.24, 0.26, 0.28, 0.3, 0.32, 0.34, 0.36, 0.38, 0.4, 0.42, 0.44, 0.46, 0.48, 0.5, 0.52, 0.54, 0.56, 0.58, 0.6, 0.62, 0.64, 0.66, 0.68, 0.7000000000000001, 0.72, 0.74, 0.76, 0.78, 0.8, 0.8200000000000001, 0.84, 0.86, 0.88, 0.9, 0.92, 0.9400000000000001, 0.96, 0.98, 1.0, 1.0, 1.0, 1.0]]

Removing these two elements from the knot vector fed into BSpline.from_param() makes the two splines identical:

knots = np.sort(np.append(grid, [0, 0, 0, 1, 1, 1]))
knots = knots[(knots != 0.02) & (knots != 0.98)]

fig_builder_vs_param

Introducing multiple knots at the discontinuity (0.5) leads to oscillations around the discontinuity...

knots = np.sort(np.append(grid, [0, 0, 0, 0.5, 0.5, 1, 1, 1]))
knots = knots[(knots != 0.02) & (knots != 0.98)]

fig_1

... unless the knots surrounding the discontinuity (0.48 and 0.52) are removed:

knots = np.sort(np.append(grid, [0, 0, 0, 0.5, 0.5, 1, 1, 1]))
knots = knots[(knots != 0.02) & (knots != 0.98)]
knots = knots[(knots != 0.48) & (knots != 0.52)]

fig_1

I think I am missing something about knot placement. Why do knots surrounding knot multiplicities need to be "removed" in order to avoid oscillations?

Hi, @dential.

I am currently on parental leave and have not had the opportunity to answer you earlier.

First, thank you for a presenting your question in such a clear way.

The behavior you are observing is related to how the regression problem is solved. Let me try to explain it briefly without oversimplifying. When you call the .bspline_interpolator or .fit method, a linear system of equations is solved for the coefficients of the B-spline. The number of variables is equal to the number of coefficients of the B-spline, which is determined by the knot sequence. The number of equations is equal to the number of data points.

Say you have N data points that you wish to interpolate with a B-spline of degree 3. The .bpline_interpolator method will design a knot sequence which gives N coefficients. For this case it will repeat end knots 4 times and remove two internal knots as you pointed out. With N data points (equations) and N coefficients (variables), the linear system becomes square and a unique solution can be found (under some technical assumptions on linear independence).

If you manipulate the knot sequence, for example by adding knots, the linear system can become underdetermined. This means that there are more variables than equations and there is no longer a unique solution. The added variables increase the flexibility of the spline. As in your example with an augmented knot sequence, the spline can interpolate all points and begin to oscillate. To avoid this you may either reduce the number of knots to give you N coefficients, or put a penalty on the coefficients to encourage them to be close to zero (this technique is called regularization).

If you wish to introduce a discontinuity at a point you have to increase the knot multiplicity, as you already have understood. Combine this with one of the two approaches above to avoid oscillations.

If you use the .from_param and .fit method, you have to specify the knot sequence yourself. Designing a good knot sequence can be difficult and may require some knowledge of the data you have. With .bspline_interpolator, Splinter tries to automatically construct a good knot sequence, but it may be suboptimal for some problems.

Hope that helps! And thank you for using the library.

Kind regards,
Bjarne