secondmind-labs/GPflux

Optimisation problem in gpflux

Opened this issue · 1 comments

Hi,

I'm using a two-layer deep GP with a Poisson likelihood. With the default zero mean function, training for 100 epochs runs without problems:

history = model.fit({"inputs": x_train, "targets": y_train}, batch_size=batch_size1, epochs=100, callbacks=callbacks, verbose=0)

However, when I add a constant mean function (e.g. -10) and train for more than about 10 epochs, training always fails with a matrix-inversion error:

Traceback (most recent call last):
  File "xx.py", line 464, in <module>
    history = model.fit({"inputs": x_train, "targets": y_train}, batch_size=batch_size1, epochs=args.nIter, callbacks=callbacks, verbose=0)
  File "xx/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "xx/anaconda3/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "xx/anaconda3/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 855, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "xxxx/python3.7/site-packages/tensorflow/python/eager/function.py", line 2943, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "xx/anaconda3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "xxx/anaconda3/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "xxx/anaconda3/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input matrix is not invertible.
    [[node gradient_tape/model/gp_layer_1/triangular_solve/MatrixTriangularSolve (defined at xx.py:464) ]] [Op:__inference_train_function_6603]

Errors may have originated from an input operation.
Input Source operations connected to node gradient_tape/model/gp_layer_1/triangular_solve/MatrixTriangularSolve:
  model/gp_layer_1/Cholesky (defined at /gpfs/ts0/home/xx249/anaconda3/lib/python3.7/site-packages/gpflow/conditionals/util.py:56)

Function call stack:
train_function
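As the traceback shows, the failing op is a Cholesky factorisation inside the GP layer. For illustration only (this is a self-contained NumPy sketch, not GPflux code), here is how a covariance matrix can become numerically singular and how a small "jitter" added to its diagonal restores a valid factorisation:

```python
import numpy as np

def safe_cholesky(K, max_jitter=1e-4):
    """Attempt a Cholesky factorisation, retrying with increasing
    diagonal jitter when the matrix is numerically indefinite."""
    jitter = 0.0
    while True:
        try:
            return np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))
        except np.linalg.LinAlgError:
            jitter = max(jitter * 10.0, 1e-8)  # start at 1e-8, grow by 10x
            if jitter > max_jitter:
                raise

# An RBF kernel evaluated at two nearly identical inputs: the resulting
# covariance matrix has two rows that coincide at float64 precision, so
# the plain Cholesky factorisation fails.
x = np.array([[0.0], [1e-9], [1.0]])
K = np.exp(-0.5 * (x - x.T) ** 2)

L = safe_cholesky(K)                            # succeeds once jitter is added
print(np.allclose(L @ L.T, K, atol=1e-6))       # True: 1e-8 jitter was enough
```

In GPflow-based models the diagonal jitter can be raised globally (e.g. via gpflow.config.set_default_jitter), which often works around this class of failure, though it does not address why the mean function drives the posterior covariance towards singularity in the first place.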

However, if an optimiser can compute the objective at the starting point, it should not crash when it tries parameters for which the objective fails; it should keep the current objective value and sample somewhere else, as the SciPy optimiser does.
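That fallback strategy can be sketched generically. The following is a toy NumPy illustration of the idea (the function names are hypothetical, and this is not how GPflux/Keras currently behaves): when a trial point makes the objective fail, the optimiser keeps the current iterate and shrinks the step instead of crashing.

```python
import numpy as np

def robust_minimise(objective, grad, theta, lr=0.1, steps=100):
    """Gradient descent that rejects steps where the objective cannot be
    evaluated (exception or non-finite value), keeping the current iterate
    and backing off the step size instead of crashing."""
    for _ in range(steps):
        trial = theta - lr * grad(theta)
        try:
            f = objective(trial)
        except (FloatingPointError, np.linalg.LinAlgError):
            lr *= 0.5          # trial point failed: stay put, back off
            continue
        if not np.isfinite(f):
            lr *= 0.5
            continue
        theta = trial          # trial point is valid: accept it
    return theta

# Objective that is undefined for theta <= 0; its minimum is at theta = 1.
def objective(t):
    if t <= 0:
        raise FloatingPointError("objective undefined")
    return t - np.log(t)

grad = lambda t: 1.0 - 1.0 / t

# With lr=10.0 the very first trial overshoots into the invalid region,
# which the optimiser survives by halving the step size.
theta_star = robust_minimise(objective, grad, theta=5.0, lr=10.0)
print(theta_star)  # converges to the minimiser at 1.0
```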

I was just wondering whether there is a solution for this?

Many thanks.

Hi @XiaoyuHy, thanks for your issue. Could you please post a minimal failing example and your development environment (Python, GPflow, GPflux and TensorFlow versions) so we can reproduce the bug?