modified mse for BR
Ohayoosan opened this issue · 9 comments
after fiddling with it #236 (comment) ,
I can't get the value of the weight and bias in the library. Here's my code:
from numpy import genfromtxt
import numpy as np
import neupy.algorithms.gd.lev_marq as lm
from neupy import algorithms, storage
from neupy.layers import *
from neupy.exceptions import StopTraining
from neupy import utils
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import scipy as sci
import time
import tensorflow as tf
start = time.time()
def on_epoch_end(optimizer):
if optimizer.errors.valid[-1] < 0.01:
raise StopTraining("Training has been interrupted")
dataset = genfromtxt('cuaca_fulldata.csv', delimiter=',')
data = dataset[:,:-1]
target = dataset[:,len(dataset[0])-1]
print(data)
print(target)
N = np.size(data) * 0.75
data_scaler = preprocessing.MinMaxScaler()
data = data_scaler.fit_transform(data)
utils.reproducible()
x_train, x_test, y_train, y_test = train_test_split(
data, target, test_size=0.15
)
def modified_mse(actual, expected, parameters):
# Loss funtion can be modified here
sum_error = 0
sum_weight = []
hessian = lm.setup_parameter_updates()
trace = sci.inv(hessian)
sum_error += sum([(expected[i] - actual[i]) ** 2 for i in range(len(expected))])
sum_weight += sum((parameters) ** 2)
alpha = sum(parameters) / (2 * sum_weight + trace);
gamma = sum(parameters) - (alpha * trace);
beta = abs((N - gamma) / (2 * sum_error));
C = (beta*sum_error) + (alpha*sum_weight)
return tf.reduce_mean(tf.square(actual - expected))
class LMwithBR(algorithms.LevenbergMarquardt):
loss = None
def __init__(self, *args, **kwargs):
self.loss = modified_mse
super(LMwithBR, self).__init__(*args, **kwargs)
n_inputs = 6
n_hidden = 3
n_outputs = 1
network = join(
Input(n_inputs),
Sigmoid(n_hidden),
Linear(n_outputs),
)
optimizer = LMwithBR(
network,
verbose=True,
show_epoch=5,
mu=0.05,
#signals=on_epoch_end,
)
One of the issues, is that in the modified_mse
you have to use tensorflow functions instead of sum
, abs
and scipy
functions. In the code, you're trying to process expected
and actual
inputs is if they are actual values, but in neupy (with current TF versions) everything is a tensor and all the calculation should be done symbolically.
Also, variables could be retrieved from the network.variables
Sorry. can you tell me what references can I learn for this problem? I have opened the tensorflow tutorial, but I can't find any explanation that relates to weight and bias.
As I've mentioned before, you can retrieve weights and biases from network.variables
attribute. Also, you can try to use tf.global_variables
function: https://www.tensorflow.org/api_docs/python/tf/global_variables . It returns all variables that has been defined within current tensorflow graph.
In addition, old NeuPy versions have LM algorithm implemented in numpy. You can check old code (v0.1.4) in case numpy's implementation will be easier for you to understand.
Old implementation: https://github.com/itdxer/neupy/blob/v0.1.4/neupy/algorithms/backprop/levenberg_marquardt.py
I hope it helps
can you check my code and how i can get hessian matrix value from LM using tensorflow ?
def modified_mse(actual, expected, LevenbergMarquardt):
# Loss funtion can be modified here
inv = tf.matrix_inverse()
trace = tf.trace(inv)
weight = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
numP = 13
alpha = numP / (2 * weight + trace);
gamma = numP - (alpha * trace);
beta = tf.math.abs((N - gamma) / (2 * error_func));
return tf.reduce_mean(tf.square(actual - expected))
It's not so simple to do it in tensorflow. Here is how it's done in other algorithm: https://github.com/itdxer/neupy/blob/master/neupy/algorithms/gd/hessian.py#L16-L52. Also, I believe that for the LM algorithm, hessian is calculated using jacobians, since it's a special case for MSE loss. You might try to modify code from the old implementation that uses numpy: https://github.com/itdxer/neupy/blob/v0.1.4/neupy/algorithms/backprop/levenberg_marquardt.py
Thanks. I'll try it.
after I tried to enter the formula into the algorithm, the algorithm worked but the results were far from what was expected when compared to Matlab. Here's the code:
from operator import mul
import numpy as np
from numpy import zeros, asmatrix, identity, tile, dot, concatenate
from neupy.core.properties import NonNegativeNumberProperty
from neupy.functions import mse
from neupy.algorithms import Backpropagation
__all__ = ('LevenbergMarquardt',)
class LevenbergMarquardt(Backpropagation):
""" Levenberg-Marquardt algorithm. """
C = 0
mu = NonNegativeNumberProperty(default=0.01)
mu_increase_factor = NonNegativeNumberProperty(default=5, min_size=1)
def setup_defaults(self):
super(LevenbergMarquardt, self).setup_defaults()
# Can use only squared error
del self.error
self.error = mse # `error` isn't a property instance after deletion.
def init_layers(self):
super(LevenbergMarquardt, self).init_layers()
self.n_weights = sum(mul(*layer.size) for layer in self.train_layers)
def get_jacobian_matrix(self):
result = []
for input_data, gradient in zip(self.layer_outputs, self.delta):
gradient_n_col = gradient.shape[1]
input_data_n_features = input_data.shape[1]
repeated_gradients = tile(gradient, (1, input_data_n_features))
# Total weight matrix size (rows*cols) in layer.
n_weights = repeated_gradients.shape[1]
# Matrix `comb` help repeate input data columns for valid
# combination of input data and gradients for Jacobian matrix.
# Basicly this matrix and dot product replace 2 inner loops
# in pure python.
comb = zeros((input_data_n_features, n_weights))
for i in range(input_data_n_features):
comb[i, i * gradient_n_col:(i + 1) * gradient_n_col] = 1
result.append(repeated_gradients * dot(input_data, comb))
return concatenate(result, axis=1)
def update_weights(self, weight_deltas):
error = asmatrix(self.target_train - self.output_train)
jacoby = self.get_jacobian_matrix() / error
jacoby_t = jacoby.T
smallstep_matrix = self.mu * identity(self.n_weights)
inverse_hessian = (jacoby_t * jacoby + smallstep_matrix).I
# Most of all time this order would be much faster than sequantial
# order without brackets, because number of columns for parameter
# error would be small.
jacoby_delta = inverse_hessian * (jacoby_t * error)
trace = np.trace(inverse_hessian)#new
Ed = np.sum(np.asarray(error) ** 2)#new
row = 0
old_weights = self.old_weights = []
Ew = 0#new
for layer in self.train_layers:
weight = layer.weight
weight_in_size, weight_out_size = weight.shape
old_weights.append(weight.copy())
for i in range(self.use_bias, weight_in_size):
weight_delta = jacoby_delta[row:row + weight_out_size, :].T
weight[i:i + 1, :] -= weight_delta
row += weight_out_size
if self.use_bias:
# Use otside if loop because we setup last column for bias
# which is at fist row position in weight matrix.
weight[0:1, :] -= jacoby_delta[row:row + weight_out_size, :].T
row += weight_out_size
pr = row#new
Ew = np.sum((row) ** 2)#new
alp = pr / (2 * Ew + trace)#new
gamma = pr - (alp * trace)#new
beta = np.abs((7349 - gamma) / (2 * Ed))#new
C = (alp * Ew) + (beta * Ed)#new
def after_weight_update(self, input_train, target_train):
super(LevenbergMarquardt, self).after_weight_update(
input_train, target_train
)
if not self.last_error():
return
output_train = self.predict(input_train)
error = self.error(output_train, target_train) +self.C#new
if error < self.last_error_in():
self.mu /= self.mu_increase_factor
return
self.mu *= self.mu_increase_factor
# Error changed in wrong way. Rollback weight updates.
for i, layer in enumerate(self.train_layers):
layer.weight = self.old_weights[i]
Result
Epoch 100
- Train error: 9.789623386379368
- Validation error: 9.893661186835752
- Epoch time: 0.02895 sec
[WARN] Too many outputs in a terminal. Set up logging after each 40 epoch
[TRAIN] End train
Target [3. 1. 1. ... 1. 1. 0.]
Prediksi [[3.41150728]
[4.06135076]
[4.03813668]
...
[4.04405066]
[4.07661428]
[3.90400968]]
Here's the step of algorithm :
- Compute the Jacobian (by finite differences or using the chain rule)
- Compute the error gradient
g = JtE - Approximate the Hessian using the cross product Jacobian
H = JtJ - Calculate the cost function
C = βEd+αEw, where:
Ed is the sum of squared errors, and
Ew is the sum of squared weights - Solve (H + λI)δ = g to find δ
- Update the network weights w using δ
- Recalculate the cost function using the updated weights
- If the cost has not decreased,
Discard the new weights, increase λ using v and go to step 5. - Else decrease λ using v
- Update the Bayesian hyperparameters using MacKay’s or Poland’s formulae:
gamma = W – (alpha * tr(H-1))
beta = (N – gamma) / 2.0 * Ed
alpha = W / (2.0 * Ew + tr(H-1)) [modified Poland’s update], or
alpha = gamma / (2.0 * Ew) [original MacKay’s update], where:
W is the number of network parameters (number of weights and biases)
N is the number of entries in the training set
tr(H-1) is the trace of the inverse Hessian matrix
I'm sorry, It's quite hard to debug it and I don't have time to do it. I can suggest you do step by step manual calculation and compare it with outputs produced by the algorithm (make sure that parameters aren't random). Simpler method will be to validate certain properties of the algorithm after each update. I hope it helps
I'm closing this issue for now, but in case you get some follow up questions, feel free to open a new ticket for it