itdxer/neupy

modified mse for BR

Ohayoosan opened this issue · 9 comments

After fiddling with it #236 (comment),
I can't get the values of the weights and biases from the library. Here's my code:

from numpy import genfromtxt
import numpy as np
import neupy.algorithms.gd.lev_marq as lm
from neupy import algorithms, storage
from neupy.layers import *
from neupy.exceptions import StopTraining
from neupy import utils
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import scipy as sci
import time
import tensorflow as tf

start = time.time()

def on_epoch_end(optimizer):
    if optimizer.errors.valid[-1] < 0.01:
        raise StopTraining("Training has been interrupted")

dataset = genfromtxt('cuaca_fulldata.csv', delimiter=',')
data = dataset[:,:-1]
target = dataset[:,len(dataset[0])-1]
print(data)
print(target)
N = np.size(data) * 0.75
data_scaler = preprocessing.MinMaxScaler()

data = data_scaler.fit_transform(data)

utils.reproducible()

x_train, x_test, y_train, y_test = train_test_split(
    data, target, test_size=0.15
)

def modified_mse(actual, expected, parameters):
    # Loss function can be modified here
    sum_error = 0
    sum_weight = []
    hessian  = lm.setup_parameter_updates()
    trace = sci.inv(hessian)
    sum_error += sum([(expected[i] - actual[i]) ** 2 for i in range(len(expected))])
    sum_weight += sum((parameters) ** 2)
    alpha = sum(parameters) / (2 * sum_weight + trace);
    gamma = sum(parameters) - (alpha * trace);
    beta = abs((N - gamma) / (2 * sum_error));
    C = (beta*sum_error) + (alpha*sum_weight)
    return tf.reduce_mean(tf.square(actual - expected))


class LMwithBR(algorithms.LevenbergMarquardt):
    loss = None

    def __init__(self, *args, **kwargs):
        self.loss = modified_mse
        super(LMwithBR, self).__init__(*args, **kwargs)

n_inputs = 6
n_hidden = 3
n_outputs = 1

network = join(
    Input(n_inputs),
    Sigmoid(n_hidden),
    Linear(n_outputs),
)

optimizer = LMwithBR(
    network,
    verbose=True,
    show_epoch=5,
    mu=0.05,
    #signals=on_epoch_end,
)

One of the issues is that in the modified_mse you have to use tensorflow functions instead of sum, abs and scipy functions. In the code, you're trying to process the expected and actual inputs as if they were concrete values, but in neupy (with current TF versions) everything is a tensor and all of the calculations should be done symbolically.

Also, variables can be retrieved from the network.variables attribute.
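
For instance, here is a minimal sketch of a loss written only with TensorFlow operations, so that everything stays symbolic. The parameters argument and the fixed alpha/beta values are illustrative assumptions and not part of NeuPy's API:

import tensorflow as tf

def modified_mse(actual, expected, parameters):
    # Sum of squared errors and sum of squared weights, computed symbolically.
    sum_error = tf.reduce_sum(tf.square(expected - actual))
    sum_weight = tf.reduce_sum(tf.square(parameters))
    # Fixed placeholder hyperparameters; Bayesian regularization would
    # recompute them from the Hessian on every update.
    alpha, beta = 0.01, 1.0
    return beta * sum_error + alpha * sum_weight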

Sorry, can you tell me what references I can use to learn about this problem? I have opened the TensorFlow tutorial, but I can't find any explanation that relates to weights and biases.

As I've mentioned before, you can retrieve weights and biases from the network.variables attribute. Also, you can try the tf.global_variables function: https://www.tensorflow.org/api_docs/python/tf/global_variables . It returns all variables that have been defined within the current TensorFlow graph.
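
For example, here is a rough sketch of how the trained values could be read back after optimizer.train(...) has finished. It assumes that neupy.utils.tensorflow_session returns the session NeuPy uses internally; please verify this against your NeuPy version:

import tensorflow as tf
from neupy.utils import tensorflow_session

session = tensorflow_session()
for variable in tf.global_variables():
    # Print the variable name together with its current numeric value.
    print(variable.name, session.run(variable))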

In addition, old NeuPy versions have the LM algorithm implemented in numpy. You can check the old code (v0.1.4) in case the numpy implementation is easier for you to understand.

Old implementation: https://github.com/itdxer/neupy/blob/v0.1.4/neupy/algorithms/backprop/levenberg_marquardt.py

I hope it helps

Can you check my code? And how can I get the Hessian matrix value from LM using TensorFlow?

def modified_mse(actual, expected, LevenbergMarquardt):
    # Loss function can be modified here
    inv = tf.matrix_inverse()
    trace = tf.trace(inv)
    weight = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
    numP = 13
    alpha = numP / (2 * weight + trace);
    gamma = numP - (alpha * trace);
    beta = tf.math.abs((N - gamma) / (2 * error_func));
    return tf.reduce_mean(tf.square(actual - expected))

It's not so simple to do in tensorflow. Here is how it's done in another algorithm: https://github.com/itdxer/neupy/blob/master/neupy/algorithms/gd/hessian.py#L16-L52. Also, I believe that for the LM algorithm the Hessian is calculated from Jacobians, since it's a special case for the MSE loss. You might try to modify the code from the old implementation that uses numpy: https://github.com/itdxer/neupy/blob/v0.1.4/neupy/algorithms/backprop/levenberg_marquardt.py
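
To illustrate that last point, here is a rough numpy sketch of the Gauss-Newton approximation that LM relies on for the MSE loss. The jacobian argument is an assumption: an (n_samples, n_parameters) array of residual derivatives.

import numpy as np

def approximate_hessian(jacobian, mu=0.01):
    n_parameters = jacobian.shape[1]
    hessian = jacobian.T.dot(jacobian)            # H ≈ JᵀJ
    damped = hessian + mu * np.eye(n_parameters)  # (H + μI), as used by LM
    inverse_hessian = np.linalg.inv(damped)
    trace = np.trace(inverse_hessian)             # tr(H⁻¹), needed for BR
    return hessian, inverse_hessian, trace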

Thanks. I'll try it.

After I tried to enter the formula into the algorithm, the algorithm ran, but the results were far from what was expected when compared to Matlab. Here's the code:

from operator import mul
import numpy as np
from numpy import zeros, asmatrix, identity, tile, dot, concatenate

from neupy.core.properties import NonNegativeNumberProperty
from neupy.functions import mse
from neupy.algorithms import Backpropagation


__all__ = ('LevenbergMarquardt',)


class LevenbergMarquardt(Backpropagation):
    """ Levenberg-Marquardt algorithm. """
    C = 0
    mu = NonNegativeNumberProperty(default=0.01)
    mu_increase_factor = NonNegativeNumberProperty(default=5, min_size=1)

    def setup_defaults(self):
        super(LevenbergMarquardt, self).setup_defaults()
        # Can use only squared error
        del self.error
        self.error = mse  # `error` isn't a property instance after deletion.

    def init_layers(self):
        super(LevenbergMarquardt, self).init_layers()
        self.n_weights = sum(mul(*layer.size) for layer in self.train_layers)

    def get_jacobian_matrix(self):
        result = []
        for input_data, gradient in zip(self.layer_outputs, self.delta):
            gradient_n_col = gradient.shape[1]
            input_data_n_features = input_data.shape[1]

            repeated_gradients = tile(gradient, (1, input_data_n_features))
            # Total weight matrix size (rows*cols) in layer.
            n_weights = repeated_gradients.shape[1]

            # Matrix `comb` helps repeat input data columns for a valid
            # combination of input data and gradients for the Jacobian matrix.
            # Basically this matrix and the dot product replace 2 inner loops
            # in pure python.
            comb = zeros((input_data_n_features, n_weights))
            for i in range(input_data_n_features):
                comb[i, i * gradient_n_col:(i + 1) * gradient_n_col] = 1

            result.append(repeated_gradients * dot(input_data, comb))

        return concatenate(result, axis=1)

    def update_weights(self, weight_deltas):
        error = asmatrix(self.target_train - self.output_train)
        jacoby = self.get_jacobian_matrix() / error
        jacoby_t = jacoby.T
        smallstep_matrix = self.mu * identity(self.n_weights)

        inverse_hessian = (jacoby_t * jacoby + smallstep_matrix).I
        # Most of the time this order is much faster than the sequential
        # order without brackets, because the number of columns for the
        # parameter error is small.
        jacoby_delta = inverse_hessian * (jacoby_t * error)

        trace = np.trace(inverse_hessian)  # new
        Ed = np.sum(np.asarray(error) ** 2)  # new

        row = 0
        old_weights = self.old_weights = []
        Ew = 0  # new

        for layer in self.train_layers:
            weight = layer.weight
            weight_in_size, weight_out_size = weight.shape
            old_weights.append(weight.copy())

            for i in range(self.use_bias, weight_in_size):
                weight_delta = jacoby_delta[row:row + weight_out_size, :].T
                weight[i:i + 1, :] -= weight_delta
                row += weight_out_size

            if self.use_bias:
                # Handled outside the loop above because the bias column
                # is at the first row position in the weight matrix.
                weight[0:1, :] -= jacoby_delta[row:row + weight_out_size, :].T
                row += weight_out_size

        pr = row  # new
        Ew = np.sum((row) ** 2)  # new
        alp = pr / (2 * Ew + trace)  # new
        gamma = pr - (alp * trace)  # new
        beta = np.abs((7349 - gamma) / (2 * Ed))  # new

        C = (alp * Ew) + (beta * Ed)  # new

    def after_weight_update(self, input_train, target_train):
        super(LevenbergMarquardt, self).after_weight_update(
            input_train, target_train
        )

        if not self.last_error():
            return

        output_train = self.predict(input_train)
        error = self.error(output_train, target_train) + self.C  # new

        if error < self.last_error_in():
            self.mu /= self.mu_increase_factor
            return

        self.mu *= self.mu_increase_factor

        # Error changed in wrong way. Rollback weight updates.
        for i, layer in enumerate(self.train_layers):
            layer.weight = self.old_weights[i]

Result
Epoch 100

  • Train error: 9.789623386379368
  • Validation error: 9.893661186835752
  • Epoch time: 0.02895 sec
    [WARN] Too many outputs in a terminal. Set up logging after each 40 epoch
    [TRAIN] End train
    Target [3. 1. 1. ... 1. 1. 0.]
    Prediksi [[3.41150728]
    [4.06135076]
    [4.03813668]
    ...
    [4.04405066]
    [4.07661428]
    [3.90400968]]

Here are the steps of the algorithm (a small numpy sketch of step 10 follows the list):

  1. Compute the Jacobian J (by finite differences or using the chain rule)
  2. Compute the error gradient
    g = Jᵀe
  3. Approximate the Hessian using the cross product of the Jacobian
    H = JᵀJ
  4. Calculate the cost function
    C = β·Ed + α·Ew, where:
    Ed is the sum of squared errors, and
    Ew is the sum of squared weights
  5. Solve (H + λI)δ = g to find δ
  6. Update the network weights w using δ
  7. Recalculate the cost function using the updated weights
  8. If the cost has not decreased,
    discard the new weights, increase λ using v and go to step 5.
  9. Else decrease λ using v
  10. Update the Bayesian hyperparameters using MacKay's or Poland's formulae:
    gamma = W − alpha·tr(H⁻¹)
    beta = (N − gamma) / (2.0·Ed)
    alpha = W / (2.0·Ew + tr(H⁻¹)) [modified Poland's update], or
    alpha = gamma / (2.0·Ew) [original MacKay's update], where:
    W is the number of network parameters (number of weights and biases)
    N is the number of entries in the training set
    tr(H⁻¹) is the trace of the inverse Hessian matrix
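
For reference, a minimal sketch of step 10 and the step 4 cost, assuming Ed (sum of squared errors), Ew (sum of squared weights), tr(H⁻¹), W and N have already been computed from the LM update. The modified Poland update is used for alpha because it does not depend on gamma:

def update_hyperparameters(Ed, Ew, trace_inv_hessian, n_parameters, n_samples):
    # Modified Poland's update for alpha.
    alpha = n_parameters / (2.0 * Ew + trace_inv_hessian)
    # gamma is the effective number of parameters.
    gamma = n_parameters - alpha * trace_inv_hessian
    # Update for beta based on the remaining degrees of freedom.
    beta = (n_samples - gamma) / (2.0 * Ed)
    # Regularized cost from step 4.
    cost = beta * Ed + alpha * Ew
    return alpha, beta, gamma, cost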

I'm sorry, it's quite hard to debug and I don't have time to do it. I suggest you do a step-by-step manual calculation and compare it with the outputs produced by the algorithm (make sure that the parameters aren't random). A simpler method would be to validate certain properties of the algorithm after each update. I hope it helps.

I'm closing this issue for now, but in case you get some follow up questions, feel free to open a new ticket for it