8/30

  1. Add new feature: estimation of a lower value (see the sketch below).
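
A minimal sketch of what this could look like, assuming the "lower value" is the smallest value of f over the current box, estimated by random sampling (the name estimate_lower and the box representation are assumptions):

```python
import random

def estimate_lower(f, box, n_samples=100):
    """Estimate the lowest value of f over a box given as
    [(lo, hi), ...] per dimension, by random sampling.
    A heuristic estimate, not a rigorous lower bound."""
    best = float("inf")
    for _ in range(n_samples):
        x = [random.uniform(lo, hi) for lo, hi in box]
        best = min(best, f(x))
    return best
```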

8/17

  1. Try randomly bisecting the domain (see the sketch after this list).
  2. Using the same network, can we generalize from f(x) = 0 to f(x) = n?
  3. Sample from the neighborhood of every point.
  4. Compare with branch and prune, even if the network doesn't give a fair reward:
    a. train on f(x) = 0
    b. compare B&P on f(x) = 0 vs. B&P+NN on f(x) = 0
    c. compare B&P on f(x) = n vs. B&P+NN on f(x) = n
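
A minimal sketch of the random bisection in item 1, assuming the domain is a list of per-dimension intervals (random_bisect and the box representation are assumptions):

```python
import random

def random_bisect(box):
    """Split a box [(lo, hi), ...] along a random dimension at a
    random interior point, returning the two child boxes."""
    dim = random.randrange(len(box))
    lo, hi = box[dim]
    cut = random.uniform(lo, hi)
    left, right = list(box), list(box)
    left[dim] = (lo, cut)
    right[dim] = (cut, hi)
    return left, right
```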

8/10

  1. The search converges once it finds a good-enough solution.
  2. The NN fails to find an answer if the domain is changed after training.

8/9

Can't use a tilted value because the weights for the policy and value heads are shared. When the interval is small, branching produces child intervals whose NN inputs are nearly identical, so the network keeps selecting the same action until it is masked.
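
A minimal sketch of the masking described above, assuming the policy head outputs one logit per branch action and `taken` marks actions already tried (all names are assumptions):

```python
import numpy as np

def masked_policy(logits, taken):
    """Zero out already-taken actions and renormalize the policy.

    Nearly identical boxes give the NN nearly identical inputs, so the
    raw policy keeps proposing the same branch; masking forces a new
    action once one has been tried. Assumes at least one action remains."""
    logits = np.where(taken, -np.inf, logits.astype(float))
    probs = np.exp(logits - np.max(logits))  # stable softmax; exp(-inf) = 0
    return probs / probs.sum()
```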

8/3/2018

  1. Try eliminating the value head.
  2. Try pure MCTS (see the sketch after this list).
  3. Try more difficult functions.
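
A minimal sketch of pure MCTS selection and rollout, i.e. what replaces the value head's estimate; the node fields and the actions/step/is_terminal/reward callbacks are assumptions:

```python
import math
import random

def uct_score(parent_visits, child, c=1.4):
    """UCB1 score for child selection in pure MCTS (no value head)."""
    if child.visits == 0:
        return float("inf")  # expand unvisited children first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def rollout(state, actions, step, is_terminal, reward, max_depth=50):
    """Random playout from a leaf; its return replaces the value head."""
    for _ in range(max_depth):
        if is_terminal(state):
            break
        state = step(state, random.choice(actions(state)))
    return reward(state)
```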

7/24/2018

  1. Multi-dimension issue in the representation of sampled data.
  2. Need unified naming across the BB files.
  3. Need benchmarks.
  4. Passing messages among nodes in a graph (message passing neural networks): https://arxiv.org/pdf/1704.01212.pdf
  5. Reward calculation: currently 1 - |f(x)|, collecting every terminal reward as a training example (see the sketch after this list).
  6. Backtracking.
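
A minimal sketch of the current reward in item 5, assuming each terminal is evaluated at a representative point x (terminal_reward and collect_examples are assumed names):

```python
def terminal_reward(f, x):
    """Current scheme: 1 - |f(x)|, so the reward approaches 1
    as x approaches a root of f."""
    return 1.0 - abs(f(x))

def collect_examples(f, terminals):
    """Collect every terminal point with its reward as a training example."""
    return [(x, terminal_reward(f, x)) for x in terminals]
```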

7/13/2018

  1. We need a systematic guide/introduction to branch and bound.
  2. How do we choose the middle value to cut? Why 0.4?
  3. Does relaxation mean making the problem easier?
  4. What does it mean for a subproblem to be infeasible?
  5. Documentation for pyibex.
  6. Representation of the value: can it be all negative?
  7. Representation of the state.
  8. When the search reaches a terminal, how should we calculate its reward? Possible approach: take the mean of the lower and upper values (see the sketch after this list).
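
A minimal sketch of the possible approach in item 8, assuming the terminal is a box whose function range is [lower, upper] (midpoint_reward is an assumed name):

```python
def midpoint_reward(lower, upper):
    """Reward a terminal box by the midpoint of the lower and upper
    values of f over the box, as suggested in item 8."""
    return (lower + upper) / 2.0
```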