Sampled AlphaZero

To work on different multi-stage sequential decision processes