Reproduction of Top-K Off-Policy Correction for a REINFORCE Recommender System

1. Required environment: tensorflow >= 1.2.1; numpy
2. To reproduce the experimental results from Section 6 of the paper, run python simulation.py
3. To verify that the algorithm can recommend top-k items with long-term value, I designed an additional experiment, which you can run with simulation_ltv.
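
For readers who want a feel for the correction before running the simulations, here is a minimal sketch in plain Python of the top-K multiplier described in the paper, lambda_K = K * (1 - pi)^(K - 1), combined with the off-policy importance weight pi / beta. The function names are hypothetical and are not taken from simulation.py; the TensorFlow code in this repository may organize the computation differently.

```python
def topk_correction(pi_prob, k):
    """Top-K multiplier lambda_K = K * (1 - pi)^(K - 1).

    pi_prob: probability the learned policy pi assigns to the logged action.
    k:       number of items recommended at once.
    """
    if not 0.0 <= pi_prob <= 1.0:
        raise ValueError("pi_prob must be a probability in [0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return k * (1.0 - pi_prob) ** (k - 1)


def corrected_weight(pi_prob, beta_prob, k):
    """Per-sample weight (pi / beta) * lambda_K.

    In the policy-gradient update this weight scales R * grad(log pi).
    beta_prob is the behavior (logging) policy's probability of the action.
    """
    if beta_prob <= 0.0:
        raise ValueError("beta_prob must be positive")
    importance = pi_prob / beta_prob  # standard off-policy correction
    return importance * topk_correction(pi_prob, k)


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    for pi_prob in (0.01, 0.1, 0.5, 0.9):
        print(pi_prob, corrected_weight(pi_prob, beta_prob=0.1, k=16))
```

With K = 1 the multiplier equals 1 and the weight reduces to the ordinary importance ratio. For K > 1 the multiplier shrinks toward 0 as pi approaches 1, so an item the policy already ranks with high confidence contributes little additional gradient.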