Reproduce-of-Top-K-Off-Policy-Correction-for-a-REINFORCE-Recommender-System

A reproduction of "Top-K Off-Policy Correction for a REINFORCE Recommender System".

1. Required environment: tensorflow >= 1.2.1, numpy.

2. To reproduce the experimental results from Section 6 of the paper, run python simulation.py.

3. To verify that the algorithm can recommend top-k items with long-term value, I designed an extra experiment, which you can run with simulation_ltv.
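For orientation, the quantities behind both experiments can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in simulation.py or simulation_ltv: the function names are hypothetical, and the formulas are the ones stated in the paper, namely the importance weight pi(a|s) / beta(a|s), the top-K multiplier lambda_K = K * (1 - pi(a|s))^(K-1), and the discounted future reward used as the long-term value signal.

```python
import numpy as np


def top_k_multiplier(pi_a, k):
    """Top-K multiplier from the paper: K * (1 - pi(a|s))^(K-1).

    pi_a is the probability the learned policy assigns to the logged action.
    With k == 1 the multiplier is 1, which gives the standard off-policy correction.
    """
    return k * (1.0 - pi_a) ** (k - 1)


def corrected_weight(pi_a, beta_a, k):
    """Importance weight pi/beta multiplied by the top-K multiplier.

    beta_a is the behavior policy's probability of the logged action.
    (Hypothetical helper; the repository may organize this differently.)
    """
    return (pi_a / beta_a) * top_k_multiplier(pi_a, k)


def discounted_returns(rewards, gamma):
    """Discounted future reward R_t = r_t + gamma * R_{t+1} for each step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk the trajectory backwards so each step reuses the next step's return.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    print(corrected_weight(pi_a=0.5, beta_a=0.25, k=2))
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

In a REINFORCE update, the corrected weight would scale the term R_t * grad log pi(a_t|s_t) for each logged step. The multiplier shrinks as pi(a|s) approaches 1, so an item that is already very likely to appear in the top-K list receives a smaller gradient.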