Reinforcement Learning for Recommendation Systems

Some of the code was reused from the Catalyst demo notebook and higgsfield's RL-Adventure, which helped a lot.

Model                            nDCG@10   hit_rate@10
DDPG with OU noise               0.280     0.502
DDPG                             0.254     0.454
Neural Collaborative Filtering   0.238     0.430
Random (for comparison)          ~0.05     ~0.1
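The table reports nDCG@10 and hit rate@10 but does not state the evaluation protocol. As a minimal sketch, assuming binary relevance and a per-user ranking of item ids, the two metrics can be computed as below. The function names and the dict-based input format are hypothetical, not the repository's API. The helper `expected_random_scores` further assumes one held-out item ranked among `num_candidates` candidates, a common leave-one-out protocol that is likewise only an assumption here.

```python
import math
from typing import Dict, Sequence, Set, Tuple

# Hypothetical helper names; not the repository's API.


def hit_rate_at_k(ranked: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """1.0 if any relevant item appears in the top-k of the ranking, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked: Sequence[int], relevant: Set[int], k: int = 10) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A relevant item at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked[:k]) if item in relevant)
    # The ideal ranking puts all relevant items (at most k of them) first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(rankings: Dict[int, Sequence[int]],
             held_out: Dict[int, Set[int]], k: int = 10) -> Tuple[float, float]:
    """Average (nDCG@k, hit_rate@k) over all users in `rankings`."""
    users = list(rankings)
    ndcg = sum(ndcg_at_k(rankings[u], held_out[u], k) for u in users) / len(users)
    hr = sum(hit_rate_at_k(rankings[u], held_out[u], k) for u in users) / len(users)
    return ndcg, hr


def expected_random_scores(num_candidates: int = 100, k: int = 10) -> Tuple[float, float]:
    """Expected (nDCG@k, hit_rate@k) of a uniformly random ranker, assuming exactly
    one held-out relevant item ranked among num_candidates items (assumed protocol)."""
    top = min(k, num_candidates)
    # The held-out item is equally likely to land at any position.
    hit = top / num_candidates
    ndcg = sum(1.0 / math.log2(i + 2) for i in range(top)) / num_candidates
    return ndcg, hit
```

For example, `evaluate({0: [5, 3, 9], 1: [1, 2, 4]}, {0: {3}, 1: {7}})` gives a hit rate of 0.5 and an nDCG of about 0.315. Under the 100-candidate assumption, a random ranker scores about 0.045 nDCG@10 and 0.1 hit rate@10, which is in line with the Random row, although the README itself does not confirm that protocol.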
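The best-scoring row adds Ornstein-Uhlenbeck (OU) noise to DDPG's deterministic actions for exploration. Below is a minimal NumPy sketch of the standard discretised OU process, x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, I). The class and function names, the defaults `theta=0.15` and `sigma=0.2` (values commonly used with DDPG, not taken from this repository), and the [-1, 1] clipping range for the action vector are all assumptions.

```python
import numpy as np


class OUNoise:
    """Ornstein-Uhlenbeck process for temporally correlated exploration noise.

    Hypothetical sketch; parameter defaults are assumed, not the repository's.
    """

    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu = mu * np.ones(action_dim)
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean (e.g. at each episode start).
        self.state = self.mu.copy()

    def sample(self):
        # Mean-reverting drift pulls the state towards mu; the Gaussian term adds diffusion.
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.mu.shape)
        self.state = self.state + drift + diffusion
        return self.state


def noisy_action(actor_output, noise, low=-1.0, high=1.0):
    """Add OU noise to the actor's deterministic output and clip to assumed bounds."""
    return np.clip(actor_output + noise.sample(), low, high)
```

Because each sample builds on the previous state, consecutive noise vectors are correlated over time, unlike independent Gaussian noise; with `sigma=0` the state simply decays towards `mu`.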

FrankTianTT/Multi-aspect-Reinforcement-Recommendation