/Optimal-bidding-policy-using-Policy-Gradient-in-a-Multi-agent-Contextual-Bandit-setting

We use policy gradient to help agents learn optimal policies in a competitive multi-agent contextual bandit setting

Primary LanguageJupyter Notebook

No issues in this repository yet.