market state during simulation step leaks previous vendor actions
Assume a linear market with two vendors (duopoly, RL vs. rule-based). During `reset()`, the `vendor_actions` of the market default to:
In the linear market, the state simply holds the qualities of both vendors.
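Here is a rough sketch of the setup described above: two vendors, with a state that consists only of their qualities. It shows a toy market with a `reset()` and a per-vendor observation. `ToyLinearDuopoly`, the quality range, the `[0, 0]` placeholder default for `vendor_actions` and the observation layout are all assumptions, not the project's actual code.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyLinearDuopoly:
    """Hypothetical two-vendor linear market (illustration only)."""

    max_quality: int = 100
    qualities: List[int] = field(default_factory=list)
    vendor_actions: List[int] = field(default_factory=list)

    def reset(self, rng: Optional[random.Random] = None) -> None:
        rng = rng or random.Random()
        # The state is just the qualities of the two vendors.
        self.qualities = [rng.randint(1, self.max_quality) for _ in range(2)]
        # Placeholder default (assumed); the real default is not reproduced here.
        self.vendor_actions = [0, 0]

    def observation(self, vendor_id: int) -> List[int]:
        # Assumed layout: own quality, then every other vendor's (action, quality).
        obs = [self.qualities[vendor_id]]
        for j in range(len(self.qualities)):
            if j != vendor_id:
                obs += [self.vendor_actions[j], self.qualities[j]]
        return obs
```

Because the observation for one vendor includes the other vendor's current action, whatever sits in `vendor_actions` at the time of the call is what that vendor sees.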
Now, in the first step of the first episode, the RL agent receives its observation of the market, i.e.
The agent picks an action, and the step computes `customers_per_vendor_iteration = self.config.number_of_customers // self._number_of_vendors`.
First, the probability distribution that defines the purchase behaviour is generated with prices/actions=
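The issue does not show how this distribution is computed. As an illustration only, this sketch assumes a logit (softmax) customer that compares each vendor's quality-to-price ratio against a "buy nothing" option. The function name, the ratio utility and the `nothing_utility` parameter are hypothetical, not the project's customer model.

```python
import math
from typing import List, Sequence


def purchase_probabilities(prices: Sequence[float], qualities: Sequence[float],
                           nothing_utility: float = 0.0) -> List[float]:
    """Hypothetical customer model: softmax over quality/price ratios.

    Index 0 is "buy nothing"; index k + 1 is "buy from vendor k".
    """
    if any(p <= 0 for p in prices):
        raise ValueError("this sketch assumes strictly positive prices")
    utilities = [nothing_utility] + [q / p for q, p in zip(qualities, prices)]
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

The key point for this issue is that the distribution depends on the prices currently stored for *all* vendors. Re-running it after one vendor changes its price therefore gives a different purchase behaviour.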
In the second iteration, the rule-based agent chooses its action. To do so, one would expect it to receive an observation
```python
customers_per_vendor_iteration = self.config.number_of_customers // self._number_of_vendors
for i in range(self._number_of_vendors):
    self._simulate_customers(profits, customers_per_vendor_iteration)
    if i < len(self.competitors):
        action_competitor_i = self.competitors[i].policy(self._observation(i + 1))  # this observation already leaks information
        self.vendor_actions[i + 1] = action_competitor_i  # in the next iteration we would simulate customer behaviour a second time with the action from vendor 0...
```
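To make the ordering in the loop above concrete, here is a self-contained toy re-creation using plain lists and callables. `simulate_step` and its parameters are hypothetical names, not the project's API. The function returns what each competitor observed. The output shows that vendor 0's new action is already visible to the competitor and that customers are simulated once per vendor. This matches a sequential (alternating-move) game, which the reporter ultimately accepted as the desired behaviour.

```python
from typing import Callable, List, Tuple

Observation = Tuple[int, ...]


def simulate_step(
    vendor_actions: List[int],
    agent_action: int,
    competitor_policies: List[Callable[[Observation], int]],
    simulate_customers: Callable[[List[int], int], None],
    number_of_customers: int,
) -> List[Observation]:
    """Toy version of the loop quoted in the issue (hypothetical names)."""
    vendor_actions[0] = agent_action  # vendor 0 (the RL agent) has already moved
    number_of_vendors = len(vendor_actions)
    customers_per_vendor_iteration = number_of_customers // number_of_vendors
    seen: List[Observation] = []
    for i in range(number_of_vendors):
        # Customers react to the current mix of new and not-yet-updated actions.
        simulate_customers(list(vendor_actions), customers_per_vendor_iteration)
        if i < len(competitor_policies):
            # Competitor i (vendor i + 1) sees the others' current actions,
            # including vendor 0's action from this very step.
            obs = tuple(a for j, a in enumerate(vendor_actions) if j != i + 1)
            seen.append(obs)
            vendor_actions[i + 1] = competitor_policies[i](obs)
    return seen


if __name__ == "__main__":
    log = []
    actions = [0, 0]  # placeholder default after reset
    seen = simulate_step(actions, 7, [lambda obs: obs[0] - 1],  # competitor undercuts by 1
                         lambda acts, n: log.append((tuple(acts), n)), 100)
    print(seen)     # [(7,)]
    print(actions)  # [7, 6]
    print(log)      # [((7, 0), 50), ((7, 6), 50)]
```

The first half of the customers sees vendor 0's new price against the competitor's old one. The second half sees both updated prices. Each vendor therefore reacts to the most recent state of the market rather than to a frozen snapshot.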
OK, this can be closed. It seems to be the desired behaviour; I just misunderstood it.