Total-RD/pymgrid

At test time the RLlib agent runs over the whole horizon and not only the test split

Closed this issue · 3 comments

This block of code runs, but I expected 8 steps at test time and not the overall horizon (24 steps).

# THIS MICROGRID BUILD IS OK
from pymgrid import MicrogridGenerator as mg
from pymgrid.Environments.pymgrid_cspla import MicroGridEnv
from ray.rllib.agents import dqn
env = mg.MicrogridGenerator(nb_microgrid=25)
env.generate_microgrid(verbose=True)
mgi = env.microgrids[1]
mgi.train_test_split()

# THIS AGENT BUILD IS OK
default_config = dqn.DEFAULT_CONFIG.copy()
default_config["evaluation_interval"] = 1
default_config["evaluation_num_episodes"] = 1
default_config["env_config"] = {"microgrid": mgi}
trainer = dqn.DQNTrainer(env=MicroGridEnv, config=default_config)

# INFERENCE BLOCK
env = MicroGridEnv({'microgrid': mg0})
obs = env.reset(testing=True)
done = False
i=0
while not done:
    action = self.trainer.compute_action(obs)
    obs, reward, done, info = env.step(action)
    i+=1
print(i)  # should iterate 8 times and not 24, because I want to use it on the test split

The test-time agent should iterate over 8760*0.33 timesteps in this case. You also don't need to call train_test_split in the first block; it is done automatically when you initialize the environment.

# THIS MICROGRID BUILD IS OK
from pymgrid import MicrogridGenerator as mg
from pymgrid.Environments.pymgrid_cspla import MicroGridEnv
from ray.rllib.agents import dqn
env = mg.MicrogridGenerator(nb_microgrid=25)
env.generate_microgrid(verbose=True)
mgi = env.microgrids[1]
mgi.train_test_split()

# THIS AGENT BUILD IS OK
default_config = dqn.DEFAULT_CONFIG.copy()
default_config["evaluation_interval"] = 1
default_config["evaluation_num_episodes"] = 1
default_config["env_config"] = {"microgrid": mgi}
trainer = dqn.DQNTrainer(env=MicroGridEnv, config=default_config)

# INFERENCE BLOCK
env = MicroGridEnv({'microgrid': mgi})
obs = env.reset(testing=True)
done = False
i = 0
while not done:
    action = trainer.compute_action(obs)
    obs, reward, done, info = env.step(action)
    i += 1
print(i)

I modified your code slightly so I could run it. I got print(i) = 2866, which is the expected behavior.
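
As a rough sanity check on that number, here is a minimal sketch of the arithmetic, assuming one year of hourly data (8760 steps) and a 33% test fraction as mentioned above. The helper name `estimated_test_steps` is hypothetical and not part of pymgrid; it only reproduces the back-of-the-envelope estimate.

```python
# Back-of-the-envelope estimate of the test-split length.
# Assumptions: 8760 hourly timesteps and a 0.33 test fraction (from this thread).
HOURS_PER_YEAR = 8760
TEST_FRACTION = 0.33


def estimated_test_steps(total_steps: int = HOURS_PER_YEAR,
                         test_fraction: float = TEST_FRACTION) -> int:
    """Hypothetical helper: approximate number of test timesteps."""
    return int(total_steps * test_fraction)


estimate = estimated_test_steps()
observed = 2866  # step count reported in this thread
gap = estimate - observed
print(estimate, observed, gap)
```

The estimate is slightly higher than the observed count. The gap happens to equal the 24-step horizon from the question, which may be where it comes from, but that is an assumption I have not confirmed against the environment code.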

Thank you, Gonzague. The problem came from my older version. Now everything is OK.