Env Rollout
Hi,
Thanks for your contribution; I am interested in this work. I read the code for a while but could not successfully roll out a JSSP instance using this env.
I would appreciate it if you could provide a simple code snippet showing how to use this env to roll out an instance with random actions.
Best regards,
Cong
Hi Cong,
I read your paper, "Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning", and it is an excellent contribution to the CO+RL field.
The simulator's transition function is designed to support the two types of MDP transitions explained in Sections 3.2 and 3.3 of "Learning to schedule job-shop problems: Representation and policy learning using graph neural network and reinforcement learning".
The first and second snippets simulate the MDP of 3.2 and 3.3, respectively.
# Snippet 1: simulates the MDP of Section 3.2
sim = FT06()
T_max = 100000
done = False
while True:
    _, _, done = sim.observe()
    if done:
        break
    for m in sim.machine_manager.get_machines():
        if m.available():
            sim.transit()  # assign random action (operation)
            _, r, _ = sim.observe()
    sim.global_time += 1
    sim.machine_manager.do_processing(sim.global_time)
print(sim.global_time)

# Snippet 2: simulates the MDP of Section 3.3
sim = FT06()
while True:
    available_m_ids, aggregated_reward, done = sim.flush_trivial_ops()
    if done:
        break
    for i, m_id in enumerate(available_m_ids):
        sim.transit()  # assign random action (operation)
print(sim.global_time)
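
To make the difference between the two loops concrete, here is a minimal self-contained sketch on a toy shop. It does not use this repository's simulator: `ToyShop`, `rollout_time_stepped`, `rollout_event_driven`, and the `JOBS` instance are all hypothetical names made up for illustration, and the reading of Section 3.3 as "skip decisions that have only one candidate" is an assumption based on the name `flush_trivial_ops`. The first rollout advances time one unit per step and asks the policy for every dispatch; the second dispatches single-candidate operations automatically and jumps straight to the next completion time.

```python
from typing import Callable, Dict, List, Tuple

Job = List[Tuple[int, int]]          # ordered (machine id, duration) pairs
Policy = Callable[[List[int]], int]  # picks one job id from the candidates


class ToyShop:
    """Hypothetical toy job shop; NOT the simulator from this repository."""

    def __init__(self, jobs: List[Job]) -> None:
        self.jobs = jobs
        self.n_machines = 1 + max(m for job in jobs for m, _ in job)
        self.reset()

    def reset(self) -> None:
        self.time = 0
        self.next_op = [0] * len(self.jobs)       # next unscheduled op per job
        self.job_free_at = [0] * len(self.jobs)   # end time of each job's last op
        self.machine_free_at = [0] * self.n_machines

    def all_scheduled(self) -> bool:
        return all(k == len(job) for k, job in zip(self.next_op, self.jobs))

    def doable(self) -> Dict[int, List[int]]:
        """Map each idle machine to the jobs whose next op can start now."""
        out: Dict[int, List[int]] = {}
        for j, job in enumerate(self.jobs):
            k = self.next_op[j]
            if k < len(job) and self.job_free_at[j] <= self.time:
                machine = job[k][0]
                if self.machine_free_at[machine] <= self.time:
                    out.setdefault(machine, []).append(j)
        return out

    def start(self, j: int) -> None:
        """Start the next op of job j at the current time."""
        machine, duration = self.jobs[j][self.next_op[j]]
        end = self.time + duration
        self.machine_free_at[machine] = end
        self.job_free_at[j] = end
        self.next_op[j] += 1

    def next_event_time(self) -> int:
        """Earliest future time at which a machine or job becomes free."""
        ends = self.machine_free_at + self.job_free_at
        return min(t for t in ends if t > self.time)


def rollout_time_stepped(shop: ToyShop, policy: Policy) -> Tuple[int, int]:
    """Unit time steps; the policy is asked for every dispatch."""
    shop.reset()
    calls = 0
    while not shop.all_scheduled():
        doable = shop.doable()
        while doable:
            machine = min(doable)  # fixed machine order keeps runs comparable
            shop.start(policy(doable[machine]))
            calls += 1
            doable = shop.doable()
        shop.time += 1
    return max(shop.job_free_at), calls


def rollout_event_driven(shop: ToyShop, policy: Policy) -> Tuple[int, int]:
    """Skips idle time and single-candidate (trivial) decisions."""
    shop.reset()
    calls = 0
    while not shop.all_scheduled():
        doable = shop.doable()
        if not doable:
            shop.time = shop.next_event_time()  # nothing to decide: jump ahead
            continue
        candidates = doable[min(doable)]
        if len(candidates) == 1:
            shop.start(candidates[0])           # trivial op: no policy call
        else:
            shop.start(policy(candidates))
            calls += 1
    return max(shop.job_free_at), calls


# Made-up 3-job, 3-machine instance; the policy picks the lowest job id.
JOBS: List[Job] = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3)],
]
print(rollout_time_stepped(ToyShop(JOBS), min))
print(rollout_event_driven(ToyShop(JOBS), min))
```

With the same deterministic policy, both rollouts build the same schedule and report the same makespan; the event-driven one simply consults the policy less often, which is the practical appeal of the second formulation.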
I hope this can answer your question.
Sincerely,
Junyoung
Congratulations on your IJPR paper, nicely done!
Thank you for your code; it is helpful. I figured out a rollout myself a while ago by reading your code. Here is my rollout function corresponding to your efficient MDP formulation (Section 3.3):
def rollout(s, verbose=True):
    s.reset()
    done = False
    while True:
        do_op_dict = s.get_doable_ops_in_dict()
        all_machine_work = not do_op_dict
        if all_machine_work:  # all machines are busy: keep processing
            s.process_one_time()
        else:  # some machines may have only a trivial action, the others not
            _, _, done, _ = s.flush_trivial_ops(reward='makespan')  # flush the trivial actions
            if done:
                break  # rollout finished during the trivial-action flush
            g, r, done = s.observe(return_doable=True)
            s.transit()
            if done:
                break  # rollout finished by this transition
    if verbose:
        print('All jobs finished, makespan={}.'.format(s.global_time))

s = Simulator(6, 6, verbose=False)
rollout(s)
This gives the same makespan as your example.
Compared with your example, I found some redundant code in my function. Your code gives me a better understanding of the environment, thanks a lot!
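
For anyone landing here later, the shorter loop from the second snippet can be wrapped into a reusable function with a safety cap, in the spirit of the unused `T_max` in the first snippet. This is only a sketch: it assumes `flush_trivial_ops()` returns the three values shown in that snippet and that `global_time` holds the makespan once `done` is true. `StubSim` is a hypothetical stand-in so the example runs without the real simulator, and `max_rounds` is an illustrative parameter name.

```python
def rollout(sim, max_rounds=100_000):
    """Random rollout with a cap on decision rounds; returns the makespan."""
    for _ in range(max_rounds):
        # Assumed return shape, as in the second snippet above.
        available_m_ids, _, done = sim.flush_trivial_ops()
        if done:
            return sim.global_time
        for _ in available_m_ids:
            sim.transit()  # random action, one per available machine
    raise RuntimeError("rollout did not finish within max_rounds")


class StubSim:
    """Hypothetical stand-in exposing only the members used above."""

    def __init__(self, n_rounds):
        self.n_rounds = n_rounds
        self.global_time = 0
        self.transits = 0

    def flush_trivial_ops(self):
        done = self.global_time >= self.n_rounds
        if done:
            return [], 0.0, True
        self.global_time += 1  # pretend one time unit passes per round
        return [0], 0.0, False

    def transit(self):
        self.transits += 1


stub = StubSim(n_rounds=5)
print(rollout(stub), stub.transits)
```

With the real environment, the call would presumably be `rollout(FT06())`; raising on the cap makes a stuck rollout fail loudly instead of looping forever.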