Env Rollout

Question

Closed this issue 3 years ago · 2 comments

Hi,

Thanks for your contribution; I am interested in this work. I read through the code for a while but could not successfully roll out a JSSP instance using this env.

I would appreciate it if you could provide a simple code snippet showing how to use this env to roll out an instance with random actions.

Best regards,
Cong

Answer 1 · 2022-01-07T02:07:01.000Z

Hi Cong,

I read your paper "Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning", and it is an excellent contribution to the CO+RL field.

The simulator's transition function is designed to support the two types of MDP transitions explained in Sections 3.2 and 3.3 of "Learning to schedule job-shop problems: Representation and policy learning using graph neural network and reinforcement learning".

The first and second snippets simulate the MDP of 3.2 and 3.3, respectively.

sim = FT06()

while True:
    # the third value returned by observe() is the done flag
    _, _, done = sim.observe()
    if done:
        break

    for m in sim.machine_manager.get_machines():
        if m.available():
            sim.transit()  # assign a random action (operation)
            _, r, _ = sim.observe()

    # advance the clock by one time unit and let the machines process
    sim.global_time += 1
    sim.machine_manager.do_processing(sim.global_time)
print(sim.global_time)

sim = FT06()
while True:
    # flush the trivial operations first; stop once the episode is done
    available_m_ids, aggregated_reward, done = sim.flush_trivial_ops()
    if done:
        break

    for _ in available_m_ids:
        sim.transit()  # assign a random action (operation)

print(sim.global_time)
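
For anyone who wants to experiment with the two loop styles without the simulator installed, here is a minimal, self-contained sketch. Everything in it is a hypothetical stand-in and not this package's API: `ToyJobShop`, `JOBS`, and `rollout` are made-up names, the 3x3 instance is invented data, and treating an operation as "trivial" when it is the only candidate for an idle machine is an assumption about what the flush step does.

```python
import random

# Invented 3-job, 3-machine instance: each job is an ordered list of
# (machine_id, processing_time) operations.
JOBS = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3), (0, 1)],
]


class ToyJobShop:
    """Toy stand-in for a job-shop simulator (not the real env)."""

    def __init__(self, jobs):
        self.jobs = jobs
        self.n_machines = 1 + max(m for job in jobs for m, _ in job)
        self.reset()

    def reset(self):
        self.time = 0
        self.next_op = [0] * len(self.jobs)        # next unscheduled op per job
        self.job_free_at = [0] * len(self.jobs)    # when each job's last op ends
        self.machine_free_at = [0] * self.n_machines

    def done(self):
        # True once every operation has been scheduled
        return all(k == len(job) for k, job in zip(self.next_op, self.jobs))

    def doable_ops(self):
        """Map each idle machine to the jobs whose next op can start on it now."""
        doable = {}
        for j, job in enumerate(self.jobs):
            k = self.next_op[j]
            if k == len(job) or self.job_free_at[j] > self.time:
                continue  # job finished, or its previous op is still running
            m, _ = job[k]
            if self.machine_free_at[m] <= self.time:
                doable.setdefault(m, []).append(j)
        return doable

    def assign(self, j):
        """Start the next operation of job j at the current time."""
        m, duration = self.jobs[j][self.next_op[j]]
        end = self.time + duration
        self.machine_free_at[m] = end
        self.job_free_at[j] = end
        self.next_op[j] += 1

    def makespan(self):
        return max(self.job_free_at)


def rollout(env, rng, flush_trivial):
    """Random dispatching; returns (makespan, number of random decisions)."""
    env.reset()
    decisions = 0
    while not env.done():
        doable = env.doable_ops()
        if not doable:
            env.time += 1  # nothing can start now, so advance the clock
            continue
        for candidates in doable.values():
            if flush_trivial and len(candidates) == 1:
                env.assign(candidates[0])  # only one choice: no decision needed
            else:
                env.assign(rng.choice(candidates))
                decisions += 1
    return env.makespan(), decisions


env = ToyJobShop(JOBS)
print(rollout(env, random.Random(0), flush_trivial=False))
print(rollout(env, random.Random(0), flush_trivial=True))
```

With `flush_trivial=False` every one of the nine operations counts as a decision, while with `flush_trivial=True` only the contested ones do. The two runs are not guaranteed to produce the same schedule, because they consume the random number generator differently.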

I hope this can answer your question.

Sincerely,
Junyoung

Answer 2 · 2022-01-07T02:31:16.000Z

Congratulations on your IJPR paper, nicely done!

Thank you for the code; it is helpful. I figured out a rollout myself a while ago by reading your code. Here is my rollout function corresponding to your efficient MDP formulation (Section 3.3):

def rollout(s, verbose=True):
    s.reset()
    done = False
    while True:
        do_op_dict = s.get_doable_ops_in_dict()
        all_machine_work = not do_op_dict  # no doable ops: every machine is busy

        if all_machine_work:  # all machines are processing; keep processing
            s.process_one_time()
        else:  # some machines may have only a trivial action; the others do not
            _, _, done, _ = s.flush_trivial_ops(reward='makespan')  # flush the trivial actions
            if done:
                break  # rollout finished during the trivial-action flush
            g, r, done = s.observe(return_doable=True)
            s.transit()
        if done:
            break  # rollout finished due to the transition
    if verbose:
        print('All jobs finished, makespan={}.'.format(s.global_time))

s = Simulator(6, 6, verbose=False)
rollout(s)

which gives the same makespan as your example.

Compared with your example, I found some redundant code in my function. Your code gives me a better understanding of the environment. Thanks a lot!