stanfordnlp/cocoa

Modular approach question

vtsom opened this issue · 1 comment

vtsom commented

Hi,

I am trying to reproduce the SL/RL(act) model and the overall example.

Reading the paper and running the code, I noticed that the paper refers to a Hybrid Policy, which is basically SL(act) + rule, and that the code also has a hybrid agent type, which seems to correspond to what I want to reproduce.

When going through your code for the Modular approach, I noticed that when RL is applied, you specify the pt-neural agent type:

mkdir checkpoint/lf2lf-margin;
PYTHONPATH=. python reinforce.py --schema-path data/craigslist-schema.json \
  --scenarios-path data/train-scenarios.json \
  --valid-scenarios-path data/dev-scenarios.json \
  --price-tracker price_tracker.pkl \
  --agent-checkpoints checkpoint/lf2lf/model_best.pt checkpoint/lf2lf/model_best.pt \
  --model-path checkpoint/lf2lf-margin \
  --optim adagrad --learning-rate 0.001 \
  --agents pt-neural pt-neural \
  --report-every 500 --max-turns 20 --num-dialogues 5000 \
  --sample --temperature 0.5 --max-length 20 --reward margin

Later, for the End-to-End approach, you mention that in order to run the RL fine-tuning: "We just need to change the agent type to --agents hybrid hybrid".

So my question is: shouldn't those two be the other way around, i.e. the Modular approach with hybrid agents and the End-to-End approach with pt-neural?

I might also be missing something here that I haven't understood correctly. I would really appreciate your help.

Thank you in advance!

Hi, sorry for the late reply! It was a mistake and I just updated the doc. For RL training, the agent type (which specifies the policy) should always be pt-neural; otherwise we cannot do backprop.
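Roughly, the constraint is that a REINFORCE-style update needs action log-probabilities that are differentiable with respect to the policy parameters, so the reward can flow back through the network; a rule-based agent in that slot has no parameters to update. Below is a minimal, self-contained sketch of that policy-gradient step, not cocoa's actual code: the `ToyPolicy` module, its sizes, and the single-step setup are made up purely for illustration.

```python
# Minimal sketch of why the RL agent must be neural (illustration only, not cocoa code).
import torch
import torch.nn as nn

class ToyPolicy(nn.Module):
    """Hypothetical stand-in for a neural policy over dialogue acts."""
    def __init__(self, state_dim=32, num_actions=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, state_dim), nn.ReLU(),
            nn.Linear(state_dim, num_actions),
        )

    def forward(self, state):
        # Log-probabilities over actions, differentiable w.r.t. the parameters.
        return torch.log_softmax(self.net(state), dim=-1)

policy = ToyPolicy()
optimizer = torch.optim.Adagrad(policy.parameters(), lr=0.001)

state = torch.randn(1, 32)                              # stand-in for the dialogue state
log_probs = policy(state)
action = torch.multinomial(log_probs.exp(), 1).item()   # sample a dialogue act
reward = 1.0                                            # e.g. the margin reward at the end of a dialogue

# Policy-gradient loss: -reward * log pi(action | state).
# Backprop only works because log_probs comes from a differentiable network;
# a rule-based agent has nothing for this gradient to update.
loss = -reward * log_probs[0, action]
optimizer.zero_grad()
loss.backward()
optimizer.step()
```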