Modular approach question
vtsom opened this issue · 1 comments
Hi,
I am trying to reproduce the SL/RL(act) model and the overall example. Reading the paper and running the code, I noticed that the paper refers to a Hybrid Policy, which is basically SL(act)+rule, and the code also has a `hybrid` agent type, which is what I want to reproduce.

Going through the code for the Modular approach, I noticed that when RL is applied, you specify the `pt-neural` agent type:
```
mkdir checkpoint/lf2lf-margin; PYTHONPATH=. python reinforce.py --schema-path data/craigslist-schema.json \
  --scenarios-path data/train-scenarios.json \
  --valid-scenarios-path data/dev-scenarios.json \
  --price-tracker price_tracker.pkl \
  --agent-checkpoints checkpoint/lf2lf/model_best.pt checkpoint/lf2lf/model_best.pt \
  --model-path checkpoint/lf2lf-margin \
  --optim adagrad --learning-rate 0.001 \
  --agents pt-neural pt-neural \
  --report-every 500 --max-turns 20 --num-dialogues 5000 \
  --sample --temperature 0.5 --max-length 20 --reward margin
```
Later, for the End-to-End approach, you mention that in order to run the RL finetuning: "We just need to change the agent type to `--agents hybrid hybrid`".

So my question is: shouldn't those two be the other way around, i.e. the Modular approach with `hybrid` agents and the End-to-End approach with `pt-neural`?
I might also be missing something here that I haven't understood correctly. I would really appreciate your help.
Thank you in advance!
Hi, sorry for the late reply! It was a mistake and I have just updated the doc. For RL training, the agent type (which specifies the policy) should always be `pt-neural`; otherwise we cannot backpropagate through the policy.
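To illustrate why, here is a minimal, self-contained REINFORCE sketch (not the repo's actual code, and using a toy 2-action bandit with a hypothetical "margin"-style reward): the policy-gradient update needs the gradient of log pi(a | theta), so the policy must be a parametric, differentiable model like `pt-neural`. A rule-based `hybrid` policy has no parameters to differentiate, so there is nothing to backprop through.

```python
import math
import random

random.seed(0)

theta = [0.0, 0.0]        # logits of a tiny softmax "policy"
rewards = [0.0, 1.0]      # toy reward: action 1 earns a better margin
lr = 0.5                  # learning rate

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

for _ in range(200):
    p = softmax(theta)
    # sample an action from the current policy
    a = 0 if random.random() < p[0] else 1
    r = rewards[a]
    # REINFORCE update: theta += lr * r * grad log pi(a | theta)
    # for a softmax, grad log pi(a) w.r.t. the logits is one_hot(a) - p
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - p[i]
        theta[i] += lr * r * grad

final_p = softmax(theta)  # probability mass shifts toward the rewarded action
```

The key line is the gradient of the log-probability: with a fixed rule-based policy there is no `theta`, so this update is undefined, which is why the docs now use `pt-neural` for both agents during RL.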