There is no token_reward in wordle train_bc file
ElegantLin opened this issue · 3 comments
Dear authors,
Thanks for your great work. However, when I try to run the wordle train_bc.py, I get the following error:
File "/data2/xxx/Implicit-Language-Q-Learning/src/wordle/load_objects.py", line 99, in load_human_dataset
token_reward = load_item(config['token_reward'], device, verbose=verbose)
KeyError: 'token_reward'
Could you please tell me how to fix this?
I also have another question: why should we run train_bc and obtain the weights first? Why can't we train IQL directly?
Thanks!
Ah apologies for the error!
I believe this issue can be fixed by changing:
model:
  transition_weight: 0.0
  dataset:
    name: wordle_human_dataset
    cache_id: d
  load:
    checkpoint_path: null
    strict_load: true
to:
model:
  transition_weight: 0.0
  dataset:
    name: wordle_human_dataset
    cache_id: d_train
  load:
    checkpoint_path: null
    strict_load: true
in config/wordle/train_bc.yaml.
As for why to run BC first: you certainly can train IQL directly. But to run ILQL inference you need a BC model for the IQL value functions to perturb, which is why I recommend training BC first, so that you can evaluate ILQL while it is training.
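To make the "perturb" step concrete, here is a toy sketch. It assumes the perturbation has the form described in the ILQL paper, where each BC logit is shifted by beta * (Q - V) before the softmax. The names `perturb_bc_logits`, `softmax`, and `beta`, and all the numbers, are illustrative and are not taken from this repository's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def perturb_bc_logits(bc_logits, q_values, v_value, beta=1.0):
    """Shift each BC logit by beta * (Q(s, a) - V(s)).

    Hypothetical helper: it shows why a BC model is needed at
    inference time, since the value functions only adjust logits
    that the BC model provides.
    """
    return [l + beta * (q - v_value) for l, q in zip(bc_logits, q_values)]


# Made-up values for a three-token vocabulary.
bc_logits = [2.0, 1.0, 0.0]   # from the BC language model
q_values = [0.1, 0.9, 0.2]    # Q(s, a) for each candidate token
v_value = 0.4                 # V(s) for the current state

bc_probs = softmax(bc_logits)
ilql_probs = softmax(perturb_bc_logits(bc_logits, q_values, v_value, beta=4.0))
```

With beta set to 0 the perturbed logits equal the BC logits, so the policy falls back to plain BC. Larger values of beta move probability toward tokens with a higher estimated advantage.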
Thank you for the questions! Let me know if there is anything else I can help with!
Hi Charlie,
Thanks for your reply. I am sorry, but I still ran into issues with the following two files when I tried to run train_bc.py and train_iql.py on the toxicity dataset.
For train_bc.py, running it throws the exception: In 'train_bc': Could not find 'evaluator/bc_evaluator'. However, I found a @register('bc_evaluator') at https://github.com/Sea-Snell/Implicit-Language-Q-Learning/blob/main/src/load_objects.py#L82. Why does this happen?
For train_iql.py, I think there is a bug at https://github.com/Sea-Snell/Implicit-Language-Q-Learning/blob/main/src/toxicity/toxicity_env.py#L53. Perhaps it is because RedditData cannot be passed to random.choice?
The program gets stuck here.
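For reference, random.choice only needs its argument to support len() and integer indexing. The sketch below uses a hypothetical `ToyData` class as a stand-in. It is not the repository's RedditData, whose internals are not shown in this thread.

```python
import random


class ToyData:
    """Hypothetical stand-in for a dataset object (not RedditData)."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        # random.choice calls len() to pick an index range.
        return len(self._items)

    def __getitem__(self, idx):
        # random.choice then indexes with a single integer.
        return self._items[idx]


data = ToyData(["a", "b", "c"])
random.seed(0)
sample = random.choice(data)
```

If an object defines both methods and the call still appears to hang, the time may be going into `__len__` or `__getitem__` (for example, lazy loading of a large file) and not into random.choice itself.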
Thanks for your help. I am looking forward to your reply!
OK, I just pushed a fix for your first error. As for the second, I suspect it is related to your Python version; I'm using Python 3.9.7.