There is no token_reward in wordle train_bc file

Question

ElegantLin opened this issue 2 years ago · 3 comments

Dear authors,

Thanks for your great work. However, when I try to run train_bc.py in the wordle folder, I get the following error:

  File "/data2/xxx/Implicit-Language-Q-Learning/src/wordle/load_objects.py", line 99, in load_human_dataset
    token_reward = load_item(config['token_reward'], device, verbose=verbose)
KeyError: 'token_reward'

Could you please tell me how to fix this?

I also have another question: why should we run train_bc first to get the weights? Why can't we train IQL directly?

Thanks!

Answer 1 · 2022-10-20T07:56:26.000Z

Ah, apologies for the error!

I believe this issue can be fixed by changing:

model:
  transition_weight: 0.0
  dataset:
    name: wordle_human_dataset
    cache_id: d
  load:
    checkpoint_path: null
    strict_load: true

to:

model:
  transition_weight: 0.0
  dataset:
    name: wordle_human_dataset
    cache_id: d_train
  load:
    checkpoint_path: null
    strict_load: true

in config/wordle/train_bc.yaml
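Why would changing cache_id fix a KeyError about token_reward? A minimal sketch, assuming the loader caches built objects by cache_id (the mini-registry below is hypothetical and is not the repository's load_objects.py): a config holding only name and cache_id works as a reference to an object already built under that id. If the id was never built, the loader builds the dataset from the bare config, and the missing token_reward key raises KeyError.

```python
from typing import Any, Callable, Dict

# Hypothetical mini-registry; NOT the repository's actual loader.
registry: Dict[str, Callable[[Dict[str, Any]], Any]] = {}
cache: Dict[str, Any] = {}


def register(name: str):
    """Record a builder function under a config 'name'."""
    def decorator(fn: Callable[[Dict[str, Any]], Any]):
        registry[name] = fn
        return fn
    return decorator


def load_item(config: Dict[str, Any]) -> Any:
    """Return the cached object for config['cache_id'] if present, else build and cache it."""
    cache_id = config.get('cache_id')
    if cache_id is not None and cache_id in cache:
        return cache[cache_id]
    obj = registry[config['name']](config)
    if cache_id is not None:
        cache[cache_id] = obj
    return obj


@register('wordle_human_dataset')
def load_human_dataset(config: Dict[str, Any]) -> Dict[str, Any]:
    # A full dataset config must provide token_reward; a bare reference does not.
    return {'token_reward': config['token_reward']}


# The training dataset is built first from a full config (hypothetical values).
train = load_item({'name': 'wordle_human_dataset', 'cache_id': 'd_train',
                   'token_reward': {'name': 'some_token_reward'}})

# 'd_train' was already built, so the bare reference resolves to the cached object.
same = load_item({'name': 'wordle_human_dataset', 'cache_id': 'd_train'})

# 'd' was never built, so the bare config is built from scratch and fails.
try:
    load_item({'name': 'wordle_human_dataset', 'cache_id': 'd'})
    error = None
except KeyError as e:
    error = e
```

Under this assumption, the model's dataset entry only resolves if its cache_id names a dataset that the config already builds in full.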

As for why to run BC first: you can certainly train IQL directly. However, ILQL inference needs a BC model whose policy the IQL value functions perturb, which is why I recommend training BC first, so that you can evaluate ILQL while it is training.
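To illustrate why inference needs the BC model, here is a sketch of the ILQL decoding rule as stated in the ILQL paper: the BC policy is reweighted by the learned advantage, π(a|s) ∝ π_β(a|s) · exp(β(Q(s,a) − V(s))). The function name, plain-list inputs, and beta values are illustrative assumptions, not the repository's API.

```python
import math
from typing import List


def ilql_perturb(bc_logits: List[float], q_values: List[float],
                 v_value: float, beta: float) -> List[float]:
    """Return next-token probabilities proportional to pi_bc(a) * exp(beta * (Q(a) - V))."""
    # Adding beta * advantage to the BC logits multiplies pi_bc by exp(beta * advantage).
    scores = [logit + beta * (q - v_value) for logit, q in zip(bc_logits, q_values)]
    # Softmax with max-subtraction for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3-token vocabulary: BC prefers token 0, but Q strongly prefers token 1.
bc_logits = [2.0, 1.0, 0.0]
q_values = [0.0, 5.0, 0.0]
plain_bc = ilql_perturb(bc_logits, q_values, v_value=1.0, beta=0.0)
perturbed = ilql_perturb(bc_logits, q_values, v_value=1.0, beta=1.0)
```

With beta = 0 the output is just the BC distribution, and V(s) is the same for every token, so it cancels after normalization. The value functions only reweight a distribution that the BC model has to supply.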

Thank you for the questions! Let me know if there is anything else I can help with!

Answer 2 · 2022-11-06T07:00:26.000Z

Hi Charlie,

Thanks for your reply. Sorry, but I am still running into issues with the following two files when I try to run train_bc.py and train_iql.py on the toxicity dataset.

When I run train_bc.py, it throws the exception "In 'train_bc': Could not find 'evaluator/bc_evaluator'". However, I found a @register('bc_evaluator') at https://github.com/Sea-Snell/Implicit-Language-Q-Learning/blob/main/src/load_objects.py#L82. Why might this happen?
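One way to narrow this down, assuming the training scripts use Hydra (the error text matches Hydra's message for an unresolved defaults-list entry): Hydra looks for a config file evaluator/bc_evaluator.yaml under the config directory, which is separate from the Python-side @register('bc_evaluator') registry. The missing_defaults helper below is hypothetical, not part of the repository; on a real checkout you would point it at the toxicity config directory, whose exact path is not given in this thread.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_defaults(config_dir: str, entries: List[str]) -> List[str]:
    """Return defaults entries like 'evaluator/bc_evaluator' with no matching .yaml file."""
    root = Path(config_dir)
    # Each 'group/option' entry is expected at <config_dir>/group/option.yaml.
    return [entry for entry in entries if not (root / f"{entry}.yaml").is_file()]


# Usage: a throwaway config tree that only contains evaluator/other.yaml.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "evaluator").mkdir()
    (Path(tmp) / "evaluator" / "other.yaml").write_text("name: other\n")
    result = missing_defaults(tmp, ["evaluator/bc_evaluator", "evaluator/other"])

print(result)  # ['evaluator/bc_evaluator']
```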

For train_iql.py, I think there is a bug at https://github.com/Sea-Snell/Implicit-Language-Q-Learning/blob/main/src/toxicity/toxicity_env.py#L53. Perhaps random.choice cannot be applied to a RedditData object? The program gets stuck at this line.
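To test the hypothesis about random.choice: it only needs an object that supports len() and integer indexing, so a class with __len__ and __getitem__ works with it. ToyRedditData below is a hypothetical stand-in, since RedditData's real interface is not shown in this thread. A type missing __len__ makes random.choice raise TypeError immediately, so a hang without any error probably has a different cause.

```python
import random
from typing import List


class ToyRedditData:
    """Hypothetical stand-in for RedditData exposing the sequence interface random.choice uses."""

    def __init__(self, items: List[str]):
        self._items = items

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, idx: int) -> str:
        return self._items[idx]


data = ToyRedditData(["post a", "post b", "post c"])
sample = random.choice(data)

# Equivalent explicit form, useful for stepping through in a debugger.
idx = random.randrange(len(data))
explicit = data[idx]


class NoLen:
    """Supports indexing but not len(), so random.choice cannot use it."""

    def __getitem__(self, idx: int) -> int:
        return idx


try:
    random.choice(NoLen())
    failed = False
except TypeError:
    failed = True  # fails loudly rather than hanging
```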

Thanks for your help. I am looking forward to your reply!

Answer 3 · 2022-11-24T17:50:38.000Z

OK, I just pushed a fix for your first error. As for the second, I suspect it is related to your Python version; I'm using Python 3.9.7.
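A small guard for confirming you match the maintainer's environment before digging further: it compares the running interpreter with the 3.9.7 reported above and warns on a mismatch. TESTED_VERSION and check_python_version are hypothetical names, not part of the repository.

```python
import sys
import warnings
from typing import Tuple

# Version the maintainer reports using in this thread.
TESTED_VERSION: Tuple[int, int, int] = (3, 9, 7)


def check_python_version(expected: Tuple[int, int, int] = TESTED_VERSION) -> bool:
    """Warn and return False if the running interpreter differs from `expected`."""
    current = tuple(sys.version_info[:3])
    if current != expected:
        running = ".".join(str(part) for part in current)
        tested = ".".join(str(part) for part in expected)
        warnings.warn(f"Running Python {running}; this thread reports using Python {tested}.")
        return False
    return True


check_python_version()
```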