amritasaha1812/CSQA_Code

Training without pickle dump files

Closed this issue · 1 comments

Hi @amritasaha1812, when I try to run training without the train data pickle files, I get the following runtime error.

Traceback (most recent call last):
  File "run_model.py", line 455, in <module>
    main()
  File "run_model.py", line 449, in main
    get_dialog_dict(param)
  File "/home/sanyam/notebooks/csqa/read_data.py", line 33, in get_dialog_dict
    ques_type_id = param['ques_type_id']
KeyError: 'ques_type_id'

To reproduce this bug, comment out the lines in run_model.py that check for the existing dictionaries:

    # if isinstance(param['train_data_file'], list) and isinstance(param['valid_data_file'], list) and all([os.path.exists(x) for x in param['train_data_file']]) and all([os.path.exists(x) for x in param['valid_data_file']]):
    #     print 'dictionary already exists'
    #     sys.stdout.flush()
    # elif isinstance(param['train_data_file'], str) and isinstance(param['valid_data_file'], str) and os.path.exists(param['train_data_file']) and os.path.exists(param['valid_data_file']):# and os.path.exists(param['test_data_file']):
    #     print 'dictionary already exists'
    #     sys.stdout.flush()
    # else:
    get_dialog_dict(param)
    print 'dictionary formed'
    sys.stdout.flush()
    run_training(param)

With the first two if/elif conditions removed, the code now tries to access 'ques_type_id' in the param dict, which is not present. This is expected: at training time we train on all question types, so we don't specify ques_type_id.
The correct command is python run_model.py <DUMP_DIR>, where no question type id is specified. The function get_params() in params.py also doesn't store this key.
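
For anyone hitting the same KeyError, here is a minimal sketch of why the lookup fails and how a tolerant lookup would behave. It is not the repository's code: the helper lookup_ques_type_id and the placeholder file names are hypothetical, and treating a missing key as "all question types" is an assumption based on the explanation above.

```python
# Hypothetical stand-in for the dict returned by get_params() at training time.
# The real dict has many more keys; the file names here are placeholders.
# The point is only that 'ques_type_id' is absent.
param = {"train_data_file": "train.pkl", "valid_data_file": "valid.pkl"}


def lookup_ques_type_id(param):
    """Return the question type id if present, else None.

    Assumption: None is read as "train on all question types".
    """
    return param.get("ques_type_id")


# Direct indexing raises KeyError, which is what the traceback shows.
try:
    param["ques_type_id"]
    raised = False
except KeyError:
    raised = True

# dict.get returns None instead of raising when the key is missing.
ques_type_id = lookup_ques_type_id(param)
```

Whether get_dialog_dict should tolerate a missing key this way is a design decision for the repository; the sketch only shows the difference between the two lookups.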