KeyError: 'scores' in judgelm_preprocess.py
ilyes-c opened this issue · 0 comments
When running the judgelm_preprocess.py script, I encountered a KeyError: 'scores' error. It appears that the script expects a scores field in the JSON files, but the documentation does not mention that this field is required. Here are the details:
Steps to Reproduce:
- Clone the repository:
git clone https://github.com/baaivision/JudgeLM
- Navigate to the directory:
cd JudgeLM
- Create and activate a conda environment.
- Install the required dependencies:
pip install -r requirements.txt
- Run the preprocessing script with the following command:
python C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/judgelm_preprocess.py --ans1_file_path C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/answers/gpt-4o_judgelm_val.jsonl --ans2_file_path C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/answers/gpt-4_judgelm_val.jsonl
Error Message:
Traceback (most recent call last):
  File "C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/judgelm_preprocess.py", line 95, in <module>
    combine_judgelm_val_judge_samples(args.ans1_file_path, args.ans2_file_path, args.ansmore_file_paths)
  File "C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/judgelm_preprocess.py", line 18, in combine_judgelm_val_judge_samples
    ans1_dict_list = extract_jsonl(ans1_file_path)
  File "C:/Users/mliki/JudgeLM/judgelm/utils.py", line 26, in extract_jsonl
    data = json.loads(line)
  File "C:/Users/mliki/anaconda3/envs/judgelm/lib/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:/Users/mliki/anaconda3/envs/judgelm/lib/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:/Users/mliki/anaconda3/envs/judgelm/lib/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/mliki/JudgeLM/judgelm/data/JudgeLM/judgelm_preprocess.py", line 36, in combine_judgelm_val_judge_samples
    'score': [ans1_dict['scores'], ans2_dict['scores']],
KeyError: 'scores'
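Note that the first exception in the traceback is a JSONDecodeError, not the KeyError. Python's json.loads produces this exact message (line 2 column 1, char 2) when it is given a line that contains only an opening brace, so one of the answer files may be pretty-printed JSON rather than one object per line. That is only an inference from the message and I have not confirmed it. A minimal sketch for checking an answer file before running the script is below; check_jsonl_lines is a hypothetical helper, not part of JudgeLM, and the required key names are assumptions taken from the fields the script accesses in this issue.

```python
import json


def check_jsonl_lines(lines, required_keys=("text", "model", "scores")):
    """Return a list of (line_number, problem) tuples for an iterable of JSONL lines.

    Hypothetical helper, not part of JudgeLM. The default key names are
    assumptions based on the fields referenced in this issue.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # A pretty-printed JSON file fails here on its first lines.
            problems.append((line_number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_number, "not a JSON object"))
            continue
        missing = [key for key in required_keys if key not in record]
        if missing:
            problems.append((line_number, "missing keys: " + ", ".join(missing)))
    return problems
```

To use it, open an answer file with open(path, encoding="utf-8") and pass the file object to check_jsonl_lines; an empty list means every line parsed and contained the listed keys.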
Suggested Fix:
- Either update the documentation to specify that the scores field is required in the JSON files.
- Or, modify the script to handle cases where the scores field is not present.
I have made a temporary modification to the script so that a missing scores field falls back to an empty list, and a missing decoding_method falls back to an empty string. Here is the updated sample_dict construction:
sample_dict = {
    'question_id': i,
    'score': [ans1_dict.get('scores', []), ans2_dict.get('scores', [])],
    'question_body': question_body,
    'answer1_body': ans1_dict['text'],
    'answer2_body': ans2_dict['text'],
    'answer1_model_id': ans1_dict['model'],
    'answer2_model_id': ans2_dict['model'],
    'answer1_metadata': {
        'decoding_method': ans1_dict.get('decoding_method', ''),
    },
    'answer2_metadata': {
        'decoding_method': ans2_dict.get('decoding_method', ''),
    }
}
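For anyone who wants to try the workaround outside the script, here is the same logic as a self-contained sketch. The function name build_sample and its parameters are hypothetical and only for illustration; the field names follow the snippet above, and the empty-list default for scores is my own choice, not something the repository documents.

```python
def build_sample(question_id, question_body, ans1_dict, ans2_dict):
    """Build one combined sample, tolerating answers without a 'scores' field.

    Hypothetical helper for illustration only; not part of JudgeLM.
    """
    return {
        "question_id": question_id,
        # Fall back to an empty list when an answer has no scores.
        "score": [ans1_dict.get("scores", []), ans2_dict.get("scores", [])],
        "question_body": question_body,
        "answer1_body": ans1_dict["text"],
        "answer2_body": ans2_dict["text"],
        "answer1_model_id": ans1_dict["model"],
        "answer2_model_id": ans2_dict["model"],
        "answer1_metadata": {
            "decoding_method": ans1_dict.get("decoding_method", ""),
        },
        "answer2_metadata": {
            "decoding_method": ans2_dict.get("decoding_method", ""),
        },
    }


# Example: the first answer has no scores, the second does.
sample = build_sample(
    0,
    "What is 2 + 2?",
    {"text": "4", "model": "model-a"},
    {"text": "Four", "model": "model-b", "scores": [9]},
)
print(sample["score"])
```

The text and model fields are still accessed directly, so an answer missing either of those would still raise a KeyError.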
Please let me know if there are any other suggestions or if I should make further modifications. Thank you!