Code that simulates the automata for creating dialogues based on Wikidata.
- Run repair-scripts/automata_indir_repair.py
- Run post-proc/filter_json_field2.py (for parallelism, final_dataset/filter_json_field2_set.py can be used instead)
Errata: automata_simple_ques.py is no longer maintained; do not use it.
mkdir test/
mkdir train/
mkdir regex_save_dir/
cp counter_pickle/* test/
python automata.py --save_dir_id SAVE_DIR_ID --update_counter True --sync_counter True --mode test --entity_thresh 10 --triple_thresh 5 --out_dir test/
Here, SAVE_DIR_ID is an integer identifying the directory that contains the dialog JSONs generated by a single job. In practice, we run multiple such commands in parallel (e.g. SAVE_DIR_ID=1-30).
python dump_regex_ques_wise.py test/ regex_save_dir/
python automata.py --save_dir_id SAVE_DIR_ID --update_counter True --sync_counter True --mode train --entity_thresh 150 --triple_thresh 50 --use_regex --regex_dir regex_save_dir/ --out_dir train/
Here, SAVE_DIR_ID is an integer identifying the directory that contains the dialog JSONs generated by a single job. In practice, we run multiple such commands in parallel (e.g. SAVE_DIR_ID=1-100).
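The two automata.py commands above differ only in mode, thresholds, and the regex flags, so running one job per SAVE_DIR_ID can be scripted. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `run_jobs` and the `max_workers` limit are assumptions introduced for illustration, while the flags are copied from the commands above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(save_dir_id, mode):
    """Return the automata.py argument list for one job.

    Hypothetical helper; the flags mirror the test and train commands above.
    """
    if mode == "test":
        extra = ["--entity_thresh", "10", "--triple_thresh", "5",
                 "--out_dir", "test/"]
    elif mode == "train":
        extra = ["--entity_thresh", "150", "--triple_thresh", "50",
                 "--use_regex", "--regex_dir", "regex_save_dir/",
                 "--out_dir", "train/"]
    else:
        raise ValueError("mode must be 'test' or 'train'")
    return ["python", "automata.py",
            "--save_dir_id", str(save_dir_id),
            "--update_counter", "True",
            "--sync_counter", "True",
            "--mode", mode] + extra


def run_jobs(ids, mode, max_workers=4):
    """Run one automata.py process per id and return the exit codes.

    At most max_workers processes run at once (an assumed limit; tune it
    to the machine).
    """
    commands = [build_command(i, mode) for i in ids]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # subprocess.call blocks until the process ends and returns its exit code.
        return list(pool.map(subprocess.call, commands))
```

Calling `run_jobs(range(1, 31), "test")` from the repository root would launch the test-mode jobs for SAVE_DIR_ID 1 to 30; `run_jobs(range(1, 101), "train")` would do the same for train mode once regex_save_dir/ has been populated.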
Split the train set generated in Step 4 into (train+val+easy_test) in the desired ratio.
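The repository does not prescribe how to perform this split, so the following is only one possible sketch. It assumes the unit being split is a list of items such as dialog file or directory paths, and the function name `split_items`, the default ratios (0.8, 0.1, 0.1), and the fixed seed are all hypothetical choices, not values from the source.

```python
import random


def split_items(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items and split them into (train, val, easy_test).

    ratios are assumed example values; replace them with the desired ratio.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # The remainder goes to easy_test so that no item is dropped.
    easy_test = shuffled[n_train + n_val:]
    return train, val, easy_test
```

Splitting at the level of whole dialog files (rather than individual turns) keeps each dialogue in a single partition.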