THU-BPM/ISESL-SQL

Original Spider Dataset used in the Paper + Setup Instructions to Run on a New Database

Hey,

I tried following the setup instructions in README.md, but I think they are no longer valid because of changes in the Spider dataset.

For example,

python3 -u preprocess/process_dataset.py --dataset_path data/train.json --raw_table_path data/tables.json --table_path data/tables.bin --output_path 'data/train.bin' --skip_large --semantic_graph

The Spider dataset has no 'data/train.json' file. It only has 'data/train_spider.json' and 'data/train_others.json'.

I tried changing the file name, but then I got the following error:

Firstly, preprocess the original databases ...
Traceback (most recent call last):
  File "/sensei-fs/users/saudi/text2sql/ISESL-SQL/preprocess/process_dataset.py", line 74, in <module>
    tables = process_tables(processor, tables_list, args.table_path, args.verbose)
  File "/sensei-fs/users/saudi/text2sql/ISESL-SQL/preprocess/process_dataset.py", line 26, in process_tables
    tables[each['db_id']] = processor.preprocess_database(each, verbose=verbose)
  File "/sensei-fs/users/saudi/text2sql/ISESL-SQL/preprocess/common_utils.py", line 100, in preprocess_database
    c = [w.lemma.lower() for s in doc.sentences for w in s.words]
  File "/sensei-fs/users/saudi/text2sql/ISESL-SQL/preprocess/common_utils.py", line 100, in <listcomp>
    c = [w.lemma.lower() for s in doc.sentences for w in s.words]
AttributeError: 'NoneType' object has no attribute 'lower'

If possible, please upload the original dataset used in the paper to Google Drive and share the link.

Also, please provide instructions for running the pipeline on a new database, such as which files need to be created in the data folder and which scripts to use.

Your help on this is much appreciated.

Best,
Saud Iqbal

exlaw commented

Have you resolved this issue? I don't think it is closely related to the dataset. Perhaps some words have no lemma in your library version. You can try using
[w.lemma.lower() if w.lemma is not None else w.text.lower() for s in doc.sentences for w in s.words]
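
The fallback can be tried in isolation before editing preprocess/common_utils.py. The following is only a minimal sketch: `Word`, `Sentence`, `Doc`, and `safe_lemmas` are hypothetical stand-ins, not the repository's or the NLP library's classes. They mirror the attribute names seen in the traceback (`doc.sentences`, `s.words`, `w.lemma`), and the `text` attribute used as the fallback is an assumption about the word object (Stanza-style words expose it).

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins that mimic the shape of the parsed document
# seen in the traceback; they are NOT the real library classes.
@dataclass
class Word:
    text: str                # surface form (assumed attribute name)
    lemma: Optional[str]     # may be None for some tokens


@dataclass
class Sentence:
    words: List[Word]


@dataclass
class Doc:
    sentences: List[Sentence]


def safe_lemmas(doc: Doc) -> List[str]:
    """Lower-cased lemma per word, falling back to the surface text
    when the lemmatizer returned None."""
    return [
        (w.lemma if w.lemma is not None else w.text).lower()
        for s in doc.sentences
        for w in s.words
    ]


# One word with a lemma, one without.
doc = Doc([Sentence([Word("Singers", "singer"), Word("IDs", None)])])
lemmas = safe_lemmas(doc)
print(lemmas)
```

If this behaves as expected, the same conditional expression can replace the list comprehension on line 100 of common_utils.py, provided the word objects there really expose a `text` attribute.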