SpartQA-baselines

All the baselines and experiment settings for SpartQA.

SpartQA Dataset

We propose a question-answering (QA) benchmark for spatial reasoning over natural language text, which contains more realistic spatial phenomena not covered by prior work and is challenging for state-of-the-art language models (LMs). We also propose a distant supervision method to improve performance on this task. Specifically, we design grammar and reasoning rules to automatically generate spatial descriptions of visual scenes and corresponding QA pairs. Experiments show that further pretraining LMs on these automatically generated data significantly improves LMs' capability on spatial understanding, which in turn helps them better solve two external datasets, bAbI and boolQ. We hope that this work fosters investigations into more sophisticated models for spatial reasoning over text.

Dataset statistics

SpartQA has two versions: Human and Auto. The Human version is a small set written by human annotators, and the Auto version is generated automatically with hand-crafted rules and context-free grammars (CFGs), as illustrated by the toy sketch below.
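As a toy illustration of CFG-based generation (the grammar, vocabulary, and templates here are invented for this example and are far simpler than the actual SpartQA-Auto grammar):

```python
import random

# A toy context-free grammar for spatial descriptions.
# Nonterminals and vocabulary are invented for illustration;
# the real SpartQA-Auto grammar is far richer.
GRAMMAR = {
    "SENT":  [["There is", "OBJ", "REL", "OBJ", "."]],
    "OBJ":   [["a", "SIZE", "COLOR", "SHAPE"]],
    "SIZE":  [["big"], ["small"], ["medium"]],
    "COLOR": [["blue"], ["black"], ["yellow"]],
    "SHAPE": [["square"], ["circle"], ["triangle"]],
    "REL":   [["to the left of"], ["above"], ["touching"]],
}

def generate(symbol="SENT"):
    """Recursively expand a nonterminal by picking a random production."""
    if symbol not in GRAMMAR:  # terminal: return the token(s) as-is
        return symbol
    production = random.choice(GRAMMAR[symbol])
    return " ".join(generate(s) for s in production)

# e.g. "There is a big blue square above a small yellow circle ."
print(generate())
```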

We generate the train, dev, and test sets on the same image sets as the NLVR dataset, based on a given target size: 24k for train and 4k for each of the other sets (per question type). On average, each story contains 9 sentences (min: 3, max: 22) and 118 tokens (min: 66, max: 274), and each question (over all question types) has 23 tokens (min: 6, max: 57).

| Sets (SpartQA-Human) | FB  | FR  | YN  | CO  | All |
|----------------------|-----|-----|-----|-----|-----|
| Test                 | 104 | 105 | 194 | 107 | 510 |
| Train                | 154 | 149 | 162 | 151 | 616 |

And for SpartQA-Auto:

| Sets (SpartQA-Auto) | FB    | FR    | YN    | CO    | All   |
|---------------------|-------|-------|-------|-------|-------|
| Seen Test           | 3872  | 3712  | 3896  | 3594  | 15074 |
| Unseen Test         | 3872  | 3721  | 3896  | 3598  | 15087 |
| Dev                 | 3842  | 3742  | 3860  | 3579  | 15023 |
| Train               | 23654 | 23302 | 23968 | 22794 | 93673 |

Baselines

All question types (qtypes) can be cast into a sequence classification task, and the three transformer-based LMs tested in the paper, BERT, ALBERT, and XLNet, can all handle this type of task by classifying the representation of [CLS], a special token prepended to each target sequence. Depending on the qtype, the input sequence and the inference procedure may differ.
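For example, casting a YN question into sequence classification with Hugging Face transformers might look like the sketch below; the checkpoint, label set, and label mapping are assumptions for illustration, not the exact configuration used in this repo:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Illustrative sketch: classify a (story, question) pair via the [CLS] token.
# "bert-base-uncased" and num_labels=2 are assumptions, not the repo's setup.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

story = "There is a big blue square above a small yellow circle."
question = "Is the blue square above the yellow circle?"

# The tokenizer prepends [CLS] and separates the two segments with [SEP].
inputs = tokenizer(story, question, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits   # shape: (1, num_labels)
pred = logits.argmax(dim=-1).item()   # e.g. 0 = "No", 1 = "Yes" (assumed mapping)
print(pred)
```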

To run a baseline, first install the required packages: torch and transformers (v4.0.1). Download the SpartQA dataset from here: https://drive.google.com/file/d/1xW8abrXcX_BOkbzjrAr6UoF5KglPHQLh/view?usp=sharing. Then create an empty dataset folder and put the dataset files in it.
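A quick sanity check of the environment (the transformers version pin is the one mentioned above):

```python
import torch
import transformers

# The repo targets transformers v4.0.1; other 4.x versions may also work.
print("torch:", torch.__version__, "| transformers:", transformers.__version__)
assert transformers.__version__.startswith("4."), "expected transformers v4.x"
```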

After all of this, add the relevant arguments to the run command. All arguments are listed below (a minimal argparse sketch follows the list):

"--result",    "Name of the result's saving file",  type= str,  default='test'
"--result_folder",		"Name of the folder of the results file", type= str, default='SpaRT/Results'
"--model",	"Name of the model's saving file", type= str, default='test'
"--model_folder",	"Name of the folder of the models file", type=str, default = "SpaRT//Models"

"--dataset",	"name of the dataset like spartqa", type = str, default = 'spartqa'

"--no_save",	"If save the model or not", action='store_true', default = False
"--load",		"For loading model", type=str
"--cuda",		"The index of cuda", type=int, default=None 
"--qtype",	"Name of Question type. (FB, FR, CO, YN)", type=str, default = 'FB'

"--train24k",	"Train on 24k data", action='store_true', default = True
"--train100k", "Train on 100k data", action='store_true', default = False
"--train500", "Train on 500 data", action='store_true', default = False
"--unseentest", "Test on unseen data", action='store_true', default = False
"--human",	"Train and Test on human data", action='store_true', default = False
"--humantest", "Test on human data", action='store_true', default = False
"--dev_exists", 	"If development set is used", action='store_true', default = False
"--no_train", 	"Number of train samples", action='store_true', default = False


"--baseline",		"Name of the baselines. Options are 'bert', 'xlnet', 'albert'", type=str, default = 'bert'

"--pretrain",		"Name of the pretrained model. Options are 'bertqa', 'bertbc' (for bert boolean clasification). It is the same for other baselines.", type=str, default = 'bertbc'

"--con",		"Testing consistency or contrast", type=str, default = 'not'

"--optim",		"Type of optimizer. options 'sgd', 'adamw'.", type=str, default = 'sgd'

"--loss",		"Type of loss function. options 'cross'.", type=str, default = 'cross'


"--train",		"Number of train samples", type = int
"--train_log", 	"save the log of train if true", default = False, action='store_true'
"--start",	"The start number of train samples", type = int, default = 0
"--dev",		"Number of dev samples", type = int
"--dev_exist",		 "If development set is used", action='store_true'
"--test",		"Number of test samples", type = int
"--unseen",		"Number of unseen test samples", type = int


"--epochs",		"Number of epochs for training", type = int, default=0
"--lr",		"learning rate", type = float, default=4e-6

"--dropout", "If you want to set dropout=0", action='store_true', default = False
"--unfreeze", "unfreeze the first layeres of the model except this numbers", type=int, default = 0

"--other_var",  dest='other_var', action='store', help="Other variable: classification (DK, noDK), random, fine-tune on unseen. for changing model load MLM from pre-trained model and replace other parts with new on", type=str
"--detail",	"a description about the model", type = str

An example of a command is:

```
python main.py --qtype YN --pretrain bertqa --baseline bert --unseentest --epochs 10
```

Also, to change where the results are saved, change the value of "result_adress"; to change where the models are saved, change all the addresses in the torch.save calls in main.py.

Download SpartQA_Auto

Download SpartQA_Human

To cite the paper, use the BibTeX below:

@inproceedings{mirzaee-etal-2021-spartqa,
    title = "{SPARTQA}: A Textual Question Answering Benchmark for Spatial Reasoning",
    author = "Mirzaee, Roshanak  and
      Rajaby Faghihi, Hossein  and
      Ning, Qiang  and
      Kordjamshidi, Parisa",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.naacl-main.364",
    doi = "10.18653/v1/2021.naacl-main.364",
    pages = "4582--4598",
}