RACo

Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022.

Code for Retrieval Augmentation for Commonsense Reasoning

Introduction to RACo

  • This repository contains the official resources for our EMNLP 2022 paper "Retrieval Augmentation for Commonsense Reasoning: A Unified Approach" [arXiv].

Step0 Download the Commonsense Corpus

  • Corpus (20M): Google drive [link]

  • Code: Official DPR code [link]

    • First, run python merge-corpus.py to construct the corpus.
    • Then update the retrieval corpus path in the DPR code above to point to the merged corpus.
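
For orientation, the public DPR passage files are tab-separated with an `id`, `text`, `title` header row. The sketch below shows one way several passage sources could be merged into that layout. It is not the repository's merge-corpus.py: the function name `merge_passages`, the one-passage-per-line input, the empty titles, and the de-duplication step are all assumptions made for illustration.

```python
import csv
import io
from typing import Iterable, TextIO


def merge_passages(sources: Iterable[Iterable[str]], out: TextIO) -> int:
    """Write passages from several sources as one DPR-style TSV; return the count.

    Hypothetical helper: each source is assumed to yield one passage per line.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "text", "title"])  # header row used by DPR passage files
    seen = set()   # exact-duplicate filter (an assumption, not from the paper)
    next_id = 1
    for source in sources:
        for line in source:
            # Collapse tabs/newlines/extra spaces so each passage stays in one TSV cell.
            text = " ".join(line.split())
            if not text or text in seen:
                continue
            seen.add(text)
            writer.writerow([next_id, text, ""])  # empty title is an assumption
            next_id += 1
    return next_id - 1


if __name__ == "__main__":
    buffer = io.StringIO()
    count = merge_passages([["a cat is an animal"], ["birds can fly"]], buffer)
    print(count)
    print(buffer.getvalue())
```

For a corpus of about 20M passages, the in-memory `seen` set may be too large; drop it or replace it with hashing if memory is a concern.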

Step1 Training: Commonsense Retriever

  • Training Data: Google drive [link]

  • Code: Official DPR code, same as above. Add the following dataset entries to the DPR configuration:

    raco_train:
        _target_: dpr.data.biencoder_data.JsonQADataset
        file: {your folder path}/train.json
    
    raco_dev:
        _target_: dpr.data.biencoder_data.JsonQADataset
        file: {your folder path}/dev.json
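
The `JsonQADataset` entries above point to JSON files in DPR's bi-encoder training layout. As a rough sketch of that layout, the snippet below builds one record and flags records that have no positive passage, which cannot be used for contrastive training. The field names follow the public DPR data format as I understand it; the helper names `make_record` and `missing_positives` are hypothetical, and the released train.json should be treated as the reference.

```python
import json
from typing import Dict, List


def make_record(question: str, answers: List[str],
                positives: List[str], hard_negatives: List[str]) -> Dict:
    """Build one training record in the DPR bi-encoder JSON layout (assumed)."""
    def ctx(text: str) -> Dict[str, str]:
        return {"title": "", "text": text}  # empty title is an assumption

    return {
        "question": question,
        "answers": answers,
        "positive_ctxs": [ctx(t) for t in positives],
        "negative_ctxs": [],
        "hard_negative_ctxs": [ctx(t) for t in hard_negatives],
    }


def missing_positives(records: List[Dict]) -> List[int]:
    """Return indices of records that have no positive passage."""
    return [i for i, r in enumerate(records) if not r.get("positive_ctxs")]


if __name__ == "__main__":
    record = make_record(
        "Where do fish live?", ["water"],
        positives=["Fish live in water."],
        hard_negatives=["Birds live in nests."],
    )
    # A training file is a JSON list of such records.
    print(json.dumps([record], indent=2))
    print(missing_positives([record]))
```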
    

Step1 Inference: Retrieve Documents

  • Inference Data: Google drive [link]

  • Code: Official DPR code, same as above. Add one entry per split to the DPR configuration, replacing {dataset} with the dataset name:

    {dataset}_train:
        _target_: dpr.data.retriever_data.CsvQASrc
        file: {your folder path}/{dataset}/train.tsv
    
    {dataset}_dev:
        _target_: dpr.data.retriever_data.CsvQASrc
        file: {your folder path}/{dataset}/dev.tsv
    
    {dataset}_test:
        _target_: dpr.data.retriever_data.CsvQASrc
        file: {your folder path}/{dataset}/test.tsv
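
The `CsvQASrc` entries above read tab-separated question files. The sketch below assumes DPR's default column layout, with the question in the first column and the answers as a Python-style list literal in the second; check the released .tsv files before relying on it. The helper names `write_qa_tsv` and `read_qa_tsv` are hypothetical.

```python
import ast
import csv
import io
from typing import Iterable, List, TextIO, Tuple


def write_qa_tsv(pairs: Iterable[Tuple[str, List[str]]], out: TextIO) -> None:
    """Write (question, answers) pairs, one per row, in the assumed CsvQASrc layout."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for question, answers in pairs:
        # Answers are stored as a list literal, e.g. ['water', 'sea'].
        writer.writerow([question, str(list(answers))])


def read_qa_tsv(inp: TextIO) -> List[Tuple[str, List[str]]]:
    """Read the file back, parsing the answer column safely with literal_eval."""
    return [(row[0], ast.literal_eval(row[1]))
            for row in csv.reader(inp, delimiter="\t")]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_qa_tsv([("where do fish live?", ["water", "sea"])], buffer)
    buffer.seek(0)
    print(read_qa_tsv(buffer))
```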
    

Step2 Training and Inference: Commonsense Reader

  • Training Data: the retrieved documents obtained from the previous step

  • Code: Official FiD code [link]
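
The retrieval output usually needs a light reshaping before it can be passed to FiD. The sketch below is one possible conversion and not a script from this repository. It assumes the retriever writes a JSON list of items with `question`, `answers`, and `ctxs` (each with `title` and `text`), and that FiD expects `id`, `question`, `target`, `answers`, and `ctxs`. The function name `to_fid_examples` and the `n_ctx` default are hypothetical.

```python
from typing import Dict, List


def to_fid_examples(dpr_results: List[Dict], n_ctx: int = 10) -> List[Dict]:
    """Convert retriever output items into FiD-style examples (assumed layouts)."""
    examples = []
    for index, item in enumerate(dpr_results):
        answers = list(item.get("answers", []))
        example = {
            "id": str(index),
            "question": item["question"],
            "answers": answers,
            # Keep only the top n_ctx passages and the fields the reader uses.
            "ctxs": [
                {"title": ctx.get("title", ""), "text": ctx["text"]}
                for ctx in item["ctxs"][:n_ctx]
            ],
        }
        if answers:
            # Using the first answer as the generation target is an assumption.
            example["target"] = answers[0]
        examples.append(example)
    return examples


if __name__ == "__main__":
    retrieved = [{
        "question": "where do fish live?",
        "answers": ["water"],
        "ctxs": [{"id": "7", "title": "", "text": "Fish live in water.",
                  "score": "81.2", "has_answer": True}],
    }]
    print(to_fid_examples(retrieved, n_ctx=1))
```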

Step2 Evaluation: FiD Outputs

  • Accuracy is computed in the same way as exact match in the FiD code.

  • BLEU and ROUGE are computed with the evaluation code from the CommonGen GitHub repo.

    • Some common issues when installing the library: [link]
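
To make the accuracy metric concrete, here is a minimal sketch of exact match with SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace), which is my understanding of what the FiD evaluation does. It is a re-implementation for illustration under that assumption, not code copied from FiD, and the function names are my own.

```python
import re
import string
from typing import List

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: List[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    normalized = normalize_answer(prediction)
    return any(normalized == normalize_answer(ref) for ref in references)


def accuracy(predictions: List[str], references: List[List[str]]) -> float:
    """Fraction of predictions that exactly match one of their references."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return hits / len(predictions)


if __name__ == "__main__":
    print(accuracy(["The Ocean.", "nest"], [["ocean"], ["tree"]]))
```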

Citation

@inproceedings{yu2022retrieval,
  title={Retrieval Augmentation for Commonsense Reasoning: A Unified Approach},
  author={Yu, Wenhao and Zhu, Chenguang and Zhang, Zhihan and Wang, Shuohang and Zhang, Zhuosheng and Fang, Yuwei and Jiang, Meng},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2022}
}

Please cite our paper if you find the paper and code helpful.