TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models

This repository contains the evaluation data, instructions, and demonstrations for the ACL 2024 paper TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models (Chu et al., 2023).

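Datasets

TimeBench groups its datasets into three categories: Symbolic Temporal Reasoning, Commonsense Temporal Reasoning, and Event Temporal Reasoning.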

Models

GPT-4 (OpenAI, 2023)
GPT-3.5 (OpenAI, 2022)
LLaMA2 (Touvron et al., 2023)
Baichuan2 (Yang et al., 2023)
Vicuna-1.5 (Chiang et al., 2023)
Mistral (Jiang et al., 2023)
ChatGLM3 (Zeng et al., 2023)
FLAN-T5 (Chung et al., 2022)

Performance

Citation

If you find our work helpful, you can cite this paper as:

@misc{chu2023timebench,
      title={TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models}, 
      author={Zheng Chu and Jingchang Chen and Qianglong Chen and Weijiang Yu and Haotian Wang and Ming Liu and Bing Qin},
      year={2023},
      eprint={2311.17667},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2311.17667}
}
