skywalker023/fantom

👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"

Python · MIT

Issues

Some factQA questions have identical correct_answer and wrong_answer
#3 opened 7 months ago by dniku
7
Dataset size does not match what is reported in the paper
#4 opened 6 months ago by dniku
1
Did you try few-shot prompting GPT-4?
#2 opened 8 months ago by lukasberglund
1
Dataset/code release?
#1 opened a year ago by Jiayi-Pan
1