A benchmark dataset for evaluating dialog system and natural language generation metrics.
License: Other (NOASSERTION)