These are the human judgments we collected when evaluating ParaBank.

Hu, J. E., R. Rudinger, M. Post, & B. Van Durme. 2019. ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation. Proceedings of AAAI 2019, Honolulu, Hawaii, January 26 – Feb 1, 2019.

A description of the fields: worker_id_anon: anonymized worker ID; ref_id: reference ID; ref_len: reference length {5, 10, 20, 40}; sys_id: ParaBank system(s) that produce(s) the paraphrase, except for system 35, which is ParaNMT, and system 0, which copies the reference; candidate: paraphrastic candidate produced by the ParaBank system(s); NLL: negative log likelihood produced of the candidate by the NMT model; human_score: human judgment score, ranging from 0 to 100; gm: grammaticality/fluency judged by annotators (0 - nonsensical, 1 - meaningful but ungrammatical, 2 - meaningful and grammatical).