Hzfinfdu/Diffusion-BERT

Lower-case in LM1B

Tomarchelone opened this issue · 1 comment

Hello!

In the paper you write

All text data are lower-cased to align with the settings of Austin et al. (2021)

But the D3PM paper never states that the LM1B data was lower-cased (and the samples from their model in the appendix contain upper-case characters). So the perplexity comparison seems unfair, because all-lowercase text is easier to model. Am I missing something?

Hi, thank you for your question! I have to admit that we made a mistake in that statement. We will remove it in later versions.

Nevertheless, we think the comparison is fair. We re-implemented D3PM in PyTorch, replaced their backbone with the bert-base-uncased architecture, and used the same tokenizer (so both methods operate on lower-cased text). The baseline results we report come from this re-implementation.
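For context, here is a minimal sketch (not code from this repo) showing why using the bert-base-uncased tokenizer puts both methods on lower-cased text: the uncased tokenizer lower-cases its input by default, regardless of the casing in the raw corpus.

```python
# Minimal illustration with the Hugging Face transformers library
# (an assumption for this sketch, not necessarily the exact setup used):
# the bert-base-uncased tokenizer lower-cases input by default, so any
# model built on it sees lower-cased text whatever the raw casing is.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sample = "The Quick Brown Fox."
print(tokenizer.tokenize(sample))
# -> ['the', 'quick', 'brown', 'fox', '.']
```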

It is also worth noting that the D3PM-absorbing results we report are only slightly worse than those in the original paper (due to our limited computational resources), which indicates that our implementation is correct. Note that we trained DiffusionBERT for even less time.

Hope this helps! Please feel free to contact me if you have any other questions. We will also include the cased results in the final version. :)