jwkirchenbauer/lm-watermarking

Confused about a place in the paper

Closed this issue · 2 comments

In section 6. Experiment, there is the following paragraph: "Watermark Strength vs Number of Tokens. Theory predicts that the type I and type II error rates of the watermark should decay to zero as the sequence length T increases." I am confused about how to theoretically prove that as T increases, both type I and type II errors approach 0. I am eagerly anticipating response.

Hi, and thanks for your interest in the work! I saw your question on twitter but we can have this discussion here to avoid the character limits etc.

To show that Type I errors, False Positives, approach 0 as T gets large, we need to argue that the likelihood tends to 1 that unwatermarked samples (i.e. human written text) will yield z-scores corresponding to the null hypothesis, and that are less than the z-score chosen as the rejection threshold. We rely on the assumption that the PRNG used to generate the green/red partition is unbiased, and with sufficient samples, the ratio of green tokens in any piece of text will approach gamma. It's worth noting that a correction for multiple tests must be applied, or the --no-repeat-ngrams option must be used to make sure that the PRNG values and therefore green/red partitions are properly independent for this asymptotic behavior to be observed empirically.

For Type 2 errors, False Negatives, to approach 0 at large T, we need to argue that the likelihood that watermarked samples yield z-scores that are above our rejection threshold, also approaches 1. We rely on the results of Thm 4.2 this time and assume that there is sufficient entropy in the text to ensure that the expected number of green tokens in a watermarked sequence is above the rejection threshold. Looking at the expectation $\mathbb{E}|s|_G$ in Thm4.2, if delta is greater than 0, then alpha is > 1, and since gamma is between 0 and 1, then the whole coefficient that T and S* are multiplied by is greater than 0. So there exists some T large enough to make the expected number of green tokens is sufficiently large to reject the null hypothesis. (we can make a similar argument about the expected ratio $\frac{|s|_G}{T}$ but it's not the quantity that the expectation is on in the paper)

However, we also would want to confirm that the variance approaches 0, so that we are confident in the outcome of the expectation, and for this it is better to consider the ratio of green tokens. Multiplying the expression we provided for $Var |s|_G$ by $\frac{1}{T^2}$ on both sides, we see that for large T, the bound on $Var \frac{|s|_G}{T}$ tightens towards 0.

So, overall, this is of course only a probabilistic argument about the asymptotics, and not a guarantee of a zero error rates. If more formalism is needed, maybe we can circle up. I hope this helps!

Hi! You're amazing! Thank you so much for your thoughtful response. I apologize for the delay in replying lately due to other commitments, but this won't happen again in the future. I find your reasoning and mathematical assumptions to be very rigorous. I have some additional questions that I'd like to discuss with you:

  1. For the Type I error, the mathematical assumptions you made earlier are essential. However, I've been thinking that for a text without a watermark, it can be perfectly modeled by the binomial distribution, which in turn approximates the normal distribution. From this perspective, the Type I error seems to be related only to the threshold z-score set and has nothing to do with the text length T. But if you look at it from another angle, if the average z-score of the watermarked text increases with T, then we can set the threshold very high, making the Type I error approach 0.

  2. For the Type II error, your discussion has given me some insights.

  3. Additionally, I'd like to inquire about considerations in low entropy situations, as it's evident that it greatly aids in improving the detection rate and the quality of the text. From your experiments, it can be seen that the Type II error is more likely to occur in low entropy situations. However, I still haven't thoroughly grasped the mathematical justification. While Thm4.2 is very useful in proving the lower bound, it can't prove the upper bound. Do you have any mathematical or theoretical insights or intuitions about this?

  4. Have you ever tried to mathematically derive the distribution type of $|s|_G$ after adding δ (like Gaussian distribution), or experimentally considered it to conform to a certain distribution with a certain confidence level?

Your work is truly impressive!I look forward to our discussion.