Albert-Ma opened this issue 5 years ago · 0 comments
Currently, the ESIM and BiMPM models use 1 as the mask value, but BERT uses 0. Unify the masking mechanism so that every model uses 0 as the mask value, which is also consistent with TensorFlow.
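
A minimal sketch of what the unification could look like, assuming "mask value" means the value that marks padded positions (so the old convention is 1 = padding and the target convention is 0 = padding, 1 = real token). The helper names `to_zero_masked` and `mask_from_lengths` are hypothetical and not part of this repo:

```python
def to_zero_masked(mask_one_is_pad):
    """Flip a 0/1 mask so that 0 marks padding instead of 1.

    Assumes the input uses 1 for padded positions and 0 for real tokens.
    """
    return [[1 - v for v in row] for row in mask_one_is_pad]


def mask_from_lengths(lengths, max_len):
    """Build a mask directly in the target convention (0 = padding)."""
    return [[1 if i < n else 0 for i in range(max_len)] for n in lengths]


# Two sequences padded to length 5; the first has 3 real tokens.
old_mask = [
    [0, 0, 0, 1, 1],  # assumed old convention: 1 marks padding
    [0, 0, 0, 0, 0],
]
new_mask = to_zero_masked(old_mask)
```

With a single convention, the same mask could be passed to every model without a per-model flip, and building it from sequence lengths gives the same result as converting an old-style mask.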