lixinsu/Lion

Feature/unify_mask

Albert-Ma opened this issue

Currently, the ESIM and BiMPM models use 1 as the mask value for padding positions, while BERT uses 0.
Unify the masking mechanism so that all models use 0 as the mask value (1 for real tokens, 0 for padding). This is also consistent with TensorFlow.
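A minimal sketch of the unified convention, in plain Python so it runs without a framework. It assumes the old ESIM/BiMPM masks mark padding with 1 and real tokens with 0 (the inverse of BERT). The helper names `make_mask`, `from_legacy_mask` and `masked_softmax` are hypothetical and are not part of Lion's code:

```python
import math
from typing import List

# Hypothetical type alias: one row per sequence in the batch.
Mask = List[List[int]]


def make_mask(lengths: List[int], max_len: int) -> Mask:
    """Build a mask in the unified (BERT/TF-style) convention:
    1 = real token, 0 = padding."""
    return [[1 if i < n else 0 for i in range(max_len)] for n in lengths]


def from_legacy_mask(legacy: Mask) -> Mask:
    """Convert a mask that marks padding with 1 (assumed old ESIM/BiMPM
    style) into the unified convention where padding is 0."""
    return [[1 - v for v in row] for row in legacy]


def masked_softmax(scores: List[float], mask: List[int]) -> List[float]:
    """Softmax over `scores`, giving zero probability to positions
    where mask == 0 (padding)."""
    kept = [s for s, m in zip(scores, mask) if m == 1]
    if not kept:
        # Every position is padding: return all zeros instead of dividing by 0.
        return [0.0] * len(scores)
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m == 1 else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    mask = make_mask([3, 1], max_len=4)
    print(mask)  # [[1, 1, 1, 0], [1, 0, 0, 0]]
    legacy = [[0, 0, 0, 1], [0, 1, 1, 1]]
    print(from_legacy_mask(legacy) == mask)  # True
    # The large score at the padded position is ignored.
    print(masked_softmax([1.0, 2.0, 3.0, 100.0], mask[0]))
```

With a single convention, code that consumes the mask (attention, pooling, loss) can treat `mask == 0` as "ignore" for every model. Only the legacy ESIM/BiMPM inputs need a one-time flip.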