Issues
Question about continuing pretraining
#22 opened by Jhangsy - 1
Hello author, the paper proposes a new attention mechanism. Could you share the code? The description is too abstract to understand.
#21 opened by zhaolulul - 1
max_Seq_length during pretraining
#20 opened by gsxf997 - 0
Can MLM run inference normally?
#18 opened by YuxianMeng - 0
Did the roberta model released by CLUE use wwm during pretraining?
#17 opened by waywaywayw - 0
Training hyperparameters for roberta_tiny_clue on IFLYTEK
#16 opened by selephantjy - 1
On how the pretraining data for RoBERTa pair is constructed
#14 opened by songt96 - 0
XLNet
#15 opened by onlyonewater - 2
Question about parameters in bert_config.json
#12 opened by Jhangsy - 2
Could you consider hosting the pretrained models on iFLYTEK Cloud?
#10 opened by Fan9 - 2
Hi, are there plans to release base-size BERT/RoBERTa models?
#9 opened by justzhanghong - 8
Issue with the ckpt file of RoBERTa-tiny-pair for sentence-pair tasks
#4 opened by drzqb - 2
Similarities and differences between RoBERTa-tiny-clue and RoBERTa-tiny-pair
#2 opened by chros425 - 1
WarmupLinearSchedule no longer exists in newer versions of transformers
#3 opened by DrDavidS - 3
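For context on the issue above: older `pytorch-transformers` releases shipped a `WarmupLinearSchedule` class, which newer `transformers` versions replaced with `get_linear_schedule_with_warmup`. The schedule itself is just a linear ramp-up followed by linear decay, which can be sketched as a plain multiplier function (the function name and step counts here are illustrative, not from this repo):

```python
def linear_warmup_decay(step, num_warmup_steps, num_training_steps):
    """Learning-rate multiplier: ramps 0 -> 1 over the warmup steps,
    then decays linearly back to 0 by the end of training."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

# e.g. with 10 warmup steps out of 100 total:
# step 0 -> 0.0, step 10 -> 1.0, step 55 -> 0.5, step 100 -> 0.0
```

In current `transformers`, the equivalent behavior comes from passing this kind of function's parameters to `get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps)`.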
What learning rate is recommended when using a large model (e.g. roberta_large)?
#1 opened by zzy99