z814081807/DeepNER

Question about the vocabulary file and the load_model_and_parallel function

yanqiangmiffy opened this issue · 3 comments

  1. (Note: the two [unused] entries in vocab.txt must be manually changed to [INV] and [BLANK].) Is this replacement actually required? Will skipping it cause an error?
  2. Training runs without problems, but during crf_evaluation:
    load_model_and_parallel raises an error. I tried several times and the error is not always the same; there are three variants:
    First:
/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1676, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
01/09/2021 11:03:23 - INFO - wandb.internal.internal -   Internal process exited

Second:

anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1678, in linear
    output += bias
RuntimeError: CUDA error: device-side assert triggered
01/09/2021 10:21:48 - INFO - wandb.internal.internal -   Internal process exited

Third:

context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
RuntimeError: CUDA error: device-side assert triggered

I googled this bug. The most common answer is that the label indices are wrong. Since my dataset is not the Tianchi one, could the error be caused by the annotations containing no S tag (BMES scheme)? The other common answer is GPU OOM; could that happen even on a single card? Here is the log:

01/09/2021 11:03:13 - INFO - src.utils.trainer -   Saving model & optimizer & scheduler checkpoint to ./out/roberta_wwm_wd_crf/checkpoint-1005
01/09/2021 11:03:16 - INFO - src.utils.functions_utils -   Load model from ./out/roberta_wwm_wd_crf/checkpoint-603/model.pt
01/09/2021 11:03:17 - INFO - src.utils.functions_utils -   Load model from ./out/roberta_wwm_wd_crf/checkpoint-804/model.pt
01/09/2021 11:03:17 - INFO - src.utils.functions_utils -   Load model from ./out/roberta_wwm_wd_crf/checkpoint-1005/model.pt
01/09/2021 11:03:18 - INFO - src.utils.functions_utils -   Save swa model in: ./out/roberta_wwm_wd_crf/checkpoint-100000
01/09/2021 11:03:21 - INFO - src.utils.trainer -   Train done
../../bert/torch_roberta_wwm/vocab.txt
01/09/2021 11:03:21 - INFO - src.preprocess.processor -   Convert 738 examples to features
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
01/09/2021 11:03:22 - INFO - src.preprocess.processor -   Build 738 features
['0']
cuda:0
01/09/2021 11:03:22 - INFO - src.utils.functions_utils -   Load ckpt from ./out/roberta_wwm_wd_crf/checkpoint-201/model.pt
01/09/2021 11:03:23 - INFO - src.utils.functions_utils -   Use single gpu in: ['0']
Traceback (most recent call last):
  File "main.py", line 215, in <module>
    training(args)
  File "main.py", line 136, in training
    train_base(opt, train_examples, dev_examples)
  File "main.py", line 78, in train_base
    tmp_metric_str, tmp_f1 = crf_evaluation(model, dev_info, device, ent2id)
  File "/home/quincyqiang/Projects/Water-Conservancy-KG/DeepNER/src/utils/evaluator.py", line 150, in crf_evaluation
    for tmp_pred in get_base_out(model, dev_loader, device):
  File "/home/quincyqiang/Projects/Water-Conservancy-KG/DeepNER/src/utils/evaluator.py", line 22, in get_base_out

From the log above, checkpoint-603, checkpoint-804, etc. were loaded first, and then evaluation of checkpoint-201 was started on top of that. Could the earlier loads leave the later one short of GPU memory?

huggingface/transformers#1805 (comment)

It's probably because your token embeddings size (vocab size) doesn't match with pre-trained model. Do model.resize_token_embeddings(len(tokenizer)) before training. Please check #1848 and #1849

Could a vocabulary mismatch be the cause?
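If a vocabulary mismatch really were the cause, the usual fix from the linked transformers thread is to call model.resize_token_embeddings(len(tokenizer)) before training. A minimal toy sketch of why the mismatch crashes and roughly what the resize does (the sizes here are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained model's token-embedding matrix.
old_vocab_size, hidden = 10, 4
emb = nn.Embedding(old_vocab_size, hidden)

# A token id >= the embedding size raises IndexError on CPU; on GPU the same
# mismatch surfaces as "CUDA error: device-side assert triggered".
bad_id = torch.tensor([old_vocab_size])
try:
    emb(bad_id)
    lookup_failed = False
except IndexError:
    lookup_failed = True

# Roughly what model.resize_token_embeddings(len(tokenizer)) does: allocate a
# larger matrix and copy over the old rows, so every id the tokenizer can
# produce maps to a valid row.
new_vocab_size = 12
resized = nn.Embedding(new_vocab_size, hidden)
with torch.no_grad():
    resized.weight[:old_vocab_size] = emb.weight
out = resized(bad_id)  # now in range
```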


  1. It runs fine without the replacement. The preprocessing step replaces spaces with [BLANK]; if vocab.txt is not edited, [BLANK] is treated as [UNK] and performance drops slightly.
  2. It is very likely out of GPU memory. Try a single card with a small batch_size to make sure memory is sufficient and rule that possibility out.
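One way to act on point 2 is to make sure only one checkpoint lives on the GPU at a time. A hedged sketch, where build_model and evaluate are hypothetical stand-ins for the repo's load_model_and_parallel and crf_evaluation:

```python
import gc

try:
    import torch
except ImportError:  # let the sketch run even without the GPU stack installed
    torch = None

def evaluate_checkpoints(ckpt_paths, build_model, evaluate):
    """Evaluate checkpoints one at a time, releasing each model before the
    next is loaded so weights never accumulate in GPU memory."""
    results = {}
    for path in ckpt_paths:
        model = build_model(path)
        results[path] = evaluate(model)
        del model                      # drop the only reference to the weights
        gc.collect()
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()   # return cached blocks to the driver
    return results
```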

Thanks for the reply. It was indeed caused by GPU OOM; the test machine has 12 GB.
1. max_seq_length does not take effect for some sentences after cut_text splitting. The longest sentence in my test data is around 150 characters, so when I initially set max_seq_length=120, it triggered the following error:

/opt/conda/conda-bld/pytorch_1595629411241/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [2,0,0], thread: [83,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629411241/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [2,0,0], thread: [84,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
... (the same assertion repeats for threads [85,0,0] through [95,0,0])
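The srcIndex < srcSelectDimSize assertion fires when an index tensor points past the end of an embedding table, e.g. when a sentence slips past cut_text uncut. A cheap pre-flight check can be sketched as follows (find_overlong is a hypothetical helper, counting characters as with Chinese character-level BERT):

```python
def find_overlong(sentences, max_seq_length, reserved=2):
    """Return (index, length) for every sentence whose characters will not
    fit once the `reserved` slots for [CLS] and [SEP] are added."""
    limit = max_seq_length - reserved
    return [(i, len(s)) for i, s in enumerate(sentences) if len(s) > limit]

# Example: a 150-character sentence against max_seq_length=120.
overlong = find_overlong(["正常短句。", "水" * 150], max_seq_length=120)
```

Running such a check on the output of cut_text before feature conversion would surface the offending sentences long before the opaque GPU-side assertion does.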

2. The RuntimeError: CUDA error: device-side assert triggered mentioned above was caused by GPU OOM.
Training itself is fine, but the error appears during dev-set evaluation because multiple checkpoints have to be loaded: the first may load successfully, but holding the weights of several models at once exhausts GPU memory.
So training and evaluation can be run separately, toggling the train call between the two runs:

#train(opt, model, train_dataset)   # evaluation-only run

train(opt, model, train_dataset)    # training run

Standalone dev-set evaluation results:

01/09/2021 14:26:07 - INFO - src.preprocess.processor -   Build 4809 features
../../bert/torch_roberta_wwm/config.json
../../bert/torch_roberta_wwm
../../bert/torch_roberta_wwm/vocab.txt
01/09/2021 14:26:10 - INFO - src.preprocess.processor -   Convert 738 examples to features
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
01/09/2021 14:26:11 - INFO - src.preprocess.processor -   Build 738 features
['0']
cuda:0
01/09/2021 14:26:11 - INFO - src.utils.functions_utils -   Load ckpt from ./out/roberta_wwm_wd_crf/checkpoint-602/model.pt
01/09/2021 14:26:14 - INFO - src.utils.functions_utils -   Use single gpu in: ['0']
01/09/2021 14:26:45 - INFO - __main__ -   In step 602:
 [MIRCO] precision: 0.8084, recall: 0.8157, f1: 0.8101
['0']
cuda:0
01/09/2021 14:26:45 - INFO - src.utils.functions_utils -   Load ckpt from ./out/roberta_wwm_wd_crf/checkpoint-1204/model.pt
01/09/2021 14:26:45 - INFO - src.utils.functions_utils -   Use single gpu in: ['0']
01/09/2021 14:27:14 - INFO - __main__ -   In step 1204:
 [MIRCO] precision: 0.8048, recall: 0.8324, f1: 0.8170
01/09/2021 14:27:14 - INFO - __main__ -   Max f1 is: 0.8170151702997139, in step 1204
01/09/2021 14:27:14 - INFO - __main__ -   ./out/roberta_wwm_wd_crf/checkpoint-602 deleted
01/09/2021 14:27:14 - INFO - root -   ----------Container run time this session: 0:01:18-----------

3. Also, in evaluator.py, role_metric = np.zeros([13, 3]) hard-codes 13 as the number of entity types. This could be made configurable, e.g. by passing num_labels or len(ENTITY_TYPES) as a parameter.
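A sketch of that suggestion for evaluator.py, with a made-up ENTITY_TYPES list standing in for the project's real one:

```python
import numpy as np

# Hypothetical entity-type list; in the project this would come from the
# task configuration rather than being hard-coded.
ENTITY_TYPES = ["PER", "LOC", "ORG"]

# Instead of np.zeros([13, 3]), size the metric matrix from the configured
# types: one row per entity type, three columns for the per-type counts.
role_metric = np.zeros([len(ENTITY_TYPES), 3])
```

This way the same evaluation code works unchanged on datasets with a different entity inventory, such as the non-Tianchi data discussed above.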