In step 5 (Train the whole model with RL), I get `RuntimeError: result type Byte can't be cast to the desired output type Bool`. What could be causing this? Could you please take a look?
Closed this issue · 1 comment
Could you please take a look at this issue? It might be a basic question. Thanks!
[2022-01-06 09:48:03,967 INFO] Namespace(accum_count=2, agent=True, alpha=0.6, batch_ex_size=4, batch_size=2000, beam_size=3, bert_dir='bert/chinese_bert', beta1=0.9, beta2=0.999, block_trigram=True, copy_attn=False, copy_attn_force=False, copy_loss_by_seqlength=False, coverage=False, cust=True, data_path='bert_data/ali', dec_dropout=0.2, dec_ff_size=2048, dec_heads=8, dec_hidden_size=768, dec_layers=3, decoder='transformer', enc_dropout=0.2, enc_ff_size=2048, enc_heads=8, enc_hidden_size=768, enc_layers=3, encoder='bert', ex_max_token_num=500, finetune_bert=True, freeze_step=500, generator_shard_size=32, gpu_ranks=[0], hier_dropout=0.2, hier_ff_size=2048, hier_heads=8, hier_hidden_size=768, hier_layers=2, idf_info_path='bert_data/idf_info.pt', label_smoothing=0.1, log_file='logs/rl.topic.train.log', loss_lambda=0.001, lr=1e-05, lr_bert=0.001, lr_other=0.01, lr_topic=0.0001, mask_token_prob=0.15, max_grad_norm=0, max_length=100, max_pos=512, max_tgt_len=100, max_word_count=6000, min_length=10, min_word_count=5, mode='train', model_path='models/rl_topic', noise_rate=0.5, optim='adam', pn_dropout=0.2, pn_ff_size=2048, pn_heads=8, pn_hidden_size=768, pn_layers=2, pretrain=False, pretrain_steps=80000, report_every=5, result_path='results/ali', save_checkpoint_steps=500, seed=666, select_sent_prob=0.9, sent_dropout=0.2, sent_ff_size=2048, sent_heads=8, sent_hidden_size=768, sent_layers=3, sep_optim=False, share_emb=True, split_noise=True, src_data_mode='utt', test_all=False, test_batch_ex_size=50, test_batch_size=20000, test_from='', test_mode='abs', test_start_from=-1, tokenize=True, topic_model=True, topic_num=50, train_from='models/pipeline_topic/model_step_80000.pt', train_from_ignore_optim=True, train_steps=30000, use_idf=False, visible_gpus='0', warmup=True, warmup_steps=5000, warmup_steps_bert=5000, warmup_steps_other=5000, word_emb_mode='word2vec', word_emb_path='pretrain_emb/word2vec', word_emb_size=100, world_size=1)
[2022-01-06 09:48:03,967 INFO] Device ID 0
[2022-01-06 09:48:03,967 INFO] Device cuda
[2022-01-06 09:48:04,022 INFO] Loading checkpoint from models/pipeline_topic/model_step_80000.pt
[2022-01-06 09:48:05,538 INFO] Model name 'bert/chinese_bert' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc). Assuming 'bert/chinese_bert' is a path or url to a directory containing tokenizer files.
[2022-01-06 09:48:05,538 INFO] Didn't find file bert/chinese_bert/added_tokens.json. We won't load it.
[2022-01-06 09:48:05,538 INFO] Didn't find file bert/chinese_bert/special_tokens_map.json. We won't load it.
[2022-01-06 09:48:05,538 INFO] loading file bert/chinese_bert/vocab.txt
[2022-01-06 09:48:05,538 INFO] loading file None
[2022-01-06 09:48:05,538 INFO] loading file None
[2022-01-06 09:48:05,685 INFO] loading configuration file bert/chinese_bert/config.json
[2022-01-06 09:48:05,685 INFO] Model config {
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"finetuning_task": null,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"num_labels": 2,
"output_attentions": false,
"output_hidden_states": false,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"torchscript": false,
"type_vocab_size": 2,
"vocab_size": 21128
}
[2022-01-06 09:48:05,686 INFO] loading weights file bert/chinese_bert/pytorch_model.bin
[2022-01-06 09:48:09,071 INFO] loading Word2VecKeyedVectors object from pretrain_emb/word2vec
[2022-01-06 09:48:09,103 INFO] setting ignored attribute vectors_norm to None
[2022-01-06 09:48:09,103 INFO] loaded pretrain_emb/word2vec
[2022-01-06 09:48:11,252 INFO] Model(
(embeddings): Embedding(21128, 768, padding_idx=0)
(encoder): Bert(
(model): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(21128, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(1): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(2): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(3): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(4): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(5): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(6): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(7): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(8): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(9): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(10): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(11): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): BertLayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): BertPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)
)
(sent_encoder): TransformerEncoder(
(pos_emb): PositionalEncoding(
(dropout): Dropout(p=0.2, inplace=False)
)
(transformer): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=768, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=768, bias=True)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.2, inplace=False)
(dropout_2): Dropout(p=0.2, inplace=False)
)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.2, inplace=False)
)
(1): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=768, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=768, bias=True)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.2, inplace=False)
(dropout_2): Dropout(p=0.2, inplace=False)
)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.2, inplace=False)
)
(2): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=768, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=768, bias=True)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.2, inplace=False)
(dropout_2): Dropout(p=0.2, inplace=False)
)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.2, inplace=False)
)
)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
)
(hier_encoder): TransformerEncoder(
(pos_emb): PositionalEncoding(
(dropout): Dropout(p=0.2, inplace=False)
)
(transformer): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=768, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=768, bias=True)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.2, inplace=False)
(dropout_2): Dropout(p=0.2, inplace=False)
)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.2, inplace=False)
)
(1): TransformerEncoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=768, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=768, bias=True)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.2, inplace=False)
(dropout_2): Dropout(p=0.2, inplace=False)
)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout): Dropout(p=0.2, inplace=False)
)
)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
)
(pn_decoder): TransformerDecoder(
(transformer_layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
(linear_topic_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_topic_vecs): Linear(in_features=300, out_features=768, bias=True)
(linear_topic_w): Linear(in_features=2304, out_features=8, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=768, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=768, bias=True)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.2, inplace=False)
(dropout_2): Dropout(p=0.2, inplace=False)
)
(layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.2, inplace=False)
)
(1): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
(linear_topic_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_topic_vecs): Linear(in_features=300, out_features=768, bias=True)
(linear_topic_w): Linear(in_features=2304, out_features=8, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=768, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=768, bias=True)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.2, inplace=False)
(dropout_2): Dropout(p=0.2, inplace=False)
)
(layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.2, inplace=False)
)
)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
)
(pn_generator): PointerNetGenerator(
(linear_dec): Linear(in_features=768, out_features=768, bias=True)
(linear_mem): Linear(in_features=768, out_features=768, bias=True)
(score_linear): Linear(in_features=768, out_features=1, bias=True)
(tanh): Tanh()
(softmax): LogSoftmax(dim=-1)
)
(decoder): TransformerDecoder(
(embeddings): Embedding(21128, 768, padding_idx=0)
(pos_emb): PositionalEncoding(
(dropout): Dropout(p=0.2, inplace=False)
)
(transformer_layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
(linear_topic_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_topic_vecs): Linear(in_features=300, out_features=768, bias=True)
(linear_topic_w): Linear(in_features=2304, out_features=8, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=768, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=768, bias=True)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.2, inplace=False)
(dropout_2): Dropout(p=0.2, inplace=False)
)
(layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.2, inplace=False)
)
(1): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
(linear_topic_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_topic_vecs): Linear(in_features=300, out_features=768, bias=True)
(linear_topic_w): Linear(in_features=2304, out_features=8, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=768, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=768, bias=True)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.2, inplace=False)
(dropout_2): Dropout(p=0.2, inplace=False)
)
(layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.2, inplace=False)
)
(2): TransformerDecoderLayer(
(self_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
)
(context_attn): MultiHeadedAttention(
(linear_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_values): Linear(in_features=768, out_features=768, bias=True)
(linear_query): Linear(in_features=768, out_features=768, bias=True)
(softmax): Softmax(dim=-1)
(dropout): Dropout(p=0.2, inplace=False)
(final_linear): Linear(in_features=768, out_features=768, bias=True)
(linear_topic_keys): Linear(in_features=768, out_features=768, bias=True)
(linear_topic_vecs): Linear(in_features=300, out_features=768, bias=True)
(linear_topic_w): Linear(in_features=2304, out_features=8, bias=True)
)
(feed_forward): PositionwiseFeedForward(
(w_1): Linear(in_features=768, out_features=2048, bias=True)
(w_2): Linear(in_features=2048, out_features=768, bias=True)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(dropout_1): Dropout(p=0.2, inplace=False)
(dropout_2): Dropout(p=0.2, inplace=False)
)
(layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
(drop): Dropout(p=0.2, inplace=False)
)
)
(layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
)
(generator): Generator(
(linear): Linear(in_features=768, out_features=21128, bias=True)
(softmax): LogSoftmax(dim=-1)
)
(topic_model): MultiTopicModel(
(tm1): TopicModel(
(mlp): Sequential(
(0): Linear(in_features=7668, out_features=200, bias=True)
(1): Tanh()
)
(mu_linear): Linear(in_features=200, out_features=100, bias=True)
(sigma_linear): Linear(in_features=200, out_features=100, bias=True)
(theta_linear): Linear(in_features=100, out_features=50, bias=True)
)
(tm2): TopicModel(
(mlp): Sequential(
(0): Linear(in_features=7668, out_features=200, bias=True)
(1): Tanh()
)
(mu_linear): Linear(in_features=200, out_features=100, bias=True)
(sigma_linear): Linear(in_features=200, out_features=100, bias=True)
(theta_linear): Linear(in_features=100, out_features=50, bias=True)
)
(tm3): TopicModel(
(mlp): Sequential(
(0): Linear(in_features=7668, out_features=200, bias=True)
(1): Tanh()
)
(mu_linear): Linear(in_features=200, out_features=100, bias=True)
(sigma_linear): Linear(in_features=200, out_features=100, bias=True)
(theta_linear): Linear(in_features=100, out_features=50, bias=True)
)
)
(topic_gate_linear_summ): Linear(in_features=1068, out_features=300, bias=True)
(topic_emb_linear_summ): Linear(in_features=768, out_features=300, bias=True)
(topic_gate_linear_noise): Linear(in_features=1068, out_features=300, bias=True)
(topic_emb_linear_noise): Linear(in_features=768, out_features=300, bias=True)
)
gpu_rank 0
[2022-01-06 09:48:11,262 INFO] * number of parameters: 197486823
[2022-01-06 09:48:11,262 INFO] Start training...
[2022-01-06 09:48:15,105 INFO] Loading train dataset from bert_data/ali.train.8.bert.pt, number of examples: 1181
dup_mask <class 'torch.Tensor'>
F <class 'torch.Tensor'>
Traceback (most recent call last):
File "./src/train.py", line 163, in &lt;module&gt;
train(args, device_id)
File "/home/hp/shuang/topic-dialog-summ-main/src/train_abstractive.py", line 359, in train
train_single(args, device_id)
File "/home/hp/shuang/topic-dialog-summ-main/src/train_abstractive.py", line 425, in train_single
trainer.train(train_iter_fct, args.train_steps)
File "/home/hp/shuang/topic-dialog-summ-main/src/models/rl_model_trainer.py", line 172, in train
report_stats, step)
File "/home/hp/shuang/topic-dialog-summ-main/src/models/rl_model_trainer.py", line 203, in _gradient_calculation
rl_loss, decode_output, topic_loss, _, _ = self.model(batch)
File "/home/hp/anaconda3/envs/shuang/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hp/shuang/topic-dialog-summ-main/src/models/rl_model.py", line 878, in forward
method="sample" if self.training else "max")
File "/home/hp/shuang/topic-dialog-summ-main/src/models/rl_model.py", line 379, in _pointer_net_decoding
dup_mask += F.one_hot(ids, dist_size).byte()
RuntimeError: result type Byte can't be cast to the desired output type Bool
This is most likely a PyTorch version issue. You can fix it yourself by modifying the code so that the dtype of `dup_mask` matches the dtype returned by `F.one_hot()`.
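A minimal sketch of the fix, assuming `dup_mask` is a Bool tensor in your PyTorch version: newer PyTorch refuses to cast the Byte result of `.byte()` into a Bool tensor in-place, so cast the one-hot result to `bool` and combine with `|=` instead of `+=`. (The shapes and values below are illustrative, not taken from the actual model.)

```python
import torch
import torch.nn.functional as F

# Illustrative stand-ins for the variables in rl_model.py line 379.
ids = torch.tensor([1, 2])
dist_size = 4
dup_mask = torch.zeros(2, dist_size, dtype=torch.bool)

# Old line (raises RuntimeError on newer PyTorch when dup_mask is Bool):
#   dup_mask += F.one_hot(ids, dist_size).byte()
# Fixed: cast the one-hot mask to bool and use bitwise-or in-place.
dup_mask |= F.one_hot(ids, dist_size).bool()

print(dup_mask)
```

Alternatively, if `dup_mask` is created as a Byte tensor in your version, keeping `.byte()` on both sides also works; the point is simply that the two dtypes must agree.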