ariecattan/cross_encoder

Can this model be trained?

cccccs opened this issue · 1 comment

Excuse me, I can run this model without errors, but the loss doesn't decrease, and the number of predicted positive labels on the dev set is zero. Could you give me some suggestions? Thanks.

```
2021-12-03 09:40:57,020 - INFO - gpu_num = [
0
]
roberta_model = "roberta-large"
bert_hidden_size = 1024
hidden_layer = 1024
dropout = 0.3
with_mention_width = true
with_head_attention = true
embedding_dimension = 20
max_mention_span = 10
use_gold_mentions = true
mention_type = "events"
top_k = 0.25
training_method = "continue"
subtopic = true
use_predicted_topics = false
segment = true
random_seed = 0
epochs = 20
batch_size = 32
learning_rate = 0.0001
weight_decay = 0
loss = "bce"
optimizer = "adam"
adam_epsilon = 1e-08
segment_window = 512
neg_samp = true
exact = false
log_path = "logs/pairwise_scorer/"
data_folder = "data/ecb/mentions"
span_repr_path = "models/span_scorers/events_span_repr_0"
span_scorer_path = "models/span_scorers/events_span_scorer_0"
model_path = "models/pairwise_scorers"
2021-12-03 09:41:03,131 - INFO - Init models
INFO:transformers.tokenization_utils_base:loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-vocab.json from cache at /home/changsz/.cache/torch/transformers/1ae1f5b6e2b22b25ccc04c000bb79ca847aa226d0761536b011cf7e5868f0655.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
INFO:transformers.tokenization_utils_base:loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-merges.txt from cache at /home/changsz/.cache/torch/transformers/f8f83199a6270d582d6245dc100e99c4155de81c9745c6248077018fe01abcfb.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
INFO:transformers.tokenization_utils:Adding [START] to the vocabulary
INFO:transformers.tokenization_utils:Adding [END] to the vocabulary
INFO:transformers.configuration_utils:loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-large-config.json from cache at /home/changsz/.cache/torch/transformers/c22e0b5bbb7c0cb93a87a2ae01263ae715b4c18d692b1740ce72cacaa99ad184.2d28da311092e99a05f9ee17520204614d60b0bfdb32f8a75644df7737b6a748
INFO:transformers.configuration_utils:Model config RobertaConfig {
"architectures": [
"RobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "roberta",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"pad_token_id": 1,
"type_vocab_size": 1,
"vocab_size": 50265
}

INFO:transformers.modeling_utils:loading weights file https://cdn.huggingface.co/roberta-large-pytorch_model.bin from cache at /home/changsz/.cache/torch/transformers/2339ac1858323405dffff5156947669fed6f63a0c34cfab35bda4f78791893d2.fc7abf72755ecc4a75d0d336a93c1c63358d2334f5998ed326f3b0da380bf536
INFO:transformers.modeling_utils:All model checkpoint weights were used when initializing RobertaModel.

INFO:transformers.modeling_utils:All the weights of RobertaModel were initialized from the model checkpoint at roberta-large.
If your task is similar to the task the model of the checkpoint was trained on, you can already use RobertaModel for predictions without further training.
2021-12-03 09:41:17,047 - INFO - Number of parameters of mention extractor: 4218982
2021-12-03 09:41:17,048 - INFO - Number of parameters of the pairwise classifier: 355493121
2021-12-03 09:41:17,048 - INFO - Number of topics: 25
2021-12-03 09:41:17,048 - INFO - Epoch: 0
100%|██████████| 10679/10679 [1:15:20<00:00, 2.36it/s]
2021-12-03 10:56:37,131 - INFO - Number of positive/total pairs: 15211/341706
2021-12-03 10:56:37,132 - INFO - Accumulate loss: 3346.0918550994247
2021-12-03 10:56:37,132 - INFO - Evaluate on the dev set
100%|██████████| 3150/3150 [04:41<00:00, 11.18it/s]
2021-12-03 11:01:19,154 - INFO - Number of predictions: 0/100784
2021-12-03 11:01:19,156 - INFO - Number of positive pairs: 5881/100784
2021-12-03 11:01:19,156 - INFO - Min score: -7.677737712860107
2021-12-03 11:01:19,156 - INFO - Max score: -1.2886110544204712
2021-12-03 11:01:19,156 - INFO - Strict - Recall: 0.0, Precision: 0, F1: 0, Accuracy: 0.9416474837275758
INFO:transformers.configuration_utils:Configuration saved in models/pairwise_scorers/large_32/checkpoint_0/bert/config.json
INFO:transformers.modeling_utils:Model weights saved in models/pairwise_scorers/large_32/checkpoint_0/bert/pytorch_model.bin
2021-12-03 11:01:20,781 - INFO - Epoch: 1
15%|█▍ | 1553/10679
```
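A side note on the log above: with a BCE objective, a pair is typically counted as a positive prediction only when sigmoid(score) exceeds 0.5, i.e. when the raw logit is positive. The reported dev scores range from about -7.68 to -1.29, so every probability falls below 0.5, which is consistent with the log showing 0/100784 predictions while accuracy stays high (most of the 100784 pairs are negatives). The snippet below is a minimal sketch of that thresholding, not the repository's exact evaluation code:

```python
import torch

# Example logits in the range reported in the log (min ~ -7.68, max ~ -1.29).
scores = torch.tensor([-7.68, -3.5, -1.29])

# Assuming positives are predicted when sigmoid(score) > 0.5 (logit > 0),
# all probabilities here fall below 0.5, so no pair is predicted positive.
probs = torch.sigmoid(scores)          # tensor([0.0005, 0.0293, 0.2160]) approx.
predictions = (probs > 0.5).int()      # tensor([0, 0, 0]) -> zero positive predictions

print(probs, predictions)
```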

Answered in another issue.