fhamborg/NewsMTSC

last_hidden_states being string instead of Tensor

movabo opened this issue · 4 comments

21:40:47 INFO diskdict(20):__init__|: loaded DiskDict with 6726 items from knowledge/bingliuopinion/opinion_polarity.ddict
21:40:47 INFO diskdict(20):__init__|: loaded DiskDict with 6886 items from knowledge/mpqasubjectivity/subjclueslen1-HLTEMNLP05.tff.ddict
21:40:47 INFO diskdict(20):__init__|: loaded DiskDict with 6468 items from knowledge/nrcemolex/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt.ddict
21:40:49 INFO infer(25):__init__|: overwriting: own_model_name=None to grutsc
21:40:49 INFO infer(25):__init__|: overwriting: default_lm=bert-base-uncased to roberta-base
21:40:49 INFO infer(25):__init__|: overwriting: state_dict=None to grutsc
21:40:49 INFO infer(25):__init__|: overwriting: knowledgesources=[] to nrc_emotions mpqa_subjectivity bingliu_opinion
21:40:49 INFO train(1045):prepare_and_start_instructor|: set default language model to roberta-base
21:40:49 INFO train(1038):post_process_arguments|: updated total number of categories to 10 with EKS nrc_emotions
21:40:49 INFO train(1038):post_process_arguments|: updated total number of categories to 13 with EKS mpqa_subjectivity
21:40:49 INFO train(1038):post_process_arguments|: updated total number of categories to 15 with EKS bingliu_opinion
21:40:49 INFO train(1064):prepare_and_start_instructor|: set number of polarity classes to 3
21:40:49 INFO train(1071):prepare_and_start_instructor|: no random seed was given, using system time
21:40:49 INFO train(1072):prepare_and_start_instructor|: setting random seed: 1621885249
21:40:49 INFO train(911):_setup_cuda|: cuda information
21:40:49 INFO train(912):_setup_cuda|: scc SGE_GPU: None
21:40:49 INFO train(913):_setup_cuda|: arg: cuda device: None
21:40:49 INFO train(936):_setup_cuda|: using CPU
21:40:49 INFO train(223):create_transformer_model|: creating model for weights name: roberta-base
21:40:49 INFO train(239):create_transformer_model|: using model_path: roberta-base
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.bias', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
21:40:55 INFO train(121):__init__|: initialized transformer tokenizers and models
21:40:55 INFO train(148):__init__|: loading weights from pretrained_models/state_dicts/grutsc...
21:40:57 INFO train(151):__init__|: done
21:40:57 INFO train(153):__init__|: initialized own model
21:40:57 INFO train(212):_print_args|: n_trainable_params: 153015555, n_nontrainable_params: 0
21:40:57 INFO train(215):_print_args|: > training arguments:
21:40:57 INFO train(217):_print_args|: >>> training_mode: False
21:40:57 INFO train(217):_print_args|: >>> own_model_name: grutsc
21:40:57 INFO train(217):_print_args|: >>> dataset_name: None
21:40:57 INFO train(217):_print_args|: >>> data_format: None
21:40:57 INFO train(217):_print_args|: >>> optimizer: adam
21:40:57 INFO train(217):_print_args|: >>> initializer: xavier_uniform_
21:40:57 INFO train(217):_print_args|: >>> learning_rate: 2e-05
21:40:57 INFO train(217):_print_args|: >>> dropout: 0.1
21:40:57 INFO train(217):_print_args|: >>> l2reg: 0.01
21:40:57 INFO train(217):_print_args|: >>> num_epoch: 10
21:40:57 INFO train(217):_print_args|: >>> batch_size: 64
21:40:57 INFO train(217):_print_args|: >>> log_step: 5
21:40:57 INFO train(217):_print_args|: >>> max_seq_len: 150
21:40:57 INFO train(217):_print_args|: >>> polarities_dim: 3
21:40:57 INFO train(217):_print_args|: >>> device: cpu
21:40:57 INFO train(217):_print_args|: >>> seed: 1621885249
21:40:57 INFO train(217):_print_args|: >>> local_context_focus: cdm
21:40:57 INFO train(217):_print_args|: >>> SRD: 3
21:40:57 INFO train(217):_print_args|: >>> snem: f1_macro
21:40:57 INFO train(217):_print_args|: >>> devmode: False
21:40:57 INFO train(217):_print_args|: >>> experiment_path: ./
21:40:57 INFO train(217):_print_args|: >>> balancing: None
21:40:57 INFO train(217):_print_args|: >>> spc_lm_representation: mean_last
21:40:57 INFO train(217):_print_args|: >>> spc_input_order: text_target
21:40:57 INFO train(217):_print_args|: >>> use_early_stopping: False
21:40:57 INFO train(217):_print_args|: >>> eval_only_after_last_epoch: False
21:40:57 INFO train(217):_print_args|: >>> pretrained_model_name: None
21:40:57 INFO train(217):_print_args|: >>> state_dict: pretrained_models/state_dicts/grutsc
21:40:57 INFO train(217):_print_args|: >>> single_targets: True
21:40:57 INFO train(217):_print_args|: >>> multi_targets: False
21:40:57 INFO train(217):_print_args|: >>> loss: crossentropy
21:40:57 INFO train(217):_print_args|: >>> targetclasses: newsmtsc3
21:40:57 INFO train(217):_print_args|: >>> knowledgesources: ('nrc_emotions', 'mpqa_subjectivity', 'bingliu_opinion')
21:40:57 INFO train(217):_print_args|: >>> is_use_natural_target_phrase_for_spc: False
21:40:57 INFO train(217):_print_args|: >>> default_lm: roberta-base
21:40:57 INFO train(217):_print_args|: >>> run_id: 0
21:40:57 INFO train(217):_print_args|: >>> coref_mode_in_training: ignore
21:40:57 INFO train(217):_print_args|: >>> base_path: /home/moritz/Documents/Hiwi/NewsMTSC
/home/moritz/anaconda3/envs/newsmtsc/lib/python3.7/site-packages/transformers/tokenization_utils_base.py:2110: FutureWarning: The `pad_to_max_length` argument is deprecated and will be removed in a future version, use `padding=True` or `padding='longest'` to pad to the longest sequence in the batch, or use `padding='max_length'` to pad to a max length. In this case, you can give a specific length with `max_length` (e.g. `max_length=45`) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
  FutureWarning,
21:40:57 WARNING dataset(194):_create_word_to_wordpiece_mapping|: overlap when mapping tokens to wordpiece (allow overwriting because Roberta is used)
Traceback (most recent call last):
  File "/snap/pycharm-educational/38/plugins/python-ce/helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/snap/pycharm-educational/38/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/moritz/Documents/Hiwi/NewsMTSC/infer.py", line 155, in <module>
    text_right=", you have to admit that he’s an astute reader of politics.",
  File "/home/moritz/Documents/Hiwi/NewsMTSC/infer.py", line 88, in infer
    outputs = self.model(inputs)
  File "/home/moritz/anaconda3/envs/newsmtsc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/moritz/Documents/Hiwi/NewsMTSC/models/singletarget/grutscsingle.py", line 132, in forward
    (last_hidden_states, knowledge_embedded), dim=2
TypeError: expected Tensor as element 0 in argument 0, but got str

After the following statement, last_hidden_states indeed holds the string value “last_hidden_states” (i.e. a plain string, not a tensor):

last_hidden_states = self.invoke_language_model(
    lm=self.language_model,
    input_ids=text_target_bert_indices,
    token_type_ids=text_target_bert_segments_ids,
)
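
For context, here is a minimal sketch of how this symptom can arise with a stock Hugging Face model (the repository's invoke_language_model is not shown here, so its internals are only assumed): recent transformers versions return a dict-like ModelOutput, and unpacking or iterating it yields its string keys instead of tensors.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

encoded = tokenizer("he's an astute reader of politics.", return_tensors="pt")
outputs = model(**encoded)  # dict-like BaseModelOutput on recent transformers versions

# Unpacking (or iterating) the dict-like output yields its keys, i.e. strings such
# as "last_hidden_state" -- the same kind of value the traceback above complains about.
first_item, *_ = outputs
print(type(first_item), first_item)  # <class 'str'> last_hidden_state

# Accessing the field explicitly (or indexing with an integer) yields the tensor:
last_hidden_states = outputs.last_hidden_state  # equivalently outputs[0]
assert isinstance(last_hidden_states, torch.Tensor)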

@movabo this should be fixed if you use the latest version of the repo. if not, pls reopen

Unfortunately, the problem seems to persist.
I just tried the newest version of the repository; the only changes I could see were the fixed imports (of bert_modeling).

  1. did you delete the conda environment and create a new one using the (updated) instructions from the (updated) readme? (a quick version check is sketched below)
  2. if yes, does running infer.py as-is in the repo (without any changes) work? if not, pls post the stack trace
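
A minimal way to confirm which versions the recreated environment actually resolved (only PyTorch 1.7.1 is confirmed further down in this thread; everything else depends on what the updated readme pins):

import torch
import transformers

# print the versions the environment resolved; PyTorch 1.7.1 is the version
# that ultimately resolved this issue (see the follow-up comment below)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())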

Ah, I did not see that you also changed the required PyTorch version. With 1.7.1 it seems to work! 👍