Restoring ELECTRA-Small checkpoint into HuggingFace transformers model doesn't work properly
DevKretov opened this issue · 4 comments
Environment info
- transformers version: 3.1.0
- Platform: Colab
- Python version: 3.6.9
- PyTorch version (GPU?): default Colab
- Tensorflow version (GPU?): default Colab (checkpoint from 1.15)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
Seems like nobody
Information
Hello, I am trying to load the ELECTRA-Small checkpoint from Google Research (https://github.com/google-research/electra) into HuggingFace's ElectraForMaskedLM object. I tried two ways of doing that:
- Converted the checkpoint using the convert_electra_original_tf_checkpoint_to_pytorch.py command-line script
- Converted the checkpoint using the .from_pretrained() method with the config.json provided here: https://s3.amazonaws.com/models.huggingface.co/bert/google/electra-small-generator/config.json
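For reference, here is a minimal sketch of how the command line for the first approach might be assembled. The flag names are assumed from the script's argument parser in transformers 3.x and are worth confirming with `--help`. All paths are hypothetical placeholders, and `build_convert_command` is an illustrative helper, not part of the library. The last flag selects which sub-model gets exported; `generator` is presumably the relevant choice when the target is ElectraForMaskedLM.

```python
import sys

# Name of the conversion script shipped with transformers (run from its directory).
SCRIPT = "convert_electra_original_tf_checkpoint_to_pytorch.py"


def build_convert_command(tf_checkpoint_path, config_file, pytorch_dump_path,
                          which="generator"):
    """Return the argv list for the conversion script (illustrative helper).

    `which` picks the sub-model to export: "generator" or "discriminator".
    Flag names are an assumption; check them against the script's --help.
    """
    if which not in ("generator", "discriminator"):
        raise ValueError("which must be 'generator' or 'discriminator'")
    return [
        sys.executable, SCRIPT,
        "--tf_checkpoint_path", tf_checkpoint_path,
        "--config_file", config_file,
        "--pytorch_dump_path", pytorch_dump_path,
        "--discriminator_or_generator", which,
    ]


# Hypothetical paths, for illustration only.
cmd = build_convert_command(
    "electra_small/electra_small", "config.json", "pytorch_model.bin"
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the script.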
Both ran without raising any exceptions. The first one printed nothing except the contents of the config.json file and the path the model would be saved to. The second one printed a long log about initialising some variables and skipping others:
Initialize PyTorch weight ['discriminator_predictions', 'dense', 'bias'] discriminator_predictions/dense/bias Initialize PyTorch weight ['discriminator_predictions', 'dense', 'kernel'] discriminator_predictions/dense/kernel Initialize PyTorch weight ['discriminator_predictions', 'dense_prediction', 'bias'] discriminator_predictions/dense_1/bias Initialize PyTorch weight ['discriminator_predictions', 'dense_prediction', 'kernel'] discriminator_predictions/dense_1/kernel Initialize PyTorch weight ['electra', 'embeddings', 'LayerNorm', 'beta'] electra/embeddings/LayerNorm/beta Initialize PyTorch weight ['electra', 'embeddings', 'LayerNorm', 'gamma'] electra/embeddings/LayerNorm/gamma Initialize PyTorch weight ['electra', 'embeddings', 'position_embeddings'] electra/embeddings/position_embeddings Initialize PyTorch weight ['electra', 'embeddings', 'token_type_embeddings'] electra/embeddings/token_type_embeddings Initialize PyTorch weight ['electra', 'embeddings', 'word_embeddings'] electra/embeddings/word_embeddings Initialize PyTorch weight ['electra', 'embeddings_project', 'bias'] electra/embeddings_project/bias Initialize PyTorch weight ['electra', 'embeddings_project', 'kernel'] electra/embeddings_project/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_0/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_0/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_0/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_0/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'key', 'bias'] 
electra/encoder/layer_0/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_0/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_0/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_0/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_0/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_0/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'intermediate', 'dense', 'bias'] electra/encoder/layer_0/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_0/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_0/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_0/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'output', 'dense', 'bias'] electra/encoder/layer_0/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_0', 'output', 'dense', 'kernel'] electra/encoder/layer_0/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_1/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_1/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 
'layer_1', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_1/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_1/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_1/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_1/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_1/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_1/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_1/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_1/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'intermediate', 'dense', 'bias'] electra/encoder/layer_1/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_1/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_1/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_1/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'output', 'dense', 'bias'] electra/encoder/layer_1/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_1', 'output', 'dense', 'kernel'] electra/encoder/layer_1/output/dense/kernel Initialize PyTorch 
weight ['electra', 'encoder', 'layer_10', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_10/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_10/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_10/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_10/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_10/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_10/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_10/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_10/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_10/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_10/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'intermediate', 'dense', 'bias'] electra/encoder/layer_10/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_10/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_10/output/LayerNorm/beta Initialize PyTorch weight ['electra', 
'encoder', 'layer_10', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_10/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'output', 'dense', 'bias'] electra/encoder/layer_10/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_10', 'output', 'dense', 'kernel'] electra/encoder/layer_10/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_11/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_11/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_11/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_11/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_11/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_11/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_11/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_11/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_11/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_11/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'intermediate', 'dense', 
'bias'] electra/encoder/layer_11/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_11/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_11/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_11/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'output', 'dense', 'bias'] electra/encoder/layer_11/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_11', 'output', 'dense', 'kernel'] electra/encoder/layer_11/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_2/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_2/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_2/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_2/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_2/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_2/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_2/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_2/attention/self/query/kernel Initialize PyTorch weight 
['electra', 'encoder', 'layer_2', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_2/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_2/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'intermediate', 'dense', 'bias'] electra/encoder/layer_2/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_2/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_2/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_2/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'output', 'dense', 'bias'] electra/encoder/layer_2/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_2', 'output', 'dense', 'kernel'] electra/encoder/layer_2/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_3/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_3/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_3/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_3/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_3/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'key', 'kernel'] 
electra/encoder/layer_3/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_3/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_3/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_3/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_3/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'intermediate', 'dense', 'bias'] electra/encoder/layer_3/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_3/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_3/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_3/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'output', 'dense', 'bias'] electra/encoder/layer_3/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_3', 'output', 'dense', 'kernel'] electra/encoder/layer_3/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_4/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_4/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_4/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 
'layer_4', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_4/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_4/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_4/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_4/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_4/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_4/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_4/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'intermediate', 'dense', 'bias'] electra/encoder/layer_4/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_4/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_4/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_4/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'output', 'dense', 'bias'] electra/encoder/layer_4/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_4', 'output', 'dense', 'kernel'] electra/encoder/layer_4/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_5/attention/output/LayerNorm/beta Initialize 
PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_5/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_5/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_5/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_5/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_5/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_5/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_5/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_5/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_5/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'intermediate', 'dense', 'bias'] electra/encoder/layer_5/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_5/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_5/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_5/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'output', 'dense', 'bias'] 
electra/encoder/layer_5/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_5', 'output', 'dense', 'kernel'] electra/encoder/layer_5/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_6/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_6/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_6/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_6/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_6/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_6/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_6/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_6/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_6/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_6/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'intermediate', 'dense', 'bias'] electra/encoder/layer_6/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_6/intermediate/dense/kernel Initialize 
PyTorch weight ['electra', 'encoder', 'layer_6', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_6/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_6/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'output', 'dense', 'bias'] electra/encoder/layer_6/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_6', 'output', 'dense', 'kernel'] electra/encoder/layer_6/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_7/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_7/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_7/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_7/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_7/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_7/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_7/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_7/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_7/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'attention', 'self', 'value', 'kernel'] 
electra/encoder/layer_7/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'intermediate', 'dense', 'bias'] electra/encoder/layer_7/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_7/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_7/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_7/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'output', 'dense', 'bias'] electra/encoder/layer_7/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_7', 'output', 'dense', 'kernel'] electra/encoder/layer_7/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_8/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_8/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_8/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_8/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'self', 'key', 'bias'] electra/encoder/layer_8/attention/self/key/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_8/attention/self/key/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_8/attention/self/query/bias Initialize PyTorch weight ['electra', 'encoder', 
'layer_8', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_8/attention/self/query/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_8/attention/self/value/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_8/attention/self/value/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'intermediate', 'dense', 'bias'] electra/encoder/layer_8/intermediate/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_8/intermediate/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_8/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_8/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'output', 'dense', 'bias'] electra/encoder/layer_8/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_8', 'output', 'dense', 'kernel'] electra/encoder/layer_8/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_9/attention/output/LayerNorm/beta Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_9/attention/output/LayerNorm/gamma Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'output', 'dense', 'bias'] electra/encoder/layer_9/attention/output/dense/bias Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'output', 'dense', 'kernel'] electra/encoder/layer_9/attention/output/dense/kernel Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'key', 'bias'] 
electra/encoder/layer_9/attention/self/key/bias
Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'key', 'kernel'] electra/encoder/layer_9/attention/self/key/kernel
Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'query', 'bias'] electra/encoder/layer_9/attention/self/query/bias
Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'query', 'kernel'] electra/encoder/layer_9/attention/self/query/kernel
Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'value', 'bias'] electra/encoder/layer_9/attention/self/value/bias
Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'attention', 'self', 'value', 'kernel'] electra/encoder/layer_9/attention/self/value/kernel
Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'intermediate', 'dense', 'bias'] electra/encoder/layer_9/intermediate/dense/bias
Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'intermediate', 'dense', 'kernel'] electra/encoder/layer_9/intermediate/dense/kernel
Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'output', 'LayerNorm', 'beta'] electra/encoder/layer_9/output/LayerNorm/beta
Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'output', 'LayerNorm', 'gamma'] electra/encoder/layer_9/output/LayerNorm/gamma
Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'output', 'dense', 'bias'] electra/encoder/layer_9/output/dense/bias
Initialize PyTorch weight ['electra', 'encoder', 'layer_9', 'output', 'dense', 'kernel'] electra/encoder/layer_9/output/dense/kernel
Skipping generator/embeddings_project/bias ['generator', 'embeddings_project', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/embeddings_project/kernel ['generator', 'embeddings_project', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_0', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_0', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/attention/output/dense/bias ['generator', 'encoder', 'layer_0', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/attention/output/dense/kernel ['generator', 'encoder', 'layer_0', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/attention/self/key/bias ['generator', 'encoder', 'layer_0', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/attention/self/key/kernel ['generator', 'encoder', 'layer_0', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/attention/self/query/bias ['generator', 'encoder', 'layer_0', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/attention/self/query/kernel ['generator', 'encoder', 'layer_0', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/attention/self/value/bias ['generator', 'encoder', 'layer_0', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/attention/self/value/kernel ['generator', 'encoder', 'layer_0', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/intermediate/dense/bias ['generator', 'encoder', 'layer_0', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/intermediate/dense/kernel ['generator', 'encoder', 'layer_0', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/output/LayerNorm/beta ['generator', 'encoder', 'layer_0', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/output/LayerNorm/gamma ['generator', 'encoder', 'layer_0', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/output/dense/bias ['generator', 'encoder', 'layer_0', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_0/output/dense/kernel ['generator', 'encoder', 'layer_0', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_1', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_1', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/attention/output/dense/bias ['generator', 'encoder', 'layer_1', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/attention/output/dense/kernel ['generator', 'encoder', 'layer_1', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/attention/self/key/bias ['generator', 'encoder', 'layer_1', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/attention/self/key/kernel ['generator', 'encoder', 'layer_1', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/attention/self/query/bias ['generator', 'encoder', 'layer_1', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/attention/self/query/kernel ['generator', 'encoder', 'layer_1', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/attention/self/value/bias ['generator', 'encoder', 'layer_1', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/attention/self/value/kernel ['generator', 'encoder', 'layer_1', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/intermediate/dense/bias ['generator', 'encoder', 'layer_1', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/intermediate/dense/kernel ['generator', 'encoder', 'layer_1', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/output/LayerNorm/beta ['generator', 'encoder', 'layer_1', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/output/LayerNorm/gamma ['generator', 'encoder', 'layer_1', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/output/dense/bias ['generator', 'encoder', 'layer_1', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_1/output/dense/kernel ['generator', 'encoder', 'layer_1', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_10', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_10', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/attention/output/dense/bias ['generator', 'encoder', 'layer_10', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/attention/output/dense/kernel ['generator', 'encoder', 'layer_10', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/attention/self/key/bias ['generator', 'encoder', 'layer_10', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/attention/self/key/kernel ['generator', 'encoder', 'layer_10', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/attention/self/query/bias ['generator', 'encoder', 'layer_10', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/attention/self/query/kernel ['generator', 'encoder', 'layer_10', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/attention/self/value/bias ['generator', 'encoder', 'layer_10', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/attention/self/value/kernel ['generator', 'encoder', 'layer_10', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/intermediate/dense/bias ['generator', 'encoder', 'layer_10', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/intermediate/dense/kernel ['generator', 'encoder', 'layer_10', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/output/LayerNorm/beta ['generator', 'encoder', 'layer_10', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/output/LayerNorm/gamma ['generator', 'encoder', 'layer_10', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/output/dense/bias ['generator', 'encoder', 'layer_10', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_10/output/dense/kernel ['generator', 'encoder', 'layer_10', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_11', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_11', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/attention/output/dense/bias ['generator', 'encoder', 'layer_11', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/attention/output/dense/kernel ['generator', 'encoder', 'layer_11', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/attention/self/key/bias ['generator', 'encoder', 'layer_11', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/attention/self/key/kernel ['generator', 'encoder', 'layer_11', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/attention/self/query/bias ['generator', 'encoder', 'layer_11', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/attention/self/query/kernel ['generator', 'encoder', 'layer_11', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/attention/self/value/bias ['generator', 'encoder', 'layer_11', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/attention/self/value/kernel ['generator', 'encoder', 'layer_11', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/intermediate/dense/bias ['generator', 'encoder', 'layer_11', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/intermediate/dense/kernel ['generator', 'encoder', 'layer_11', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/output/LayerNorm/beta ['generator', 'encoder', 'layer_11', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/output/LayerNorm/gamma ['generator', 'encoder', 'layer_11', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/output/dense/bias ['generator', 'encoder', 'layer_11', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_11/output/dense/kernel ['generator', 'encoder', 'layer_11', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_2', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_2', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/attention/output/dense/bias ['generator', 'encoder', 'layer_2', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/attention/output/dense/kernel ['generator', 'encoder', 'layer_2', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/attention/self/key/bias ['generator', 'encoder', 'layer_2', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/attention/self/key/kernel ['generator', 'encoder', 'layer_2', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/attention/self/query/bias ['generator', 'encoder', 'layer_2', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/attention/self/query/kernel ['generator', 'encoder', 'layer_2', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/attention/self/value/bias ['generator', 'encoder', 'layer_2', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/attention/self/value/kernel ['generator', 'encoder', 'layer_2', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/intermediate/dense/bias ['generator', 'encoder', 'layer_2', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/intermediate/dense/kernel ['generator', 'encoder', 'layer_2', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/output/LayerNorm/beta ['generator', 'encoder', 'layer_2', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/output/LayerNorm/gamma ['generator', 'encoder', 'layer_2', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/output/dense/bias ['generator', 'encoder', 'layer_2', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_2/output/dense/kernel ['generator', 'encoder', 'layer_2', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_3', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_3', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/attention/output/dense/bias ['generator', 'encoder', 'layer_3', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/attention/output/dense/kernel ['generator', 'encoder', 'layer_3', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/attention/self/key/bias ['generator', 'encoder', 'layer_3', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/attention/self/key/kernel ['generator', 'encoder', 'layer_3', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/attention/self/query/bias ['generator', 'encoder', 'layer_3', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/attention/self/query/kernel ['generator', 'encoder', 'layer_3', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/attention/self/value/bias ['generator', 'encoder', 'layer_3', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/attention/self/value/kernel ['generator', 'encoder', 'layer_3', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/intermediate/dense/bias ['generator', 'encoder', 'layer_3', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/intermediate/dense/kernel ['generator', 'encoder', 'layer_3', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/output/LayerNorm/beta ['generator', 'encoder', 'layer_3', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/output/LayerNorm/gamma ['generator', 'encoder', 'layer_3', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/output/dense/bias ['generator', 'encoder', 'layer_3', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_3/output/dense/kernel ['generator', 'encoder', 'layer_3', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_4', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_4', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/attention/output/dense/bias ['generator', 'encoder', 'layer_4', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/attention/output/dense/kernel ['generator', 'encoder', 'layer_4', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/attention/self/key/bias ['generator', 'encoder', 'layer_4', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/attention/self/key/kernel ['generator', 'encoder', 'layer_4', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/attention/self/query/bias ['generator', 'encoder', 'layer_4', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/attention/self/query/kernel ['generator', 'encoder', 'layer_4', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/attention/self/value/bias ['generator', 'encoder', 'layer_4', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/attention/self/value/kernel ['generator', 'encoder', 'layer_4', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/intermediate/dense/bias ['generator', 'encoder', 'layer_4', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/intermediate/dense/kernel ['generator', 'encoder', 'layer_4', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/output/LayerNorm/beta ['generator', 'encoder', 'layer_4', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/output/LayerNorm/gamma ['generator', 'encoder', 'layer_4', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/output/dense/bias ['generator', 'encoder', 'layer_4', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_4/output/dense/kernel ['generator', 'encoder', 'layer_4', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_5', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_5', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/attention/output/dense/bias ['generator', 'encoder', 'layer_5', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/attention/output/dense/kernel ['generator', 'encoder', 'layer_5', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/attention/self/key/bias ['generator', 'encoder', 'layer_5', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/attention/self/key/kernel ['generator', 'encoder', 'layer_5', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/attention/self/query/bias ['generator', 'encoder', 'layer_5', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/attention/self/query/kernel ['generator', 'encoder', 'layer_5', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/attention/self/value/bias ['generator', 'encoder', 'layer_5', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/attention/self/value/kernel ['generator', 'encoder', 'layer_5', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/intermediate/dense/bias ['generator', 'encoder', 'layer_5', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/intermediate/dense/kernel ['generator', 'encoder', 'layer_5', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/output/LayerNorm/beta ['generator', 'encoder', 'layer_5', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/output/LayerNorm/gamma ['generator', 'encoder', 'layer_5', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/output/dense/bias ['generator', 'encoder', 'layer_5', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_5/output/dense/kernel ['generator', 'encoder', 'layer_5', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_6', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_6', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/attention/output/dense/bias ['generator', 'encoder', 'layer_6', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/attention/output/dense/kernel ['generator', 'encoder', 'layer_6', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/attention/self/key/bias ['generator', 'encoder', 'layer_6', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/attention/self/key/kernel ['generator', 'encoder', 'layer_6', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/attention/self/query/bias ['generator', 'encoder', 'layer_6', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/attention/self/query/kernel ['generator', 'encoder', 'layer_6', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/attention/self/value/bias ['generator', 'encoder', 'layer_6', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/attention/self/value/kernel ['generator', 'encoder', 'layer_6', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/intermediate/dense/bias ['generator', 'encoder', 'layer_6', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/intermediate/dense/kernel ['generator', 'encoder', 'layer_6', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/output/LayerNorm/beta ['generator', 'encoder', 'layer_6', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/output/LayerNorm/gamma ['generator', 'encoder', 'layer_6', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/output/dense/bias ['generator', 'encoder', 'layer_6', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_6/output/dense/kernel ['generator', 'encoder', 'layer_6', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_7', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_7', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/attention/output/dense/bias ['generator', 'encoder', 'layer_7', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/attention/output/dense/kernel ['generator', 'encoder', 'layer_7', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/attention/self/key/bias ['generator', 'encoder', 'layer_7', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/attention/self/key/kernel ['generator', 'encoder', 'layer_7', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/attention/self/query/bias ['generator', 'encoder', 'layer_7', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/attention/self/query/kernel ['generator', 'encoder', 'layer_7', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/attention/self/value/bias ['generator', 'encoder', 'layer_7', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/attention/self/value/kernel ['generator', 'encoder', 'layer_7', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/intermediate/dense/bias ['generator', 'encoder', 'layer_7', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/intermediate/dense/kernel ['generator', 'encoder', 'layer_7', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/output/LayerNorm/beta ['generator', 'encoder', 'layer_7', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/output/LayerNorm/gamma ['generator', 'encoder', 'layer_7', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/output/dense/bias ['generator', 'encoder', 'layer_7', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_7/output/dense/kernel ['generator', 'encoder', 'layer_7', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_8', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_8', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/attention/output/dense/bias ['generator', 'encoder', 'layer_8', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/attention/output/dense/kernel ['generator', 'encoder', 'layer_8', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/attention/self/key/bias ['generator', 'encoder', 'layer_8', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/attention/self/key/kernel ['generator', 'encoder', 'layer_8', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/attention/self/query/bias ['generator', 'encoder', 'layer_8', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/attention/self/query/kernel ['generator', 'encoder', 'layer_8', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/attention/self/value/bias ['generator', 'encoder', 'layer_8', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/attention/self/value/kernel ['generator', 'encoder', 'layer_8', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/intermediate/dense/bias ['generator', 'encoder', 'layer_8', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/intermediate/dense/kernel ['generator', 'encoder', 'layer_8', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/output/LayerNorm/beta ['generator', 'encoder', 'layer_8', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/output/LayerNorm/gamma ['generator', 'encoder', 'layer_8', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/output/dense/bias ['generator', 'encoder', 'layer_8', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_8/output/dense/kernel ['generator', 'encoder', 'layer_8', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/attention/output/LayerNorm/beta ['generator', 'encoder', 'layer_9', 'attention', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/attention/output/LayerNorm/gamma ['generator', 'encoder', 'layer_9', 'attention', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/attention/output/dense/bias ['generator', 'encoder', 'layer_9', 'attention', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/attention/output/dense/kernel ['generator', 'encoder', 'layer_9', 'attention', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/attention/self/key/bias ['generator', 'encoder', 'layer_9', 'attention', 'self', 'key', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/attention/self/key/kernel ['generator', 'encoder', 'layer_9', 'attention', 'self', 'key', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/attention/self/query/bias ['generator', 'encoder', 'layer_9', 'attention', 'self', 'query', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/attention/self/query/kernel ['generator', 'encoder', 'layer_9', 'attention', 'self', 'query', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/attention/self/value/bias ['generator', 'encoder', 'layer_9', 'attention', 'self', 'value', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/attention/self/value/kernel ['generator', 'encoder', 'layer_9', 'attention', 'self', 'value', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/intermediate/dense/bias ['generator', 'encoder', 'layer_9', 'intermediate', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/intermediate/dense/kernel ['generator', 'encoder', 'layer_9', 'intermediate', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/output/LayerNorm/beta ['generator', 'encoder', 'layer_9', 'output', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/output/LayerNorm/gamma ['generator', 'encoder', 'layer_9', 'output', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/output/dense/bias ['generator', 'encoder', 'layer_9', 'output', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator/encoder/layer_9/output/dense/kernel ['generator', 'encoder', 'layer_9', 'output', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator'
Skipping generator_predictions/LayerNorm/beta ['generator_predictions', 'LayerNorm', 'beta'] 'ElectraForPreTraining' object has no attribute 'generator_predictions'
Skipping generator_predictions/LayerNorm/gamma ['generator_predictions', 'LayerNorm', 'gamma'] 'ElectraForPreTraining' object has no attribute 'generator_predictions'
Skipping generator_predictions/dense/bias ['generator_predictions', 'dense', 'bias'] 'ElectraForPreTraining' object has no attribute 'generator_predictions'
Skipping generator_predictions/dense/kernel ['generator_predictions', 'dense', 'kernel'] 'ElectraForPreTraining' object has no attribute 'generator_predictions'
Skipping generator_predictions/output_bias ['generator_lm_head', 'bias'] 'ElectraForPreTraining' object has no attribute 
'generator_lm_head'
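To make the pattern in a log like this easier to see, here is a minimal sketch that counts the skipped TensorFlow variables by their top-level scope. It assumes only the "Skipping <variable name>" shape visible in the output above; `summarize_skipped` and `sample_log` are names made up for this illustration, and the sample is a short excerpt, not the full log.

```python
import re
from collections import Counter

# Matches the TensorFlow variable name that follows each "Skipping" marker.
SKIP_PATTERN = re.compile(r"Skipping\s+(\S+)")


def summarize_skipped(log_text):
    """Count skipped TF variables by their top-level scope."""
    counts = Counter()
    for name in SKIP_PATTERN.findall(log_text):
        # "generator/encoder/layer_9/..." -> "generator"
        counts[name.split("/")[0]] += 1
    return counts


# A short excerpt in the same shape as the log above (hypothetical variable name).
sample_log = (
    "Skipping generator/encoder/layer_9/output/dense/kernel "
    "['generator', 'encoder', 'layer_9', 'output', 'dense', 'kernel'] "
    "'ElectraForPreTraining' object has no attribute 'generator' "
    "Skipping generator_predictions/LayerNorm/beta "
    "['generator_predictions', 'LayerNorm', 'beta'] "
    "'ElectraForPreTraining' object has no attribute 'generator_predictions' "
    "Skipping generator_predictions/output_bias ['generator_lm_head', 'bias'] "
    "'ElectraForPreTraining' object has no attribute 'generator_lm_head'"
)

for scope, count in sorted(summarize_skipped(sample_log).items()):
    print(scope, count)
```

If every skipped scope starts with "generator", as in the excerpt, the loader was filling a discriminator-shaped model (the log names 'ElectraForPreTraining') and dropping the generator weights.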
This seems expected, since Google's checkpoint contains both the generator and the discriminator. However, as soon as I try to make a prediction (e.g. "I love reading [MASK]."), the top-5 most likely words are:
- ᵃ
- fulfilled
- sal
- 1809
- drank
which is pretty random, I guess.
On the other hand, as soon as I initialise the ElectraForMaskedLM model directly from https://huggingface.co/google/electra-small-generator , everything works fantastically!
So my hypothesis is that there is a bug in the conversion of the checkpoint to HF format. Can anybody tell me how I can load my own checkpoint (or at least Google's, to check that the whole thing works correctly)?
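One thing worth ruling out first is which half of the checkpoint the conversion targets. The sketch below only builds the command line for the conversion script, so it can be inspected before running. It assumes the script accepts the flags `--tf_checkpoint_path`, `--config_file`, `--pytorch_dump_path` and `--discriminator_or_generator`, which is my understanding of transformers 3.x (please confirm with `--help`). The helper name `build_convert_command` and all paths are placeholders invented for this example.

```python
import sys


def build_convert_command(script_path, tf_checkpoint_path, config_file,
                          pytorch_dump_path, part="generator"):
    """Build (but do not run) the argv list for the ELECTRA conversion script."""
    if part not in ("generator", "discriminator"):
        raise ValueError("part must be 'generator' or 'discriminator'")
    return [
        sys.executable, script_path,
        "--tf_checkpoint_path", tf_checkpoint_path,
        "--config_file", config_file,
        "--pytorch_dump_path", pytorch_dump_path,
        # Assumed flag: selects which sub-model's weights are exported.
        "--discriminator_or_generator", part,
    ]


# Placeholder paths; replace them with your own locations.
command = build_convert_command(
    "convert_electra_original_tf_checkpoint_to_pytorch.py",
    "electra_small/electra_small",
    "generator_config.json",
    "electra_small_generator/pytorch_model.bin",
)
print(" ".join(command))
```

Once the command looks right it can be passed to `subprocess.run(command, check=True)`. For masked-word prediction the generator export is the relevant one, together with a config that matches the generator's sizes.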
To reproduce
Steps to reproduce the behavior:
- Download the official ELECTRA-small checkpoint
- Run the CLI script to convert the TF checkpoint to an HF .bin model
- Run a standard prediction and look at the top-5 words (OR import the HF pipeline and run it in "fill-mask" mode)
You will see that the model from the HF hub works correctly, whereas the model converted from the checkpoint on Google's GitHub gives random tokens.
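The comparison in the last step can be sketched roughly as follows. This assumes the `pipeline("fill-mask", ...)` API from transformers, whose results are dicts with `score` and `token_str` keys. The helper names and the converted-model directory are hypothetical, and the tokenizer is taken from the hub model on the assumption that the converted directory contains no vocabulary file.

```python
def top_tokens(predictions, k=5):
    """Return the k highest-scoring token strings from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["token_str"] for p in ranked[:k]]


def fill_mask_top_tokens(model_path, text="I love reading [MASK].",
                         tokenizer_name="google/electra-small-generator"):
    """Run fill-mask with the given model and return its top predictions."""
    # Imported lazily so top_tokens() can be used without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_path, tokenizer=tokenizer_name)
    return top_tokens(fill(text))


# Example usage (needs transformers, network access and a converted model):
# print(fill_mask_top_tokens("google/electra-small-generator"))
# print(fill_mask_top_tokens("./converted_electra_small"))  # hypothetical path
```

Running both calls on the same sentence shows the difference directly: plausible words from the hub model, unrelated tokens from a badly converted one.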
Expected behavior
I expected to see that the model is capable of making basic predictions, so that I know that it has been restored and reformatted correctly.
Oh my god I love you! I was just about to run experiments on why my converted model is not working using hugging-face transformers, and was worried that there was something wrong with my model and I will have to train it again!
Thanks for posting this, now I am sure the fault lies with the conversion script!!! I would suggest we open an issue on the transformers repository regarding this.
Thanks!!
I've seen this error before, and for my models I had to change the config.json: you need to set "intermediate_size": 256 instead of "intermediate_size": 1024
:)
@LysandreJik gave me that hint 🤗
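A minimal sketch of that config change, using only the standard library. The function name `patch_intermediate_size` is made up for this example, and 256 is simply the value from the comment above; the right value depends on how your own checkpoint was trained. The demo runs on a throwaway file with invented contents rather than a real model config.

```python
import json
import os
import tempfile


def patch_intermediate_size(config_path, new_size=256):
    """Overwrite "intermediate_size" in a config.json; return the old value."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    previous = config.get("intermediate_size")
    config["intermediate_size"] = new_size
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return previous


# Demo on a temporary file (the contents are placeholders, not a real config).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"intermediate_size": 1024, "other_key": "unchanged"}, f)
    previous = patch_intermediate_size(path)
    with open(path, encoding="utf-8") as f:
        patched = json.load(f)

print(previous, "->", patched["intermediate_size"])
```

The conversion then has to be re-run with the patched config, since the weights are loaded into a model built from it.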
I got excited too soon. My models are not working even with this repository's fine-tuning code.
I tried it on multiple classification tasks, but my model just predicts the same label again and again. Has anyone faced this or a similar issue?
This issue was answered on the transformers repo: huggingface/transformers#6945