How to dump_token_embeddings?
I have trained a model on token input and deleted the 'char_cnn' parameters from the options. Now I want to extract features from the trained ELMo, but I can't produce the 'embedding_weight_file' that 'BidirectionalLanguageModel' requires at init when its 'use_character_inputs' is False (because I deleted 'char_cnn' and never used char embeddings in my training graph).
I tried to use the function 'dump_token_embeddings()' to create the 'embedding_weight_file', but internally this function still needs the 'char_cnn' parameters:
```python
def dump_token_embeddings(vocab_file, options_file, weight_file, outfile):
    '''Given an input vocabulary file, dump all the token embeddings to the
    outfile. The result can be used as the embedding_weight_file when
    constructing a BidirectionalLanguageModel.'''
    with open(options_file, 'r') as fin:
        options = json.load(fin)
    max_word_length = options['char_cnn']['max_characters_per_token']

    vocab = UnicodeCharsVocabulary(vocab_file, max_word_length)
    batcher = Batcher(vocab_file, max_word_length)

    ids_placeholder = tf.placeholder(
        'int32', shape=(None, None, max_word_length)
    )
    model = BidirectionalLanguageModel(options_file, weight_file)
    ...
```
Is this function written only for models with char embeddings?
And if I want to dump token embeddings, how should I modify it?
- change UnicodeCharsVocabulary to Vocabulary
- change the Batcher to TokenBatcher
- drop max_word_length from the placeholder shape, i.e. shape=(None, None)
- change how the BidirectionalLanguageModel is constructed
But if I create the BidirectionalLanguageModel without 'char_cnn', it requires 'embedding_weight_file' to be not None... This is a chicken-and-egg problem.
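One way around the deadlock, without touching the library, is to build the 'embedding_weight_file' directly from the trained weight file: per the '_pretrained_initializer' snippet further down, a token-input model's weight file already stores the embedding matrix under the 'embedding' key. A minimal sketch under that assumption (the function name 'extract_token_embeddings' is hypothetical, not part of bilm):

```python
import h5py

def extract_token_embeddings(weight_file, outfile):
    """Copy the token embedding matrix out of a trained weight file into a
    standalone HDF5 file usable as embedding_weight_file.

    Assumes the embeddings live under the 'embedding' key of weight_file,
    as suggested by the _pretrained_initializer code in this thread."""
    with h5py.File(weight_file, 'r') as fin:
        embeddings = fin['embedding'][...]  # read the full matrix
    with h5py.File(outfile, 'w') as fout:
        fout.create_dataset('embedding', data=embeddings, dtype='float32')
```

If the key layout in your weight file differs, inspect it first with `list(h5py.File(weight_file, 'r').keys())`.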
OK, I have fixed this problem by modifying the code of BidirectionalLanguageModel.
Comment out the code below:
```python
# if not use_character_inputs:
#     if embedding_weight_file is None:
#         raise ValueError(
#             "embedding_weight_file is required input with "
#             "not use_character_inputs"
#         )
```
and change the file read in `_pretrained_initializer` (token_embedding_file -> weight_file):
```python
def _pretrained_initializer(varname, weight_file, embedding_weight_file=None):
    ...
    if varname_in_file == 'embedding':
        with h5py.File(weight_file, 'r') as fin:
```
and modify BidirectionalLanguageModelGraph (previously the else branch left this as None):
```python
if embedding_weight_file is not None:
    ...
else:
    # +1 for padding
    self._n_tokens_vocab = options['n_tokens_vocab'] + 1
```
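With that change, the vocabulary size for a token-input model comes from the options file rather than from an embedding weight file. A self-contained sketch of the patched else-branch logic (the helper name is mine, not bilm's; only 'n_tokens_vocab' comes from the snippet above):

```python
import json

def n_tokens_vocab_from_options(options_json):
    """Mimic the patched else branch: with no embedding_weight_file,
    take the vocab size from the options and add 1 for the padding id."""
    options = json.loads(options_json)
    return options['n_tokens_vocab'] + 1  # +1 for padding

options_json = json.dumps({'n_tokens_vocab': 50000})
print(n_tokens_vocab_from_options(options_json))  # 50001
```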