Bootstrapping issue: No clear path to reproduce results
jim-kukla opened this issue · 7 comments
Thanks for sharing this experiment.
I'm trying to get it working to reproduce your results, but it seems like there's a bootstrapping problem.
- Running either script produces an error that some required resource doesn't exist in models/.
- Assuming insurance_qa_eval.py is the top-level script, I've uncommented the "save embeddings" portion of the script, but I'm still waiting for it to finish running.
- It also looks like it will next need to invoke the __main__ block in insurance_qa_embeddings.py to produce models/word2vec_100_dim.h5 in order to finish bootstrapping.
Is that the right approach for getting this running? If so, I'll open a PR when I've got it all working.
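For anyone hitting the same missing-resource error, a small pre-flight check can list what is absent before a long run starts. This is only a sketch: the helper name missing_resources and the REQUIRED list are hypothetical, and the only path taken from this thread is models/word2vec_100_dim.h5.

```python
from pathlib import Path

# Hypothetical list: only models/word2vec_100_dim.h5 is named in this thread;
# extend it with whatever else the scripts turn out to need.
REQUIRED = ["models/word2vec_100_dim.h5"]


def missing_resources(paths, root="."):
    """Return the entries of `paths` that do not exist under `root`."""
    root = Path(root)
    return [p for p in paths if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_resources(REQUIRED)
    if missing:
        print("Missing resources:", ", ".join(missing))
    else:
        print("All required resources are present.")
```

Running it from the repository root before the evaluation script should show which files still need to be generated or downloaded.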
Did you download the dataset from here? I'm not sure which resource it could be. Could you reproduce the error message?
Yep, the word2vec_100_dim.h5 was the output of using Gensim's Word2Vec model merged with the result of training a 100-dimension EmbeddingModel. I haven't formalized this yet; mostly I've been trying out different word embeddings to see what works. Once something works well, I'll put the weight file on GitHub for general use.
I'd appreciate it if you wanted to open a PR for a stand-alone script. Let me know if you have more questions.
I went ahead and added the word embeddings I've been using to GitHub.
Which word2vec output are you referring to here?
When we save a gensim word2vec model, we typically get the following files:
outfilename, outfilename.syn1neg, outfilename.syn0.np, outfilename.syn1.np
Which one maps to the ".h5" file you mentioned above, or to the word2vec_100_dim.embeddings file you uploaded?
Also, I couldn't find "word_embeddings.py", where you might have written something related to this.
syn0 is the equivalent of the Keras embedding layer, I believe; that's what I've been using. It's really these lines:
weights = np.load('word2vec_100_dim.embeddings')
language_model = model.prediction_model.layers[2]
language_model.layers[2].set_weights([weights])
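As a rough illustration of the file side of this, here is a NumPy-only sketch of writing a weight matrix to a file named word2vec_100_dim.embeddings and reading it back. The matrix is random placeholder data, and the vocabulary size and temporary path are assumptions; how the real file was produced isn't shown in this thread.

```python
import os
import tempfile

import numpy as np

# Placeholder matrix standing in for the real word vectors (assumption:
# one row per vocabulary index, 100 columns for the 100-dim embeddings).
vocab_size, dim = 5, 100
weights = np.random.rand(vocab_size, dim).astype("float32")

path = os.path.join(tempfile.mkdtemp(), "word2vec_100_dim.embeddings")

# np.save appends ".npy" when given a file *name* without that suffix,
# so write through an open file object to keep the exact name.
with open(path, "wb") as f:
    np.save(f, weights)

# np.load accepts any file name as long as the contents are in .npy format.
loaded = np.load(path)
assert loaded.shape == (vocab_size, dim)

# A Keras Embedding layer takes its weights as a one-element list holding
# an (input_dim, output_dim) array, as in set_weights([weights]) above.
layer_weights = [loaded]
```

The shape check matters in practice: the number of rows has to match the embedding layer's vocabulary size, otherwise set_weights will reject the array.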
@codekansas Yes, I did have the insurance_qa_python repo cloned and had all the data_paths set properly.
Thanks for adding those .h5 entries. I'll take a look shortly and let you know if everything's working for me.
Hi, do you mean outfilename.syn0.np = word2vec_100_dim.embeddings?
It might be different depending on your version.