Where do the models in "bio_embeddings/utilities/defaults.yml" come from? Where are the docs, parameters, and dataset?
laraque opened this issue · 0 comments
Hello Team,
Where can I find information about how the models linked from the following file in this repository were trained?
file: bio_embeddings/utilities/defaults.yml
model : http://data.bioembeddings.com/public/embeddings/embedding_models/word2vec/word2vec.model
For instance, which parameters were used to train the Word2Vec model? Was it trained with the CBOW or the skip-gram methodology?
Which dataset was used?
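(To try to answer part of this myself, I loaded the downloaded file with gensim. This is only a guess that it is a plain gensim Word2Vec model saved with `.save()`; attribute names may differ between gensim 3.x and 4.x:)

```python
from gensim.models import Word2Vec

# Path to the file downloaded from the URL above (assumed to be a gensim Word2Vec save file).
model = Word2Vec.load("word2vec.model")

print(model.vector_size)  # embedding dimensionality (512 for the published model)
print(model.sg)           # 1 = skip-gram, 0 = CBOW
print(model.window)       # context window size
print(model.epochs)       # number of training epochs (called `iter` in older gensim versions)
```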
I need to use a different embedding vector size, but the word2vec model is fixed to an embedding size of 512. Even if I change this parameter in the corresponding embedding pipeline to 24, for instance, I get this error:
File "/bio_embeddings/embed/word2vec_embedder.py", line 48, in embed
embedding[index, :] = self._get_kmer_representation(k_mer)
ValueError: could not broadcast input array from shape (512,) into shape (24,)
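If the published model cannot simply be resized, this is roughly how I would retrain a smaller one myself with gensim. It is only a sketch under my own assumptions: I am guessing that the embedder splits sequences into overlapping 3-mers as "words", and the toy corpus and hyperparameters below are mine, not the ones used for the published model.

```python
from gensim.models import Word2Vec

def to_kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mers (the 'words' for Word2Vec)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# Toy corpus; the real training set for the published model is exactly what I am asking about.
sequences = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "GSHMSSLVPRGSHM"]
corpus = [to_kmers(seq) for seq in sequences]

# vector_size=24 is the smaller dimensionality I need; sg=0 selects CBOW, sg=1 skip-gram.
# (In gensim 3.x the argument is called `size` instead of `vector_size`.)
model = Word2Vec(corpus, vector_size=24, window=5, sg=0, min_count=1, workers=4)
model.save("word2vec_24.model")
```

Would a model retrained like this be compatible with the word2vec embedder in the pipeline, or does the pipeline assume the 512-dimensional published model?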
Thanks for your comments,