sacdallago/bio_embeddings

Where do the models in "bio_embeddings/utilities/defaults.yml" come from? Where are the docs, training parameters, and dataset?

laraque opened this issue · 0 comments

Hello Team,

Where can I find information about how the models published in the repository linked from the following file were trained?

file: bio_embeddings/utilities/defaults.yml
model : http://data.bioembeddings.com/public/embeddings/embedding_models/word2vec/word2vec.model

For instance, what parameters were used to train the Word2vec model? Was the CBOW or the skip-gram method used?
Which dataset was used?

I need to use a different embedding size, but the word2vec model is fixed at an embedding size of 512. Even if I change this parameter in the corresponding embedding pipeline, to 24 for instance, I get the following error:

File "/bio_embeddings/embed/word2vec_embedder.py", line 48, in embed
embedding[index, :] = self._get_kmer_representation(k_mer)
ValueError: could not broadcast input array from shape (512,) into shape (24,)
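
The error can be reproduced outside the pipeline with a minimal NumPy sketch. It rests on my own assumption, not checked against the embedder source, that the output array is allocated from the configured size (24) while the pretrained model still returns 512-dimensional vectors. The names `pretrained_dim`, `configured_dim`, and `kmer_vector` are hypothetical stand-ins and are not taken from bio_embeddings.

```python
import numpy as np

# Hypothetical stand-ins, not names from bio_embeddings.
pretrained_dim = 512  # vector size stored in the downloaded word2vec.model
configured_dim = 24   # vector size I set in the pipeline configuration

# Stand-in for the vector returned by _get_kmer_representation(k_mer).
kmer_vector = np.zeros(pretrained_dim)

# Assumption: the embedding array is sized from the configured value.
embedding = np.zeros((3, configured_dim))

try:
    # Same kind of assignment as line 48 of word2vec_embedder.py.
    embedding[0, :] = kmer_vector
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this assumption is right, changing the size in the configuration cannot work with the published model, and a 24-dimensional embedding would need a model trained with that vector size, which is why I am asking for the training parameters and dataset.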

Thanks for your comments,