Per-factor embedding dimensions when concatenating
eltorre opened this issue · 0 comments
Feature description
Right now, factors-dim-emb takes a single INT. Layers::Embedding then creates one matrix in which every factor embedding has the same dimension:

  FactorEmbMatrix_ = graph_->param("factor_" + name, {numberOfFactors, dimFactorEmb}, initFunc, fixed);

embedWithConcat (and maybe data::factored_vocab?) then take this single dimension into account.
I feel this is suboptimal when dealing with factors whose vocab sizes differ a lot, for example word capitalization (vocab size 3) vs. word inflection (vocab size ~100 for some languages). A single shared dimension forces either a too-small embedding for the second factor or a too-large embedding for the first, which seems wasteful: picking, say, 64 dimensions so that the inflection factor is well represented also spends 64 dimensions on a 3-entry capitalization vocabulary.
Example
factors-dim-emb should behave like dim-vocabs when --factors-combine=concat, i.e. accept one INT per factor group rather than a single INT.
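For instance (proposed behaviour, numbers made up), with the two factors above something like

  --factors-combine=concat --factors-dim-emb 4 32

would give the capitalization factor a 4-dimensional embedding and the inflection factor a 32-dimensional one, instead of forcing a single shared dimension on both.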
Comments
This seems easy enough to implement. Famous last words.
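Roughly, I imagine something along these lines in Layers::Embedding, one parameter per factor group instead of one shared matrix. This is only a sketch: numGroups, groupVocabSize_, dimPerGroup_ and factorIndicesPerGroup are made-up names, and graph_, name, initFunc and fixed are reused from the snippet above.

  // Sketch: one embedding matrix per factor group, each with its own
  // vocabulary size and embedding dimension.
  std::vector<Expr> factorEmbMatrices_;
  for(size_t g = 0; g < numGroups; ++g) {
    factorEmbMatrices_.push_back(
        graph_->param("factor_" + name + "_g" + std::to_string(g),
                      {groupVocabSize_[g], dimPerGroup_[g]},  // per-group shape
                      initFunc,
                      fixed));
  }

  // embedWithConcat would then look up each factor in its own matrix and
  // concatenate along the embedding axis, so the factor part of the word
  // embedding gets sum(dimPerGroup_) dimensions instead of
  // numGroups * dimFactorEmb.
  std::vector<Expr> pieces;
  for(size_t g = 0; g < numGroups; ++g)
    pieces.push_back(rows(factorEmbMatrices_[g], factorIndicesPerGroup[g]));
  auto factorPart = concatenate(pieces, /*axis=*/-1);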
I'd appreciate it if somebody with good knowledge of the codebase could gauge the size of the footgun beforehand.