artetxem/undreamt

Issue Urdu-English translation

ahmedraza1235 opened this issue · 0 comments

Hi Mikel!
I applied all the steps that your toolkit requires, as described in the paper, to an Urdu-English corpus, but I get a very poor BLEU score, around 0.5 or 0.9.
Data preprocessing:
Step 1) On the monolingual data I applied tokenization, truecasing, and cleaning to a sentence length of 1-50 with Moses.
Step 2) I trained word embeddings with word2vec (epochs=5, window_size=5, dimension=300), then applied MUSE for alignment, mapping them into a shared space with VecMap.
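For reference, the 1-50 length cleaning in step 1 can be sketched roughly as below. This is only a minimal Python illustration, not the Moses cleaning script itself: the function name `filter_by_length` is hypothetical, and it assumes the text is already tokenized so that whitespace-separated tokens can be counted.

```python
def filter_by_length(lines, min_len=1, max_len=50):
    """Keep lines whose whitespace token count lies in [min_len, max_len].

    Hypothetical helper for illustration; assumes already-tokenized text.
    """
    kept = []
    for line in lines:
        n_tokens = len(line.split())  # count whitespace-separated tokens
        if min_len <= n_tokens <= max_len:
            kept.append(line.strip())
    return kept


if __name__ == "__main__":
    sample = [
        "",                          # empty line: dropped
        "this is a short sentence .",  # 6 tokens: kept
        " ".join(["w"] * 51),        # 51 tokens: dropped
    ]
    for sentence in filter_by_length(sample):
        print(sentence)
```

Empty lines and sentences longer than 50 tokens are dropped, and everything else is kept unchanged apart from stripped surrounding whitespace.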
The size of my corpus is 13k. Is that enough?
My first question is whether this toolkit supports the Urdu language.
Second, I used the toolkit's default parameters.
If the parameters affect model training, could you please share details?