AmericasNLP/americasnlp2021

Missing dev data?

After following the README for the baseline system for Spanish-Nahuatl, I get:

fran@ipek:~/source/americasnlp2021/baseline_system$ ./run_baseline_system.sh nah ../data/nahuatl-spanish/ . 5
################ Training SentencePiece tokenizer ################
sentencepiece_trainer.cc(75) LOG(INFO) Starts training with : 
trainer_spec {
  input: ../data/nahuatl-spanish//train.es
  input: ../data/nahuatl-spanish//train.nah
  input_format: 
  model_prefix: ./models/nah_es/sentencepiece.bpe
  model_type: BPE
  vocab_size: 3557
  accept_language: es
  accept_language: nah
  self_test_sample_size: 0
  character_coverage: 1
  input_sentence_size: 0
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  shrinking_factor: 0.75
  max_sentence_length: 4192
  num_threads: 16
  num_sub_iterations: 2
  max_sentencepiece_length: 16
  split_by_unicode_script: 1
  split_by_number: 1
  split_by_whitespace: 1
  split_digits: 0
  treat_whitespace_as_suffix: 0
  required_chars: 
  byte_fallback: 0
  vocabulary_output_piece_score: 1
  train_extremely_large_corpus: 0
  hard_vocab_limit: 1
  use_all_vocab: 0
  unk_id: 0
  bos_id: 1
  eos_id: 2
  pad_id: -1
  unk_piece: <unk>
  bos_piece: <s>
  eos_piece: </s>
  pad_piece: <pad>
  unk_surface:  ⁇ 
}
normalizer_spec {
  name: nmt_nfkc
  add_dummy_prefix: 1
  remove_extra_whitespaces: 1
  escape_whitespaces: 1
  normalization_rule_tsv: 
}
denormalizer_spec {}
trainer_interface.cc(330) LOG(INFO) SentenceIterator is not specified. Using MultiFileSentenceIterator.
trainer_interface.cc(185) LOG(INFO) Loading corpus: ../data/nahuatl-spanish//train.es
trainer_interface.cc(357) LOG(WARNING) Found too long line (5137 > 4192).
trainer_interface.cc(359) LOG(WARNING) Too long lines are skipped in the training.
trainer_interface.cc(360) LOG(WARNING) The maximum length can be changed with --max_sentence_length=<size> flag.
trainer_interface.cc(185) LOG(INFO) Loading corpus: ../data/nahuatl-spanish//train.nah
trainer_interface.cc(386) LOG(INFO) Loaded all 32105 sentences
trainer_interface.cc(392) LOG(INFO) Skipped 20 too long sentences.
trainer_interface.cc(401) LOG(INFO) Adding meta_piece: <unk>
trainer_interface.cc(401) LOG(INFO) Adding meta_piece: <s>
trainer_interface.cc(401) LOG(INFO) Adding meta_piece: </s>
trainer_interface.cc(406) LOG(INFO) Normalizing sentences...
trainer_interface.cc(467) LOG(INFO) all chars count=4552492
trainer_interface.cc(488) LOG(INFO) Alphabet size=142
trainer_interface.cc(489) LOG(INFO) Final character coverage=1
trainer_interface.cc(521) LOG(INFO) Done! preprocessed 32105 sentences.
trainer_interface.cc(527) LOG(INFO) Tokenizing input sentences with whitespace: 32105
trainer_interface.cc(537) LOG(INFO) Done! 79331
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=77400 min_freq=33
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=29907 size=20 all=3171 active=2084 piece=ch
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=17897 size=40 all=4381 active=3294 piece=as
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=10792 size=60 all=6028 active=4941 piece=▁la
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=8091 size=80 all=7365 active=6278 piece=▁los
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=5905 size=100 all=8967 active=7880 piece=ua
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=5795 min_freq=470
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=4562 size=120 all=10507 active=2407 piece=▁ma
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=3632 size=140 all=11823 active=3723 piece=▁me
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=3174 size=160 all=12935 active=4835 piece=▁del
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=2682 size=180 all=14391 active=6291 piece=ton
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=2306 size=200 all=15737 active=7637 piece=dad
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=2296 min_freq=397
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=2032 size=220 all=16802 active=2023 piece=▁man
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=1856 size=240 all=17915 active=3136 piece=pil
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=1699 size=260 all=18899 active=4120 piece=▁cas
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=1556 size=280 all=19919 active=5140 piece=mp
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=1430 size=300 all=20990 active=6211 piece=▁na
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=1429 min_freq=328
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=1305 size=320 all=22042 active=2055 piece=▁ten
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=1227 size=340 all=22990 active=3003 piece=▁nos
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=1166 size=360 all=24022 active=4035 piece=yotl
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=1079 size=380 all=24898 active=4911 piece=tza
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=1042 size=400 all=26219 active=6232 piece=teca
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=1039 min_freq=240
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=991 size=420 all=27252 active=2248 piece=meh
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=923 size=440 all=27973 active=2969 piece=miqui
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=862 size=460 all=28625 active=3621 piece=▁tlaca
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=819 size=480 all=29636 active=4632 piece=▁san
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=787 size=500 all=30328 active=5324 piece=▁mar
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=785 min_freq=188
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=736 size=520 all=31500 active=2658 piece=ones
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=707 size=540 all=32429 active=3587 piece=▁inic
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=688 size=560 all=33116 active=4274 piece=ul
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=652 size=580 all=33882 active=5040 piece=▁Ca
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=636 size=600 all=34473 active=5631 piece=patl
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=634 min_freq=157
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=608 size=620 all=35588 active=2830 piece=ún
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=589 size=640 all=36410 active=3652 piece=▁fueron
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=571 size=660 all=36838 active=4080 piece=▁todo
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=550 size=680 all=37521 active=4763 piece=tis
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=535 size=700 all=38156 active=5398 piece=ko
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=535 min_freq=132
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=516 size=720 all=38808 active=2440 piece=ido
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=502 size=740 all=39613 active=3245 piece=▁za
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=494 size=760 all=40100 active=3732 piece=▁Ama
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=480 size=780 all=40724 active=4356 piece=▁160
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=456 size=800 all=41144 active=4776 piece=ni
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=455 min_freq=118
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=438 size=820 all=41773 active=2612 piece=ño
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=424 size=840 all=42313 active=3152 piece=▁vis
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=412 size=860 all=42825 active=3664 piece=▁tona
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=397 size=880 all=43537 active=4376 piece=▁día
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=388 size=900 all=44204 active=5043 piece=ini
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=386 min_freq=103
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=374 size=920 all=44770 active=2694 piece=tiaya
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=363 size=940 all=45507 active=3431 piece=▁dicha
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=353 size=960 all=46190 active=4114 piece=▁Los
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=343 size=980 all=46668 active=4591 piece=tlahtoca
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=334 size=1000 all=47460 active=5383 piece=▁prim
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=333 min_freq=92
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=328 size=1020 all=47982 active=2894 piece=▁flores
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=318 size=1040 all=48652 active=3564 piece=▁ba
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=312 size=1060 all=49311 active=4223 piece=tque
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=306 size=1080 all=49623 active=4535 piece=gún
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=300 size=1100 all=50133 active=5045 piece=elig
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=300 min_freq=83
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=294 size=1120 all=50587 active=2954 piece=oy
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=289 size=1140 all=50836 active=3203 piece=tecas
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=284 size=1160 all=51160 active=3527 piece=▁corazón
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=279 size=1180 all=51814 active=4181 piece=ío
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=274 size=1200 all=52138 active=4505 piece=pacho
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=274 min_freq=77
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=268 size=1220 all=52547 active=2940 piece=cuil
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=263 size=1240 all=53012 active=3405 piece=▁yh
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=258 size=1260 all=53432 active=3825 piece=▁tú
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=254 size=1280 all=53802 active=4195 piece=pohualxihuitl
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=248 size=1300 all=54188 active=4581 piece=▁nepa
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=248 min_freq=72
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=242 size=1320 all=54590 active=3104 piece=▁om
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=237 size=1340 all=55078 active=3592 piece=▁1608
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=234 size=1360 all=55562 active=4076 piece=pohualli
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=229 size=1380 all=56044 active=4558 piece=▁Des
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=226 size=1400 all=56548 active=5062 piece=▁tech
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=226 min_freq=66
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=221 size=1420 all=57062 active=3285 piece=▁governador
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=215 size=1440 all=57436 active=3659 piece=cua
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=212 size=1460 all=58059 active=4282 piece=▁aquel
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=207 size=1480 all=58632 active=4855 piece=coli
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=202 size=1500 all=58951 active=5174 piece=zó
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=202 min_freq=61
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=200 size=1520 all=59465 active=3444 piece=culo
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=197 size=1540 all=59856 active=3835 piece=▁mos
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=194 size=1560 all=60173 active=4152 piece=▁159
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=192 size=1580 all=60458 active=4437 piece=▁tlamantli
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=188 size=1600 all=60831 active=4810 piece=▁Per
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=188 min_freq=57
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=184 size=1620 all=61171 active=3369 piece=jar
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=182 size=1640 all=61565 active=3763 piece=▁quer
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=179 size=1660 all=61836 active=4034 piece=huicac
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=176 size=1680 all=62188 active=4386 piece=▁pasado
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=173 size=1700 all=62686 active=4884 piece=ep
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=173 min_freq=54
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=171 size=1720 all=63186 active=3622 piece=maron
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=169 size=1740 all=63486 active=3922 piece=▁personas
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=166 size=1760 all=63825 active=4261 piece=ñore
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=165 size=1780 all=64180 active=4616 piece=tlalis
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=163 size=1800 all=64622 active=5058 piece=▁doc
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=163 min_freq=51
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=161 size=1820 all=64812 active=3415 piece=quiza
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=159 size=1840 all=65057 active=3660 piece=cuilo
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=156 size=1860 all=65391 active=3994 piece=idente
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=154 size=1880 all=65614 active=4217 piece=guna
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=151 size=1900 all=65854 active=4457 piece=rad
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=151 min_freq=49
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=150 size=1920 all=66238 active=3645 piece=▁puede
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=148 size=1940 all=66556 active=3963 piece=kat
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=146 size=1960 all=66874 active=4281 piece=▁oquichiuh
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=144 size=1980 all=67308 active=4715 piece=yolo
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=142 size=2000 all=67814 active=5221 piece=cado
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=142 min_freq=47
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=141 size=2020 all=68188 active=3744 piece=▁hombre
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=139 size=2040 all=68609 active=4165 piece=▁ello
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=137 size=2060 all=69018 active=4574 piece=▁kikua
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=135 size=2080 all=69279 active=4835 piece=macaz
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=133 size=2100 all=69485 active=5041 piece=▁cho
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=133 min_freq=44
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=131 size=2120 all=69803 active=3777 piece=▁Tlil
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=129 size=2140 all=69959 active=3933 piece=ger
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=127 size=2160 all=70310 active=4284 piece=idor
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=126 size=2180 all=70452 active=4426 piece=▁quimon
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=124 size=2200 all=70765 active=4739 piece=▁VI
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=124 min_freq=42
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=123 size=2220 all=71091 active=3861 piece=▁Las
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=122 size=2240 all=71397 active=4167 piece=▁Quen
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=121 size=2260 all=71611 active=4381 piece=▁vein
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=120 size=2280 all=71798 active=4568 piece=▁quienes
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=118 size=2300 all=72114 active=4884 piece=tolo
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=118 min_freq=40
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=117 size=2320 all=72241 active=3706 piece=uaya
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=116 size=2340 all=72636 active=4101 piece=▁clas
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=115 size=2360 all=72833 active=4298 piece=▁Chimal
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=113 size=2380 all=73121 active=4586 piece=erto
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=112 size=2400 all=73434 active=4899 piece=▁sep
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=112 min_freq=39
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=111 size=2420 all=73577 active=3811 piece=▁febrero
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=109 size=2440 all=73938 active=4172 piece=tlahuac
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=108 size=2460 all=74288 active=4522 piece=▁colo
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=107 size=2480 all=74548 active=4782 piece=▁águ
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=106 size=2500 all=74814 active=5048 piece=▁Cuix
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=106 min_freq=37
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=105 size=2520 all=75168 active=4093 piece=▁Real
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=104 size=2540 all=75275 active=4200 piece=▁Testigo
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=103 size=2560 all=75615 active=4540 piece=▁recib
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=102 size=2580 all=75930 active=4855 piece=▁tepetl
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=100 size=2600 all=76192 active=5117 piece=popo
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=100 min_freq=36
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=99 size=2620 all=76519 active=4068 piece=▁wa
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=98 size=2640 all=76744 active=4293 piece=ptla
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=97 size=2660 all=76962 active=4511 piece=iga
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=97 size=2680 all=77203 active=4752 piece=▁Señora
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=96 size=2700 all=77419 active=4968 piece=tilique
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=96 min_freq=35
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=95 size=2720 all=77700 active=4120 piece=▁tendr
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=94 size=2740 all=78014 active=4434 piece=▁ancho
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=93 size=2760 all=78218 active=4638 piece=▁estr
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=92 size=2780 all=78534 active=4954 piece=▁tzo
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=91 size=2800 all=78808 active=5228 piece=▁hacen
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=91 min_freq=34
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=90 size=2820 all=79018 active=4150 piece=onotza
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=89 size=2840 all=79263 active=4395 piece=▁hemos
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=88 size=2860 all=79494 active=4626 piece=mimil
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=87 size=2880 all=79700 active=4832 piece=eno
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=87 size=2900 all=79953 active=5085 piece=▁sequin
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=87 min_freq=32
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=86 size=2920 all=80142 active=4186 piece=▁ihquac
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=85 size=2940 all=80349 active=4393 piece=▁infor
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=84 size=2960 all=80522 active=4566 piece=▁cebol
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=83 size=2980 all=80720 active=4764 piece=yllo
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=83 size=3000 all=80905 active=4949 piece=▁huecauh
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=83 min_freq=31
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=82 size=3020 all=81081 active=4220 piece=icanos
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=81 size=3040 all=81257 active=4396 piece=▁tia
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=81 size=3060 all=81424 active=4563 piece=▁gobernando
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=80 size=3080 all=81636 active=4775 piece=▁pluma
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=79 size=3100 all=81687 active=4826 piece=▁rev
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=79 min_freq=30
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=78 size=3120 all=81881 active=4270 piece=miz
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=78 size=3140 all=82253 active=4642 piece=tequiuh
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=77 size=3160 all=82465 active=4854 piece=coton
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=76 size=3180 all=82706 active=5095 piece=huah
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=76 size=3200 all=82902 active=5291 piece=▁general
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=76 min_freq=29
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=75 size=3220 all=83198 active=4442 piece=ándo
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=75 size=3240 all=83314 active=4558 piece=▁tomaron
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=74 size=3260 all=83458 active=4702 piece=▁llamó
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=73 size=3280 all=83586 active=4830 piece=▁kua
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=73 size=3300 all=83692 active=4936 piece=palnemo
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=73 min_freq=28
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=72 size=3320 all=83880 active=4362 piece=▁Tia
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=72 size=3340 all=84051 active=4533 piece=▁capitán
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=71 size=3360 all=84189 active=4671 piece=▁peda
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=71 size=3380 all=84330 active=4812 piece=▁nochipa
bpe_model_trainer.cc(257) LOG(INFO) Added: freq=70 size=3400 all=84642 active=5124 piece=▁Ahui
bpe_model_trainer.cc(166) LOG(INFO) Updating active symbols. max_freq=70 min_freq=28
trainer_interface.cc(615) LOG(INFO) Saving model: ./models/nah_es/sentencepiece.bpe.model
trainer_interface.cc(626) LOG(INFO) Saving vocabs: ./models/nah_es/sentencepiece.bpe.vocab
################ Done training ################
################ Tokenizing data ################
Encode error: [Errno 2] No such file or directory: '../data/nahuatl-spanish//dev.es'
Encode error: [Errno 2] No such file or directory: '../data/nahuatl-spanish//dev.nah'
Encode error: [Errno 2] No such file or directory: '../data/nahuatl-spanish//test.es'
Encode error: [Errno 2] No such file or directory: '../data/nahuatl-spanish//test.nah'
################ Done tokenizing ################
################ Encoding Data ################
2021-01-02 23:18:19 | INFO | fairseq_cli.preprocess | Namespace(align_suffix=None, alignfile=None, all_gather_list_size=16384, azureml_logging=False, bf16=False, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='./data_out/nah_es', empty_cache_freq=0, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=False, log_format=None, log_interval=100, lr_scheduler='fixed', memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=False, optimizer=None, padding_factor=8, profile=False, quantization_config_path=None, reset_logging=True, scoring='bleu', seed=1, source_lang='es', srcdict='./models/nah_es/fairseq.dict', target_lang='nah', task='translation', tensorboard_logdir=None, testpref=None, tgtdict='./models/nah_es/fairseq.dict', threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, tpu=False, trainpref='./data_out/nah_es/train.bpe', user_dir=None, validpref='./data_out/nah_es/dev.bpe', wandb_project=None, workers=4)
2021-01-02 23:18:19 | INFO | fairseq_cli.preprocess | [es] Dictionary: 3558 types
2021-01-02 23:18:23 | INFO | fairseq_cli.preprocess | [es] ./data_out/nah_es/train.bpe.es: 16145 sents, 717620 tokens, 0.0% replaced by <unk>
2021-01-02 23:18:23 | INFO | fairseq_cli.preprocess | [es] Dictionary: 3558 types
Traceback (most recent call last):
  File "/home/fran/.local/bin/fairseq-preprocess", line 8, in <module>
    sys.exit(cli_main())
  File "/home/fran/.local/lib/python3.8/site-packages/fairseq_cli/preprocess.py", line 394, in cli_main
    main(args)
  File "/home/fran/.local/lib/python3.8/site-packages/fairseq_cli/preprocess.py", line 284, in main
    make_all(args.source_lang, src_dict)
  File "/home/fran/.local/lib/python3.8/site-packages/fairseq_cli/preprocess.py", line 256, in make_all
    make_dataset(
  File "/home/fran/.local/lib/python3.8/site-packages/fairseq_cli/preprocess.py", line 248, in make_dataset
    make_binary_dataset(vocab, input_prefix, output_prefix, lang, num_workers)
  File "/home/fran/.local/lib/python3.8/site-packages/fairseq_cli/preprocess.py", line 133, in make_binary_dataset
    offsets = Binarizer.find_offsets(input_file, num_workers)
  File "/home/fran/.local/lib/python3.8/site-packages/fairseq/binarizer.py", line 103, in find_offsets
    with open(PathManager.get_local_path(filename), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: './data_out/nah_es/dev.bpe.es'

Oops, I should have read the documentation :D -- Sorry for the inbox noise.
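
For anyone who lands here with the same traceback: the log above shows the tokenizing step looking for `dev.*` and `test.*` files next to `train.*`, and `fairseq-preprocess` only crashing later on the missing `dev.bpe.es`. A quick pre-flight check along these lines would surface the problem before any training starts. This is just a sketch, not part of the baseline scripts: the function name `missing_split_files` is made up, and the list of expected files is inferred from the error messages in the log rather than from the repository documentation.

```python
from pathlib import Path


def missing_split_files(data_dir, src="es", tgt="nah",
                        splits=("train", "dev", "test")):
    """Return the expected split files that are absent from data_dir.

    Assumption: the pipeline wants <split>.<lang> for every split and both
    languages, which is what the 'Encode error' lines in the log suggest.
    """
    root = Path(data_dir)
    expected = [root / f"{split}.{lang}"
                for split in splits
                for lang in (src, tgt)]
    # Keep only the paths that do not exist as regular files.
    return [path for path in expected if not path.is_file()]


# Example: report anything missing in the directory used in the command above.
for path in missing_split_files("../data/nahuatl-spanish"):
    print(f"missing: {path}")
```

If this prints nothing, all six files are in place; otherwise it lists the ones to create or fetch before running `run_baseline_system.sh`.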