getalp/Flaubert

fast: fastBPE/fastBPE.hpp:458: void fastBPE::readCodes(const char*, std::unordered_map<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char>

keloemma opened this issue · 0 comments

Hello, I am trying to learn BPE codes on my training set and I am getting this error:

fast: fastBPE/fastBPE.hpp:458: void fastBPE::readCodes(const char*, std::unordered_map<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >, unsigned int, fastBPE::pair_hash>&, std::unordered_map<std::__cxx11::basic_string<char>, std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > >&): Assertion `codes.find(pair) == codes.end()' failed.
tools/create_pretraining_data.sh : ligne 38 : 5617 Abandon $FASTBPE applybpe $OUT_PATH/train.$lg $DATA_DIR/$lg.train $OUT_PATH/codes
Loading codes from /home/getalp/kelodjoe/eXP/Flaubert/data/processed/fr_corpuslabel/BPE/10k/codes ...
fast: fastBPE/fastBPE.hpp:458: void fastBPE::readCodes(const char*, std::unordered_map<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >, unsigned int, fastBPE::pair_hash>&, std::unordered_map<std::__cxx11::basic_string<char>, std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > >&): Assertion `codes.find(pair) == codes.end()' failed.
tools/create_pretraining_data.sh : ligne 39 : 5618 Abandon $FASTBPE applybpe $OUT_PATH/valid.$lg $DATA_DIR/$lg.valid $OUT_PATH/codes
Loading codes from /home/getalp/kelodjoe/eXP/Flaubert/data/processed/fr_corpuslabel/BPE/10k/codes ...
fast: fastBPE/fastBPE.hpp:458: void fastBPE::readCodes(const char*, std::unordered_map<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >, unsigned int, fastBPE::pair_hash>&, std::unordered_map<std::__cxx11::basic_string<char>, std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > >&): Assertion `codes.find(pair) == codes.end()' failed.
tools/create_pretraining_data.sh : ligne 40 : 5619 Abandon $FASTBPE applybpe $OUT_PATH/test.$lg $DATA_DIR/$lg.test $OUT_PATH/codes
cat: /home/getalp/kelodjoe/eXP/Flaubert/data/processed/fr_corpuslabel/BPE/10k/train.fr: Aucun fichier ou dossier de ce type
Read 0 words (0 unique) from text file.
Traceback (most recent call last):
  File "preprocess.py", line 30, in <module>
    assert os.path.isfile(txt_path)
AssertionError
Traceback (most recent call last):
  File "preprocess.py", line 30, in <module>
    assert os.path.isfile(txt_path)
AssertionError
Traceback (most recent call last):
  File "preprocess.py", line 30, in <module>
    assert os.path.isfile(txt_path)
AssertionError

Line 30 of preprocess.py looks like this:

[image]

Do you have any idea how I can resolve it?
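
Since the failed assertion is `codes.find(pair) == codes.end()`, one plausible reading is that the same pair appears more than once in the codes file. Below is a minimal sketch for checking that. It assumes each line of the codes file holds a left token, a right token, and a count separated by single spaces. The function name `find_duplicate_pairs` is hypothetical and is not part of fastBPE or Flaubert.

```python
from collections import Counter


def find_duplicate_pairs(codes_path):
    """Return {(left, right): occurrences} for pairs listed more than once.

    Assumption: one merge per line, formatted as "left right count" and
    separated by single spaces.
    """
    seen = Counter()
    with open(codes_path, encoding="utf-8") as handle:
        for line in handle:
            # Split on the plain space character only, so unusual Unicode
            # whitespace inside a token is not treated as a separator.
            fields = line.rstrip("\n").split(" ")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            seen[(fields[0], fields[1])] += 1
    return {pair: n for pair, n in seen.items() if n > 1}
```

Calling `find_duplicate_pairs` on the `codes` file named in the log would return an empty dict if no pair is repeated, and otherwise the repeated pairs with how often each occurs.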