facebookresearch/CodeGen

Could anyone help me with this error: failed to learn BPE?

dinaalaaahmed opened this issue · 3 comments

When I run the command

python -m codegen_sources.preprocessing.preprocess /home/dina/CodeGen/data/test_dataset --langs java cpp python --mode monolingual_functions --bpe_mode=fast --local=True --train_splits=1
####### Error ########
INFO - 11/18/21 08:01:43 - 0:01:18 - training bpe on /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.tok.shuf.50gb...
Traceback (most recent call last):
File "/home/dina/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/dina/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/dina/CodeGen/codegen_sources/preprocessing/preprocess.py", line 214, in <module>
preprocess(args)
File "/home/dina/CodeGen/codegen_sources/preprocessing/preprocess.py", line 102, in preprocess
dataset.learn_bpe(ncodes=args.ncodes, executor=cluster_train_bpe)
File "/home/dina/CodeGen/codegen_sources/preprocessing/dataset_modes/dataset_mode.py", line 589, in learn_bpe
self._learn_bpe(ncodes, executor)
File "/home/dina/CodeGen/codegen_sources/preprocessing/dataset_modes/monolingual_functions_mode.py", line 123, in _learn_bpe
job.result()
File "/home/dina/.local/lib/python3.8/site-packages/submitit/core/core.py", line 263, in result
r = self.results()
File "/home/dina/.local/lib/python3.8/site-packages/submitit/core/core.py", line 291, in results
raise job_exception # pylint: disable=raising-bad-type
submitit.core.utils.FailedJobError: Job (task=0) failed during processing with trace:

Traceback (most recent call last):
File "/home/dina/.local/lib/python3.8/site-packages/submitit/core/submission.py", line 53, in process_job
result = delayed.result()
File "/home/dina/.local/lib/python3.8/site-packages/submitit/core/utils.py", line 122, in result
self._result = self.function(*self.args, **self.kwargs)
File "/home/dina/CodeGen/codegen_sources/preprocessing/bpe_modes/fast_bpe_mode.py", line 53, in learn_bpe_file
assert (
AssertionError: failed to learn bpe on /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.tok.shuf.50gb, command: /home/dina/CodeGen/codegen_sources/model/tools/fastBPE/fast learnbpe 50000 /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.tok.shuf.50gb > /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.codes


You can check full logs with 'job.stderr(0)' and 'job.stdout(0)'or at paths:

  • /home/dina/CodeGen/data/test_dataset/log/5615_0_log.err
  • /home/dina/CodeGen/data/test_dataset/log/5615_0_log.out
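The two log files listed above usually contain the actual stderr/stdout of the failed job. A minimal sketch for printing the end of each file follows; `tail` is a hypothetical helper written for this illustration, not part of CodeGen or submitit, and the paths are the ones from the error message.

```python
from pathlib import Path


def tail(path, n=20):
    """Return the last n lines of a text file, or [] if the file does not exist."""
    p = Path(path)
    if not p.is_file():
        return []
    # errors="replace" avoids a crash if the log contains non-UTF-8 bytes
    return p.read_text(errors="replace").splitlines()[-n:]


if __name__ == "__main__":
    # Paths copied from the error message in this issue
    for log in (
        "/home/dina/CodeGen/data/test_dataset/log/5615_0_log.err",
        "/home/dina/CodeGen/data/test_dataset/log/5615_0_log.out",
    ):
        print(f"==> {log}")
        print("\n".join(tail(log)) or "(missing or empty)")
```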

Hmm, that has never happened to me before.
Did you run /home/dina/CodeGen/codegen_sources/model/tools/fastBPE/fast learnbpe 50000 /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.tok.shuf.50gb > /home/dina/CodeGen/data/test_dataset/cpp-java-python.sa-cl.codes directly to see more detailed logs?
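If it is more convenient to do this from Python (for example in a notebook), the same command can be run with `subprocess` so that the exit code and stderr are visible. This is only a sketch: `run_learnbpe` is a hypothetical helper, the `fast learnbpe <ncodes> <corpus>` invocation is copied from the assertion message above, and the two pre-checks (missing binary, missing or empty corpus) are guesses at likely causes, not confirmed ones.

```python
import subprocess
from pathlib import Path


def run_learnbpe(fast_bin, ncodes, corpus, codes_out):
    """Run `fast learnbpe` directly and return (returncode, stderr).

    Returns (None, message) without running anything if an input is missing.
    """
    fast_bin, corpus, codes_out = Path(fast_bin), Path(corpus), Path(codes_out)
    if not fast_bin.is_file():
        return None, f"fastBPE binary not found: {fast_bin}"
    if not corpus.is_file() or corpus.stat().st_size == 0:
        return None, f"corpus missing or empty: {corpus}"
    # Equivalent of the shell redirection `> codes_out`
    with codes_out.open("w") as out:
        proc = subprocess.run(
            [str(fast_bin), "learnbpe", str(ncodes), str(corpus)],
            stdout=out,
            stderr=subprocess.PIPE,
            text=True,
        )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Paths copied from the error message in this issue
    data = "/home/dina/CodeGen/data/test_dataset"
    rc, err = run_learnbpe(
        "/home/dina/CodeGen/codegen_sources/model/tools/fastBPE/fast",
        50000,
        f"{data}/cpp-java-python.sa-cl.tok.shuf.50gb",
        f"{data}/cpp-java-python.sa-cl.codes",
    )
    print("return code:", rc)
    print(err)
```

A non-zero return code together with the stderr text should say more than the bare `AssertionError` raised by the preprocessing script.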

Thank you for your response.
The error occurred when I was running Linux in a virtual machine. It was solved when we ran Linux natively as the host operating system.

I bumped into exactly the same problem when trying to run the pipeline in Google Colab. I guess the original folder structure was changed: you can see in install_env.sh that fastBPE is installed in codegen_sources/model/tools. So I simply ran

!cp -r codegen_sources/model/tools/fastBPE/ ./fastBPE

and that fixed it.
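Before copying anything, it may help to check where a `fast` binary actually exists. The sketch below looks in the two locations mentioned in this thread; `find_fast_binary` is a hypothetical helper, and the assumption that the binary is named `fast` inside each `fastBPE` folder comes from the command in the error message, not from the CodeGen source.

```python
import os
from pathlib import Path

# Candidate locations taken from this thread (relative to the repo root)
CANDIDATES = (
    "codegen_sources/model/tools/fastBPE/fast",
    "fastBPE/fast",
)


def find_fast_binary(repo_root, candidates=CANDIDATES):
    """Return the first candidate path that exists and is executable, else None."""
    root = Path(repo_root)
    for rel in candidates:
        path = root / rel
        if path.is_file() and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    found = find_fast_binary(".")
    print(found if found else "no executable fastBPE binary found; was it compiled?")
```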