UncompletedJobError: No output/error stream produced
sushantkumar007007 opened this issue · 1 comment
I am running CodeGen preprocessing in obfuscation mode on the test dataset (https://github.com/facebookresearch/CodeGen/tree/main/data/test_dataset) with the following command:
run codegen_sources/preprocessing/preprocess.py data/python_test --mode obfuscation --local True --local_parallelism 4 --langs python --train_splits 1 --tokenization_timeout 400 --bpe_timeout 220 --train_bpe_timeout 400 --bpe_mode fast --fastbpe_use_vocab True --fastbpe_vocab_path data/bpe/cpp-java-python/vocab --fastbpe_code_path data/bpe/cpp-java-python/codes --keep_comments False --ncodes 4000 --percent_test_valid 2
I am getting the following error:
`INFO - 05/04/22 15:56:33 - 0:00:00 - Dataset pipeline for /home/sushantk/anaconda3/codeGen/data/python_test
INFO - 05/04/22 15:56:33 - 0:00:00 - ========== Extract and Tokenize ===========
INFO - 05/04/22 15:56:33 - 0:00:00 - Using 4 processors.
INFO - 05/04/22 15:56:33 - 0:00:00 - python: tokenizing and extracting parallel functions in 1 json files ...
INFO - 05/04/22 15:56:33 - 0:00:00 - Number of lines to process: 50
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content Missing parentheses in call to 'print'. Did you mean print('\nThe best BASE85 based alphabet for your setup is: %s' \)? (<unknown>, line 1673)
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content local variable 'mangledName' referenced before assignment
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content local variable 'mangledName' referenced before assignment
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content Missing parentheses in call to 'print'. Did you mean print("Press control+C to stop and show the summary")? (<unknown>, line 43)
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content local variable 'mangledName' referenced before assignment
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content local variable 'mangledName' referenced before assignment
WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content Missing parentheses in call to 'print'. Did you mean print("permantly remove file ", file)? (<unknown>, line 374)
WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content local variable 'mangledName' referenced before assignment
WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content invalid syntax (<unknown>, line 426)
WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content Missing parentheses in call to 'print'. Did you mean print("\nBEGIN - expecting GEOS_ERROR)? (<unknown>, line 135)
WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content invalid syntax (<unknown>, line 92)
WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content invalid syntax (<unknown>, line 62)
100%|██████████| 50/50 [00:00<00:00, 3385.62it/s]
INFO - 05/04/22 15:56:34 - 0:00:01 - Time elapsed: 0.95
WARNING - 05/04/22 15:56:34 - 0:00:01 - Tokenization of /home/sushantk/anaconda3/codeGen/data/python_test/python.001 (1).json.gz:12 errors out of 50 lines(24.00%)
WARNING - 05/04/22 15:56:34 - 0:00:01 - Tokenization of /home/sushantk/anaconda3/codeGen/data/python_test/python.001 (1).json.gz:3 filtered examples in 50 lines(6.00%)
INFO - 05/04/22 15:56:34 - 0:00:01 - ========== Deduplicate and Split ===========
INFO - 05/04/22 15:56:34 - 0:00:02 - all files python.*[0-9].obfuscated.tok regrouped in /home/sushantk/anaconda3/codeGen/data/python_test/python.all.obfuscated.tok .
INFO - 05/04/22 15:56:34 - 0:00:02 - all files python.*[0-9].dictionary.tok regrouped in /home/sushantk/anaconda3/codeGen/data/python_test/python.all.dictionary.tok .
INFO - 05/04/22 15:56:34 - 0:00:02 - shuffling 2 files parallely: python.all.obfuscated.tok, python.all.dictionary.tok
INFO - 05/04/22 15:56:34 - 0:00:02 - python: Deduplication on 'obfuscated' and propagated on other suffixes.
INFO - 05/04/22 15:56:34 - 0:00:02 - python: Duplicated lines for obfuscated: 0 / 35
INFO - 05/04/22 15:56:34 - 0:00:02 - python: valid.obfuscated -> 0 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - python: test.obfuscated -> 0 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - python: train.obfuscated.0 -> 35 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - python: Duplicated lines for dictionary: 0 / 35
INFO - 05/04/22 15:56:35 - 0:00:02 - python: valid.dictionary -> 0 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - python: test.dictionary -> 0 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - python: train.dictionary.0 -> 35 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - Sucessfully regroup, deduplicate and split tokenized data into a train/valid/test sets.
INFO - 05/04/22 15:56:35 - 0:00:02 - ========== Learn BPE ===========
INFO - 05/04/22 15:56:35 - 0:00:02 - No need to train bpe codes, already trained. Codes: data/bpe/cpp-java-python/codes
INFO - 05/04/22 15:56:35 - 0:00:02 - ========== Apply BPE ===========
INFO - 05/04/22 15:56:35 - 0:00:02 - Applying BPE on /home/sushantk/anaconda3/codeGen/data/python_test/python.train.dictionary.0.tok ...
INFO - 05/04/22 15:56:35 - 0:00:02 - Applying BPE on /home/sushantk/anaconda3/codeGen/data/python_test/python.train.obfuscated.0.tok ...
WARNING - 05/04/22 15:56:35 - 0:00:02 - /home/sushantk/anaconda3/codeGen/data/python_test/python.valid.dictionary.tok is not a valid file, cannot to apply BPE on it.
WARNING - 05/04/22 15:56:35 - 0:00:02 - /home/sushantk/anaconda3/codeGen/data/python_test/python.valid.obfuscated.tok is not a valid file, cannot to apply BPE on it.
WARNING - 05/04/22 15:56:35 - 0:00:02 - /home/sushantk/anaconda3/codeGen/data/python_test/python.test.dictionary.tok is not a valid file, cannot to apply BPE on it.
WARNING - 05/04/22 15:56:35 - 0:00:02 - /home/sushantk/anaconda3/codeGen/data/python_test/python.test.obfuscated.tok is not a valid file, cannot to apply BPE on it.
---------------------------------------------------------------------------
UncompletedJobError Traceback (most recent call last)
~/anaconda3/codeGen/codegen_sources/preprocessing/preprocess.py in <module>()
212 args.input_path = os.path.abspath(args.input_path)
213 multiprocessing.set_start_method("fork")
--> 214 preprocess(args)
~/anaconda3/codeGen/codegen_sources/preprocessing/preprocess.py in preprocess(args)
103
104 dataset.apply_bpe(
--> 105 executor=cluster_apply_bpe, local_parallelism=args.local_parallelism
106 )
107 dataset.get_vocab(executor=cluster_train_bpe)
~/anaconda3/codeGen/codegen_sources/preprocessing/dataset_modes/obfuscation_mode.py in apply_bpe(self, executor, local_parallelism)
127 _bpe_ext = self.bpe.ext
128 self.bpe.ext += TMP_EXT
--> 129 super().apply_bpe(executor)
130 self.bpe.ext = _bpe_ext
131 # restore BPE on obfuscation special tokens
~/anaconda3/codeGen/codegen_sources/preprocessing/dataset_modes/dataset_mode.py in apply_bpe(self, executor, local_parallelism)
615 jobs.append(job)
616 for job in jobs:
--> 617 job.result()
618 logger.info("BPE done.")
619 # logger.info("Regrouping BPE")
~/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/submitit/core/core.py in result(self)
264
265 def result(self) -> R:
--> 266 r = self.results()
267 assert not self._sub_jobs, "You should use `results()` if your job has subtasks."
268 return r[0]
~/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/submitit/core/core.py in results(self)
287 return [tp.cast(R, sub_job.result()) for sub_job in self._sub_jobs]
288
--> 289 outcome, result = self._get_outcome_and_result()
290 if outcome == "error":
291 job_exception = self.exception()
~/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/submitit/core/core.py in _get_outcome_and_result(self)
382 else:
383 message.append(f"No output/error stream produced ! Check: {self.paths.stdout}")
--> 384 raise utils.UncompletedJobError("\n".join(message))
385 try:
386 output: tp.Tuple[str, tp.Any] = utils.pickle_load(self.paths.result_pickle)
UncompletedJobError: Job 18686 (task: 0) with path /home/sushantk/anaconda3/codeGen/data/python_test/log/18686_0_result.pkl
has not produced any output (state: FINISHED)
No output/error stream produced ! Check: /home/sushantk/anaconda3/codeGen/data/python_test/log/18686_0_log.out`
When I open "python.test.dictionary.tok", "python.test.obfuscated.tok", "python.valid.dictionary.tok", and "python.valid.obfuscated.tok", they are all empty; nothing is written to them.
Can you tell me why this is happening?
Hi,
It may be because all 35 examples in the Python file you kept were sent to the training set, which leaves the valid and test sets empty (the log shows 0 lines for both).
Maybe try running it on the 3 Python files in the test dataset (it should still be quite fast), or increase --percent_test_valid to something like 10 or 20.
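To illustrate why such a small dataset can end up with empty valid and test files, here is a rough sketch of the arithmetic. It is not CodeGen's actual splitting code: `split_sizes` is a hypothetical helper, and it assumes the percentage is applied to valid and test separately and rounded down. Under that assumption, 2% of 35 examples rounds to 0, which matches the "0 lines" entries in the log above.

```python
def split_sizes(n_examples: int, percent_test_valid: int) -> tuple:
    """Return (train, valid, test) sizes under an assumed floor-based split.

    Hypothetical helper for illustration only; the real logic in
    CodeGen's preprocessing may differ.
    """
    # Assumption: the same percentage is held out for valid and for test,
    # and fractional examples are rounded down.
    n_held_out = n_examples * percent_test_valid // 100
    n_train = n_examples - 2 * n_held_out
    return n_train, n_held_out, n_held_out


# Compare the value used in the issue (2) with the suggested values (10, 20).
for percent in (2, 10, 20):
    print(percent, split_sizes(35, percent))
```

With 35 examples, the sketch gives empty valid and test sets at 2 percent and non-empty ones at 10 or 20 percent, which is consistent with the suggestion to either add more input files or raise --percent_test_valid.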