thu-coai/DA-Transformer

errors when executing script for generating the binarized data


Steps to reproduce the error:
1. git clone --recurse-submodules https://github.com/thu-coai/DA-Transformer.git && pip install -e .
This did not work. Running git clone --recurse-submodules https://github.com/thu-coai/DA-Transformer.git on its own, then cd DA-Transformer followed by pip install -e ., works fine.
2. I tried to use the script from the README to generate the binarized data:

input_dir=path/to/raw_data        # directory of pre-processed text data
data_dir=path/to/binarized_data   # directory of the generated binarized data
src=src                           # source suffix
tgt=tgt                           # target suffix
fairseq-datpreprocess --source-lang ${src} --target-lang ${tgt} \
    --trainpref ${input_dir}/train --validpref ${input_dir}/valid --testpref ${input_dir}/test \
    --src-dict ${input_dir}/dict.${src}.txt --tgt-dict ${input_dir}/dict.${tgt}.txt \
    --destdir ${data_dir} --workers 32 \
    --user-dir fs_plugins --task translation_dat_task [--seg-tokens 32]

# seg-tokens should be set to 32 when you use pre-trained models.

[screenshot of the error]
I don't know what is going wrong. Please help me.

For the first problem, you are right. I will fix the script in the README.
For the second problem, I don't know the exact reason, but I guess it may be caused by a corrupted environment. You can try pip uninstall fairseq, run some other Python programs to make sure the environment is working, and then run pip install -e . in DA-Transformer. Or, more simply, create a new environment and reinstall all the packages.

@hzhwcmhf Thanks for your reply. I followed your first suggestion:
1. pip uninstall fairseq
2. ran some other Python programs to verify the environment
3. pip install -e .

I ran the script again and still got the same error.

As for the second suggestion, I don't quite understand it. Should I simply create a conda environment and then install fairseq with pip install fairseq instead of pip install -e .?

Create a conda environment, install PyTorch (and check that it works), clone this repo, and run pip install -e . inside it.
Then run fairseq-datpreprocess without arguments to see whether there is any error.

@hzhwcmhf I still get the same error.

I am not sure where the problem is. The error is raised in a module named pkg_resources, which this project does not use. As far as I know, this indicates that the package manager (pip or conda) is corrupted. Or maybe you have multiple Python installations and the environment variables are not set correctly.
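
One way to check for multiple Python installations is to print which interpreter is actually running and where the relevant packages resolve from. Below is a minimal sketch that uses only the standard library. The helper name describe_environment is hypothetical, and the default module list is only an assumption about what is worth checking in this case.

```python
import importlib.util
import shutil
import sys


def describe_environment(modules=("fairseq", "torch", "pkg_resources")):
    """Report the running interpreter, the tools found on PATH, and module locations."""
    info = {
        "sys.executable": sys.executable,           # interpreter running this script
        "python on PATH": shutil.which("python"),   # what the shell would run
        "pip on PATH": shutil.which("pip"),
    }
    for name in modules:
        try:
            # find_spec locates a top-level module without importing it
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        info[name] = spec.origin if spec else None
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If sys.executable and the python found on PATH point to different installations, or a module resolves to a location outside the active conda environment, that would be consistent with the mixed-installation explanation.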

You can try installing the original fairseq (see https://github.com/facebookresearch/fairseq#requirements-and-installation) first, and run fairseq-preprocess to see if there is any error. If the error still exists, maybe you should check the installation of python or conda.

@hzhwcmhf I installed the original fairseq, and fairseq-preprocess works fine.
I rechecked the error message and found that the error originates from import dag_loss:

from torch.utils.cpp_extension import load

Maybe something is wrong with gcc?

If you uninstall fairseq or this project, will from torch.utils.cpp_extension import load produce an error?
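
To answer that question in isolation, the import can be attempted in a fresh interpreter and the full traceback captured. This is a small sketch; the helper name try_import is hypothetical and not part of this project.

```python
import importlib
import traceback


def try_import(module_name):
    """Import a module by name; return (True, "") or (False, traceback text)."""
    try:
        importlib.import_module(module_name)
    except Exception:
        return False, traceback.format_exc()
    return True, ""


if __name__ == "__main__":
    ok, details = try_import("torch.utils.cpp_extension")
    print("import ok" if ok else details)
```

If this fails with the same pkg_resources error while neither fairseq nor this project is installed, the cause lies in the environment rather than in dag_loss.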
I still think that the problem is not caused by this project. Maybe there is a dependency file of an installed package containing PyYAML (>=5.1.*), which is not correctly parsed by pkg_resources.
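
A quick way to look for such a dependency entry is to scan the metadata of every installed package. The sketch below uses only the standard library and a regular expression rather than a full requirement parser. It relies on the PEP 440 rule that a trailing .* is only valid with the == and != operators, so PyYAML (>=5.1.*) would be flagged. The function names are hypothetical.

```python
import re
from importlib import metadata

# A trailing ".*" is only valid after "==" or "!=" (PEP 440); flag other operators.
BAD_WILDCARD = re.compile(r"(>=|<=|~=|>|<)\s*([^\s,;()]+\.\*)")


def bad_wildcards(requirement):
    """Return the invalid wildcard specifiers found in one requirement string."""
    return [op + version for op, version in BAD_WILDCARD.findall(requirement)]


def scan_installed():
    """Map each installed distribution name to its suspicious requirement strings."""
    report = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name", "?")
            requirements = dist.requires or []
        except Exception:
            continue  # skip distributions whose metadata cannot be read
        for requirement in requirements:
            if bad_wildcards(requirement):
                report.setdefault(name, []).append(requirement)
    return report


if __name__ == "__main__":
    for name, requirements in scan_installed().items():
        print(name, requirements)
```

Any package this prints is a candidate for the one whose metadata pkg_resources cannot parse. Upgrading or reinstalling that package would be the next thing to try.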