KoWikiText LM data generation issue
Beomi opened this issue · 1 comment
Beomi commented
env
- korpora == 0.2.0
- python ~= 3.8
Issue
command
Running the command below raises an error.
korpora lmdata \
--corpus all \
--output_dir ~/works/lmdata
Error log
Create train data from kowikitext: 0it [00:00, ?it/s]
| Done | Corpus name | Num sents | File name |
| ---- | ------------------------- | ---------- | --------- |
| x | kcbert | 86246284 | all.train |
| x | korean_chatbot_data | 23646 | all.train |
| x | korean_hate_speech | 2042260 | all.train |
| x | korean_parallel_koen_news | 97123 | all.train |
| x | korean_petitions | 867262 | all.train |
| x | kornli | 1900708 | all.train |
| x | korsts | 17256 | all.train |
| | kowikitext | - | |
| | namuwikitext | - | |
| | naver_changwon_ner | - | |
| | nsmc | - | |
| | question_pair | - | |
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.train.zip
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.train
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.test.zip
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.test
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.dev.zip
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.dev
Create train data from kowikitext: 0it [00:02, ?it/s]
Traceback (most recent call last):
  File "/home/beomi/anaconda3/envs/deepspeed/bin/korpora", line 8, in <module>
    sys.exit(main())
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/Korpora/cli.py", line 64, in main
    task_function(args)
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/Korpora/task_lmdata.py", line 47, in create_lmdata
    for i_sent, sent in enumerate(sent_iterator):
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/tqdm/std.py", line 1133, in __iter__
    for obj in iterable:
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/Korpora/task_lmdata.py", line 180, in iterate_kowikitext
    with open(path, encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/beomi/Korpora//kowiki/kowikitext_20200920.train'
For ko wiki, the corpus is installed under kowikitext/kowikitext_....., but the LM data code looks for it under /kowiki/kowikitext_.... instead, which looks like a typo in the directory name.
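To make the mismatch concrete, here is a minimal sketch that compares the directory the corpus was installed into with the directory the LM data step tried to open. Both paths are copied from the log above; the helper `corpus_dir_name` and the variable names are hypothetical and are not part of Korpora. It only illustrates the symptom and does not reproduce the code in task_lmdata.py.

```python
import os

# Paths copied verbatim from the log above.
ROOT = "/home/beomi/Korpora"
INSTALLED = "/home/beomi/Korpora/kowikitext/kowikitext_20200920.train"
REQUESTED = "/home/beomi/Korpora//kowiki/kowikitext_20200920.train"


def corpus_dir_name(path, root):
    """Return the first directory component of `path` below `root`.

    Hypothetical helper for illustration; normpath collapses the
    doubled slash that appears in the requested path.
    """
    rel = os.path.relpath(os.path.normpath(path), os.path.normpath(root))
    return rel.split(os.sep)[0]


installed_dir = corpus_dir_name(INSTALLED, ROOT)
requested_dir = corpus_dir_name(REQUESTED, ROOT)

if installed_dir != requested_dir:
    print(f"installed under '{installed_dir}', but looked up under '{requested_dir}'")
```

The file name is the same in both paths, so only the directory component differs, which is consistent with a typo in the directory name used by the LM data step.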