ko-nlp/Korpora

KoWikiText LM data creation issue

Beomi opened this issue · 1 comment

Beomi commented

env

  • korpora == 0.2.0
  • python ~= 3.8

Issue

command

Running the command below raises an error.

korpora lmdata \
  --corpus all \
  --output_dir ~/works/lmdata

Error log

Create train data from kowikitext: 0it [00:00, ?it/s]

| Done | Corpus name               | Num sents  | File name |
| ---- | ------------------------- | ---------- | --------- |
|  x   | kcbert                    |   86246284 | all.train |
|  x   | korean_chatbot_data       |      23646 | all.train |
|  x   | korean_hate_speech        |    2042260 | all.train |
|  x   | korean_parallel_koen_news |      97123 | all.train |
|  x   | korean_petitions          |     867262 | all.train |
|  x   | kornli                    |    1900708 | all.train |
|  x   | korsts                    |      17256 | all.train |
|      | kowikitext                |  -         |           |
|      | namuwikitext              |  -         |           |
|      | naver_changwon_ner        |  -         |           |
|      | nsmc                      |  -         |           |
|      | question_pair             |  -         |           |
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.train.zip
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.train
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.test.zip
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.test
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.dev.zip
[Korpora] Corpus `kowikitext` is already installed at /home/beomi/Korpora/kowikitext/kowikitext_20200920.dev
Create train data from kowikitext: 0it [00:02, ?it/s]
Traceback (most recent call last):
  File "/home/beomi/anaconda3/envs/deepspeed/bin/korpora", line 8, in <module>
    sys.exit(main())
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/Korpora/cli.py", line 64, in main
    task_function(args)
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/Korpora/task_lmdata.py", line 47, in create_lmdata
    for i_sent, sent in enumerate(sent_iterator):
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/tqdm/std.py", line 1133, in __iter__
    for obj in iterable:
  File "/home/beomi/anaconda3/envs/deepspeed/lib/python3.8/site-packages/Korpora/task_lmdata.py", line 180, in iterate_kowikitext
    with open(path, encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/beomi/Korpora//kowiki/kowikitext_20200920.train'

For ko wiki, the path should be kowikitext/kowikitext_....., but the LM data code seems to have a typo and uses /kowiki/kowikitext_.... instead.
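The mismatch can be reproduced in isolation. Below is a minimal sketch, not Korpora's actual code: `build_path` is a hypothetical helper, and the layout is recreated in a temporary directory with the directory and file names taken from the log above. It only shows that a file installed under `kowikitext/` cannot be found when the lookup uses `kowiki/`.

```python
import os
import tempfile


def build_path(root_dir, corpus_dir, filename):
    # Hypothetical helper for illustration; Korpora's real path logic may differ.
    return os.path.join(root_dir, corpus_dir, filename)


with tempfile.TemporaryDirectory() as root_dir:
    filename = "kowikitext_20200920.train"

    # Recreate the installed layout reported in the log: <root>/kowikitext/<file>
    os.makedirs(os.path.join(root_dir, "kowikitext"))
    with open(build_path(root_dir, "kowikitext", filename), "w", encoding="utf-8") as f:
        f.write("sample sentence\n")

    # Directory name used at install time (file exists here)
    installed_exists = os.path.exists(build_path(root_dir, "kowikitext", filename))
    # Directory name that appears in the FileNotFoundError (file does not exist here)
    lmdata_exists = os.path.exists(build_path(root_dir, "kowiki", filename))

print("kowikitext/ lookup found the file:", installed_exists)
print("kowiki/ lookup found the file:", lmdata_exists)
```

Calling `open()` on the second path would raise the same `FileNotFoundError` as in the traceback.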

Beomi commented

This seems to have already been addressed in issue #187. Please close this issue after releasing 0.3.0.