taku910/mecab

[mecab-dict-index] error

coleea opened this issue · 2 comments

Hi.
When I run 'mecab-dict-index', an error occurs.
The log output is as follows.

==============================================================================

reading ./ETN.csv ... 14
reading ./LISTEN_NER.csv ... 2081
reading ./Preanalysis.csv ... 5
reading ./TV_fullKorean_dict.csv ... 1687814
reading ./NP.csv ... 342
reading ./EF.csv ... 1820
reading ./XSA.csv ... 20
reading ./MM.csv ... 453
reading ./keyword.csv ... 276
reading ./XPN.csv ... 83
reading ./unk_word 1 1 0 (2nd).csv ... 276
reading ./Inflect.csv ... 44850
reading ./VA.csv ... 2360
reading ./XSV.csv ... 24
reading ./keyword_etc.csv ... 222
reading ./Place.csv ... 30300
reading ./LISTEN_unk_word 1 1 9.csv ... 254
reading ./LISTEN_KEYWORD.csv ... 2
reading ./sejong21_word.csv ... 846637
reading ./NNP.csv ... 2371
reading ./Hanja.csv ... 124570
reading ./EP.csv ... 51
reading ./KOR_ENG_csv.csv ... 60365
reading ./sejong21_verbal2.csv ... 15160
reading ./Foreign.csv ... 11599
reading ./NR.csv ... 482
reading ./NNB.csv ... 140
reading ./LISTEN_unk_word.csv ... 254
reading ./Wikipedia.csv ... 36763
reading ./sejong21_fusion.csv ... 1321382
reading ./VCN.csv ... 7
reading ./NNG.csv ... 205269
reading ./MAG.csv ... 14244
reading ./Person-actor.csv ... 99237
reading ./Symbol.csv ... 16
reading ./VCP.csv ... 9
reading ./VX.csv ... 125
reading ./Person.csv ... 196461
reading ./Group.csv ... 3176
reading ./XSN.csv ... 124
reading ./ETM.csv ... 133
reading ./NorthKorea.csv ... 3
dictionary.cpp(472) [da.build(str.size(), const_cast<char **>(&str[0]), &len[0], &val[0], &progress_bar_darts) == 0] unkown error in building double-array

==============================================================================

Lines 472~476 of [dictionary.cpp] look like this:

for (size_t i = 0; i < dic.size(); ++i) {
  tbuf.append(reinterpret_cast<const char*>(dic[i].second),
              sizeof(Token));
  delete dic[i].second;
}
 

==============================================================================

This error occurred when I added 'TV_fullKorean_dict.csv', which contains 1,687,814 entries.
The file size is 165.8 MB.
Is there a limit on the CSV file size?

Thank you

I ran into exactly the same problem when I tried to increase the size of the dictionary. A reply would be appreciated.

I have exactly the same problem. It runs when I split the dictionary, but it fails when I run it on the original file.