Jekub/Wapiti

Out-of-memory error while working with a large file


Hi Team,

When I run Wapiti CRF training on 36k training examples with the following command, it returns:

"out of memory error, train model with L-BFGS."

wapiti train -p ../template_7feats -1 5 --nthread 5 ../train_feats.txt 36kmodel_wapiti

Thanks,
Somnath A. Kadam
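
A minimal sketch of two lower-memory alternatives, assuming the option names reported by 'wapiti train -h' on a recent build ('--histsz' for the L-BFGS history size, '-a sgd-l1' for the stochastic trainer) and the same data and template paths as in the command above:

# 1) Keep L-BFGS but store a smaller history of gradient vectors:
$ wapiti train -p ../template_7feats -1 5 --nthread 5 --histsz 3 ../train_feats.txt 36kmodel_wapiti

# 2) Switch to the L1-regularized stochastic trainer, which keeps no L-BFGS history at all:
$ wapiti train -a sgd-l1 -p ../template_7feats -1 5 ../train_feats.txt 36kmodel_wapiti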

@SomnathKadam

I got the same issue when I used 'bigram' features on large training data:
memory usage exploded, going up to 100 GB.
This does not happen with 'unigram' features.

  • crf.pattern
b

u:wrd LL=%X[-2,0]
u:tag LL=%X[-2,1]

u:wrd L=%X[-1,0]
u:tag L=%X[-1,1]

*:wrd X=%X[0,0]
*:tag X=%X[0,1]

u:wrd R=%X[1,0]
u:tag R=%X[1,1]

u:wrd RR=%X[2,0]
u:tag RR=%X[2,1]
  • train
$ wapiti train -t 16 -c -p crf.pattern train.txt crf.model
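
The blow-up with this pattern file is expected: a 'u:' pattern creates one weight per distinct observation string per label, while 'b' and '*:' patterns create one weight per observation string per label pair, so the bigram variants multiply the feature count by the number of labels. A rough, purely hypothetical calculation (1,000,000 distinct observation strings, 50 labels, 8-byte weights):

  unigram ('u:')   1,000,000 * 50      =    50,000,000 weights  ~  0.4 GB
  bigram  ('*:')   1,000,000 * 50 * 50 = 2,500,000,000 weights  ~ 20   GB

On top of that, L-BFGS keeps several history vectors of the same size as the weight vector, which is how usage can climb into the 100 GB range. A lone 'b' line with no observation only adds one transition weight per label pair, which is why it stays cheap.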

However, when I modified the crf.pattern
(to use only the plain 'b' transition), things went fine:

  • crf.pattern
#unigram
u:wrd LL=%X[-2,0]
u:tag LL=%X[-2,1]

u:wrd L=%X[-1,0]
u:tag L=%X[-1,1]

u:wrd X=%X[0,0]
u:tag X=%X[0,1]

u:wrd R=%X[1,0]
u:tag R=%X[1,1]

u:wrd RR=%X[2,0]
u:tag RR=%X[2,1]

#bigram
b
  • train
$ wapiti train -t 16 -c -p crf.pattern train.txt crf.model
....
  [   3] obj=1897392.82 act=989401   err=45.81%/99.34% time=4645.94s/11109.04s
  [   4] obj=1864936.55 act=1397073  err=45.81%/99.34% time=5211.76s/16320.80s
  [   5] obj=1862659.23 act=978958   err=45.81%/99.34% time=3486.53s/19807.33s
* Compacting the model
    - Scan the model
    - Compact it
        1278 observations removed
      886932 features removed
* Save the model
* Done

But as you can see, training stopped after only 5 iterations.
(Without the bigram feature, training continues for 60 iterations and the error drops to 1.8%.)
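
Since the log ends with the normal "Compacting the model ... Done" sequence, the run most likely hit Wapiti's convergence test (the error rate was stuck at 45.81%) rather than crashing. If the goal is simply to force more iterations, the stopping criterion can be tightened; a sketch assuming the '--maxiter', '--stopwin' and '--stopeps' options listed by 'wapiti train -h':

$ wapiti train -t 16 -c -p crf.pattern --maxiter 100 --stopwin 10 --stopeps 0.00001 train.txt crf.model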

I found there was a similar issue, and the settings suggested there solved the problem.