The results of preprocess.py
MelvinZang opened this issue · 18 comments
When I run preprocess.py in twenty_newsgroups, I get results like these:
2 --> SKIP
4 , --> ÉÏ
5 . --> ÉÏ
13 - --> ÉÏ
15 ) --> ÉÏ
16 " --> ÉÏ
17 ( --> ÉÏ
19 : --> ÉÏ
24 ? --> ÉÏ
36 ' --> ÉÏ
43 / --> ÉÏ
49 ! --> ÉÏ
51 ; --> ÉÏ
61 < --> ÉÏ
76 ... --> §£.§£.
79 -- --> -4
90 ] --> ÉÏ
100 max>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax --> Malavika_Jagannathan_?_mjaganna@greenbaypressgazette.com
108 [ --> ÉÏ
126 | --> ÉÏ
226 } --> ÉÏ
231 10 --> -0
I don't know what I should do to fix it, or whether these are the right results.
I used the results to run lda2vec_run.py.
First I got this result:
Top words in topic 0 x11 sci.crypt pixels copyright pixel meg siggraph moncton phil rpm
Top words in topic 1 muslims steam christians communist filter playoffs terrorists indians filters macintosh
Top words in topic 2 nuclear revolver housing mike galley cabin ulf sf braking argic
Top words in topic 3 rbi reno ss canada bath apartment housing martin obey lindros
Top words in topic 4 mph pitchers hitter modems cubs braking telescope velocity blues brakes
Top words in topic 5 login dept militia customers 105 bombing abortion minorities workers americans
Top words in topic 6 ill tip puck jersey updates tips reply offensive archives guard
Top words in topic 7 sponsored rating inherently mode modes recommend voted p.o. p.m. participated
Top words in topic 8 0.333 manager logo subscribe stats secretary dec detector archives saves
Top words in topic 9 olwm dec gentiles azerbaijanis homosexuals liberal gays corps libertarians armenians
Top words in topic 10 firearm revolver knife atrocities bullock suicide accidents snow flyers handgun
Top words in topic 11 patents vs coverage v xv patent deals due warranty industry
Top words in topic 12 los nowhere shift distinguish gulf direction movement massacre channel slaughter
Top words in topic 13 microsoft msdos macintosh injection startup ken unix chinese cell pilot
Top words in topic 14 homicides madison dec iraq murders msdos wolverine refugees archive obfuscated
Top words in topic 15 edit login writers nejm moderator comics expressed msg copyright conclusions
Top words in topic 16 whalers syndrome gods note gotten orbiter subscription rf march cds
Top words in topic 17 chi subscribe noise digest ears section horn iron flow criteria
Top words in topic 18 pm p.m. p.o. ss deletion microsoft edm verse powerpc disable
Top words in topic 19 became ran rose grew stood pulled relations jumped fell remained
Traceback (most recent call last):
File "examples/twenty_newsgroups/lda2vec/lda2vec_run.py", line 107, in
optimizer.zero_grads()
AttributeError: 'Adam' object has no attribute 'zero_grads'
This is because my chainer version is 3.5.0, and the attribute 'zero_grads' only exists in versions below 2.0.0. So I changed it to optimizer.use_cleargrads() (I'm not sure whether that is right), and then I get this:
J:00561 E:00000 L:nan P:nan R:2.184e+04
J:00562 E:00000 L:nan P:nan R:1.826e+04
J:00563 E:00000 L:nan P:nan R:1.489e+04
Traceback (most recent call last):
File "examples/twenty_newsgroups/lda2vec/lda2vec_run.py", line 94, in
words)
File "/media/data/users/master/2018/zangmingzhe/lda2vec/lda2vec/topics.py", line 76, in prepare_topics
assert np.allclose(np.sum(topic_to_word, axis=1), 1), msg
AssertionError: Not all rows in topic_to_word sum to 1
Does anybody know where the problem is?
@MelvinZang the problem is due to the chainer version. Change it to optimizer.use_cleargrads().
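Note that on newer Chainer, optimizer.use_cleargrads() only sets a flag and does not clear gradients by itself, so you may still need a manual call. A minimal, hedged sketch (not the repo's exact code; 'optimizer' and 'model' are assumed to be the Adam optimizer and the Link it was set up with in lda2vec_run.py):

# Version-compatible gradient clearing (sketch only, names assumed)
if hasattr(optimizer, 'zero_grads'):    # Chainer < 2.0
    optimizer.zero_grads()
else:                                   # Chainer >= 2.0: clear grads on the model itself
    model.cleargrads()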
@TamouzeAssi thanks, that is right.
@MelvinZang I am also running into this assert error:
Traceback (most recent call last):
File "examples/hacker_news/lda2vec/lda2vec_run.py", line 87, in
words)
File "build/bdist.linux-x86_64/egg/lda2vec/topics.py", line 76, in prepare_topics
AssertionError: Not all rows in topic_to_word sum to 1
I also had to switch to use_cleargrads() instead of zero_grads() due to the chainer version.
Were you able to fix the assert error: AssertionError: Not all rows in topic_to_word sum to 1?
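In case it helps anyone, this assertion usually fires because training diverged (note the L:nan in the log above), so the softmax rows in topic_to_word contain NaNs and no longer sum to 1. A small, hypothetical diagnostic (not part of lda2vec) to confirm that:

import numpy as np

def rows_sum_to_one(topic_to_word, tol=1e-6):
    # False when NaNs (e.g. from a diverged loss) make the softmax rows invalid.
    sums = np.sum(topic_to_word, axis=1)
    return not np.isnan(sums).any() and np.allclose(sums, 1.0, atol=tol)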
@MelvinZang Hi! I'm wondering how long it took you to run preprocess.py and run.py? Thanks!
@lovedatatiff It took me nearly a whole day to run preprocess.py, but only a few hours to run lda2vec_run.py with a GPU.
@anupamme another simple way is to change the chainer version to 1.9.0.
Hello @MelvinZang,
When I run lda2vec.py on my dataset, I get results like these:
[garbled binary output: a long run of repeated ';²õ' characters followed by what look like raw float bytes]
Please tell me, what is going wrong here? I'm stuck.
My dataset contains an abstract.txt file (research paper abstracts).
I am also getting the same error:
AttributeError: 'Adam' object has no attribute 'zero_grads'
Has anyone been able to resolve this lately?
pip show chainer
Name: chainer
Version: 6.0.0b1
Summary: A flexible framework of neural networks
Home-page: https://chainer.org/
Edit: Solved by installing chainer==1.9.0
@stalhaa The results I mentioned in the question are not wrong; they show a conversion process. The words that appear most often in the articles are punctuation marks, and the model replaces them with something else. As the process continues, it looks normal: you can see plurals turning into singulars, and similar changes.
I don't understand the results you pasted; maybe you can add formatting so I can tell when and why the model shows things like that.
Hope it helps.
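As a rough illustration only (lda2vec's preprocess.py uses spaCy, so the "a --> b" lines are just tokens being replaced, e.g. by a lemma or a placeholder), something like this shows the kind of mapping involved; the model name below is an assumption, any English spaCy model works:

import spacy

# Illustrative sketch only, not the repo's code: prints a token --> replacement
# mapping similar in spirit to the preprocess output above.
nlp = spacy.load('en_core_web_sm')
for token in nlp(u"The cats ran quickly, didn't they?"):
    print('{} --> {}'.format(token.text, token.lemma_))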
Please send me your email address, @MelvinZang.
@MelvinZang Did you get around the issue by installing chainer version 1.9.0?
Well, for me it does solve the issue on my Mac, but I am trying to set this up in a Colab notebook (for GPU support) and am unable to install chainer 1.9.0:
/tmp/tmpQdaB_J/a.cpp:1:10: fatal error: cudnn.h: No such file or directory
#include <cudnn.h>
^~~~~~~~~
compilation terminated.
**************************************************
*** WARNING: Include files not found: ['cudnn.h']
*** WARNING: Skip installing cudnn support
*** WARNING: Check your CPATH environment variable
**************************************************
cython path:/usr/local/lib/python2.7/dist-packages
error: Command '/usr/bin/python2' failed:
command: /usr/bin/python2 /usr/local/lib/python2.7/dist-packages/cython.py --fast-fail --verbose --cplus --directive profile=False --directive linetrace=False cupy/core/core.pyx
return code: 1
output:
Compiling /tmp/pip-build-tkCtKZ/chainer/cupy/core/core.pyx
/usr/local/lib/python2.7/dist-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-build-tkCtKZ/chainer/cupy/core/core.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Error compiling Cython file:
------------------------------------------------------------
...
void* data
int size
int shape_and_strides[MAX_NDIM * 2]
cdef class CArray(cupy.cuda.function.CPointer):
^
------------------------------------------------------------
cupy/core/carray.pxi:14:36: First base of 'CArray' is not an extension type
####################
And with the latest version of chainer I get this error:
AssertionError: Not all rows in topic_to_word sum to 1
I'd really appreciate any insights here!
@MelvinZang ??
@stalhaa Sorry, I forgot. 752087739@qq.com
@MelvinZang
Can you please run your lda2vec.py code on my dataset file instead of twenty_newsgroups and share the results later? Will you please do it for me? I want the top words from 100 topics. Kindly help me in this regard. Thanks.
@stalhaa let me have a try
My problem is that every time I install Chainer 1.9.0 in place of a later version, my code can't import cupy.cudnn, and this causes the UserWarning: cuDNN is not enabled.
But if I don't switch to 1.9.0 and use a later version, the AttributeError: 'Adam' object has no attribute 'zero_grads' happens. If zero_grads is replaced with use_cleargrads(use=False), use_cleargrads(use=True), use_cleargrads(), or model.cleargrads(), any of them leads to AssertionError: Not all rows in topic_to_word sum to 1.
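For what it's worth, here is a minimal, self-contained sketch of the gradient-clearing idiom for Chainer >= 2.x; the tiny linear model and random data are placeholders, not lda2vec's actual model. It only addresses the AttributeError; the NaN loss may still need a lower learning rate or gradient clipping.

import numpy as np
import chainer.functions as F
import chainer.links as L
from chainer import optimizers

# Placeholder model and data, only to show the Chainer >= 2.x update idiom.
model = L.Linear(3, 2)
optimizer = optimizers.Adam()
optimizer.setup(model)

x = np.random.rand(4, 3).astype(np.float32)
t = np.zeros(4, dtype=np.int32)

model.cleargrads()                            # replaces optimizer.zero_grads() from Chainer < 2
loss = F.softmax_cross_entropy(model(x), t)
loss.backward()
optimizer.update()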