The results of preprocess.py
MelvinZang opened this issue · 18 comments
When I run preprocess.py in twenty_newsgroups, I get results like these:
2 --> SKIP
4 , --> ÉÏ
5 . --> ÉÏ
13 - --> ÉÏ
15 ) --> ÉÏ
16 " --> ÉÏ
17 ( --> ÉÏ
19 : --> ÉÏ
24 ? --> ÉÏ
36 ' --> ÉÏ
43 / --> ÉÏ
49 ! --> ÉÏ
51 ; --> ÉÏ
61 < --> ÉÏ
76 ... --> §£.§£.
79 -- --> -4
90 ] --> ÉÏ
100 max>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax --> Malavika_Jagannathan_?_mjaganna@greenbaypressgazette.com
108 [ --> ÉÏ
126 | --> ÉÏ
226 } --> ÉÏ
231 10 --> -0
I don't know what I should do to fix it, or whether these are the right results.
I used the results to run lda2vec_run.py.
First I got this result:
Top words in topic 0 x11 sci.crypt pixels copyright pixel meg siggraph moncton phil rpm
Top words in topic 1 muslims steam christians communist filter playoffs terrorists indians filters macintosh
Top words in topic 2 nuclear revolver housing mike galley cabin ulf sf braking argic
Top words in topic 3 rbi reno ss canada bath apartment housing martin obey lindros
Top words in topic 4 mph pitchers hitter modems cubs braking telescope velocity blues brakes
Top words in topic 5 login dept militia customers 105 bombing abortion minorities workers americans
Top words in topic 6 ill tip puck jersey updates tips reply offensive archives guard
Top words in topic 7 sponsored rating inherently mode modes recommend voted p.o. p.m. participated
Top words in topic 8 0.333 manager logo subscribe stats secretary dec detector archives saves
Top words in topic 9 olwm dec gentiles azerbaijanis homosexuals liberal gays corps libertarians armenians
Top words in topic 10 firearm revolver knife atrocities bullock suicide accidents snow flyers handgun
Top words in topic 11 patents vs coverage v xv patent deals due warranty industry
Top words in topic 12 los nowhere shift distinguish gulf direction movement massacre channel slaughter
Top words in topic 13 microsoft msdos macintosh injection startup ken unix chinese cell pilot
Top words in topic 14 homicides madison dec iraq murders msdos wolverine refugees archive obfuscated
Top words in topic 15 edit login writers nejm moderator comics expressed msg copyright conclusions
Top words in topic 16 whalers syndrome gods note gotten orbiter subscription rf march cds
Top words in topic 17 chi subscribe noise digest ears section horn iron flow criteria
Top words in topic 18 pm p.m. p.o. ss deletion microsoft edm verse powerpc disable
Top words in topic 19 became ran rose grew stood pulled relations jumped fell remained
Traceback (most recent call last):
File "examples/twenty_newsgroups/lda2vec/lda2vec_run.py", line 107, in
optimizer.zero_grads()
AttributeError: 'Adam' object has no attribute 'zero_grads'
This is because my chainer version is 3.5.0, and the attribute 'zero_grads' only exists in versions below 2.0.0. So I changed it to optimizer.use_cleargrads() (I'm not sure whether that is right), and then I get this:
J:00561 E:00000 L:nan P:nan R:2.184e+04
J:00562 E:00000 L:nan P:nan R:1.826e+04
J:00563 E:00000 L:nan P:nan R:1.489e+04
Traceback (most recent call last):
File "examples/twenty_newsgroups/lda2vec/lda2vec_run.py", line 94, in
words)
File "/media/data/users/master/2018/zangmingzhe/lda2vec/lda2vec/topics.py", line 76, in prepare_topics
assert np.allclose(np.sum(topic_to_word, axis=1), 1), msg
AssertionError: Not all rows in topic_to_word sum to 1
Does anybody know where the problem is?
@MelvinZang the problem is due to the chainer version. Change it to optimizer.use_cleargrads().
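Note that on newer Chainer, optimizer.use_cleargrads() only sets a flag and does not clear gradients by itself, so you may still need a manual call. A minimal, hedged sketch (not the repo's exact code; 'optimizer' and 'model' are assumed to be the Adam optimizer and the Link it was set up with in lda2vec_run.py):

# Version-compatible gradient clearing (sketch only, names assumed)
if hasattr(optimizer, 'zero_grads'):    # Chainer < 2.0
    optimizer.zero_grads()
else:                                   # Chainer >= 2.0: clear grads on the model itself
    model.cleargrads()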
@TamouzeAssi thanks, that is right.
@MelvinZang I am also running into this assert error:
Traceback (most recent call last):
File "examples/hacker_news/lda2vec/lda2vec_run.py", line 87, in
words)
File "build/bdist.linux-x86_64/egg/lda2vec/topics.py", line 76, in prepare_topics
AssertionError: Not all rows in topic_to_word sum to 1
I also had to switch to use_cleargrads() instead of zero_grads() due to the chainer version.
Were you able to fix the assert error: AssertionError: Not all rows in topic_to_word sum to 1?
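In case it helps anyone, this assertion usually fires because training diverged (note the L:nan in the log above), so the softmax rows in topic_to_word contain NaNs and no longer sum to 1. A small, hypothetical diagnostic (not part of lda2vec) to confirm that:

import numpy as np

def rows_sum_to_one(topic_to_word, tol=1e-6):
    # False when NaNs (e.g. from a diverged loss) make the softmax rows invalid.
    sums = np.sum(topic_to_word, axis=1)
    return not np.isnan(sums).any() and np.allclose(sums, 1.0, atol=tol)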
@MelvinZang Hi! I'm wondering how long it took you to run preprocess.py and run.py? Thanks!
@lovedatatiff It took me nearly a whole day to run preprocess.py, but only a few hours to run lda2vec_run.py with a GPU.
@anupamme another simple way is to change the chainer version to 1.9.0.
Hello @MelvinZang,
When I run lda2vec.py on my dataset, I get results like these:
[garbled binary output: a long run of repeated ';²õ' characters followed by what look like raw float bytes]
Please tell me, what is going wrong here? I'm stuck.
My dataset contains an abstract.txt file (research paper abstracts).
I am also getting the same error:
AttributeError: 'Adam' object has no attribute 'zero_grads'
Has anyone been able to resolve this lately?
pip show chainer
Name: chainer
Version: 6.0.0b1
Summary: A flexible framework of neural networks
Home-page: https://chainer.org/
Edit: Solved by installing chainer==1.9.0
@stalhaa The results I mentioned in the question are not wrong; they show a conversion process. The words that appear most often in the articles are punctuation marks, and the model replaces them with something else. As the process continues, it looks normal: you can see plurals turning into singulars, and similar changes.
I don't understand the results you pasted; maybe you can add formatting so I can tell when and why the model shows things like that.
Hope it helps.
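As a rough illustration only (lda2vec's preprocess.py uses spaCy, so the "a --> b" lines are just tokens being replaced, e.g. by a lemma or a placeholder), something like this shows the kind of mapping involved; the model name below is an assumption, any English spaCy model works:

import spacy

# Illustrative sketch only, not the repo's code: prints a token --> replacement
# mapping similar in spirit to the preprocess output above.
nlp = spacy.load('en_core_web_sm')
for token in nlp(u"The cats ran quickly, didn't they?"):
    print('{} --> {}'.format(token.text, token.lemma_))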
Please send me your email address, @MelvinZang.
@MelvinZang Did you get around the issue by installing chainer version 1.9.0?
Well, for me it does solve the issue on my Mac, but I am trying to set this up in a Colab notebook (for GPU support) and am unable to install chainer 1.9.0:
/tmp/tmpQdaB_J/a.cpp:1:10: fatal error: cudnn.h: No such file or directory
#include <cudnn.h>
^~~~~~~~~
compilation terminated.
**************************************************
*** WARNING: Include files not found: ['cudnn.h']
*** WARNING: Skip installing cudnn support
*** WARNING: Check your CPATH environment variable
**************************************************
cython path:/usr/local/lib/python2.7/dist-packages
error: Command '/usr/bin/python2' failed:
command: /usr/bin/python2 /usr/local/lib/python2.7/dist-packages/cython.py --fast-fail --verbose --cplus --directive profile=False --directive linetrace=False cupy/core/core.pyx
return code: 1
output:
Compiling /tmp/pip-build-tkCtKZ/chainer/cupy/core/core.pyx
/usr/local/lib/python2.7/dist-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-build-tkCtKZ/chainer/cupy/core/core.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Error compiling Cython file:
------------------------------------------------------------
...
void* data
int size
int shape_and_strides[MAX_NDIM * 2]
cdef class CArray(cupy.cuda.function.CPointer):
^
------------------------------------------------------------
cupy/core/carray.pxi:14:36: First base of 'CArray' is not an extension type
####################
And with the latest version of chainer I get this error:
AssertionError: Not all rows in topic_to_word sum to 1
I'd really appreciate any insights here!
@MelvinZang ??
@stalhaa Sorry, I forgot. 752087739@qq.com
@MelvinZang
Can you please run your lda2vec.py code on my dataset file instead of twenty_newsgroups and share the results later? Will you please do it for me? I want the top words from 100 topics. Kindly help me in this regard. Thanks.
@stalhaa let me have a try
My problem is that every time I install Chainer 1.9.0 in place of a later version, my code can't import cupy.cudnn, and this causes the UserWarning: cuDNN is not enabled.
But if I don't switch to 1.9.0 and use a later version, the AttributeError: 'Adam' object has no attribute 'zero_grads' happens. If zero_grads is replaced with use_cleargrads(use=False), use_cleargrads(use=True), use_cleargrads(), or model.cleargrads(), any of them leads to AssertionError: Not all rows in topic_to_word sum to 1.
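For what it's worth, here is a minimal, self-contained sketch of the gradient-clearing idiom for Chainer >= 2.x; the tiny linear model and random data are placeholders, not lda2vec's actual model. It only addresses the AttributeError; the NaN loss may still need a lower learning rate or gradient clipping.

import numpy as np
import chainer.functions as F
import chainer.links as L
from chainer import optimizers

# Placeholder model and data, only to show the Chainer >= 2.x update idiom.
model = L.Linear(3, 2)
optimizer = optimizers.Adam()
optimizer.setup(model)

x = np.random.rand(4, 3).astype(np.float32)
t = np.zeros(4, dtype=np.int32)

model.cleargrads()                            # replaces optimizer.zero_grads() from Chainer < 2
loss = F.softmax_cross_entropy(model(x), t)
loss.backward()
optimizer.update()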