attardi/deepnl

dl-sentiwords throws error "IndexError: index 1 is out of bounds for axis 0 with size 1"

AzizCode92 opened this issue · 6 comments

Hi all,
When I run `dl-sentiwords.py trained1.tsv --vocab words.txt --vectors vectors.txt`, I get this error:

Saving vocabulary in words.txt
Creating new network...
... with the following parameters:

    Input layer size: 550
    Hidden layer size: 200
    Output size: 2
    Starting training

    Traceback (most recent call last):
      File "/usr/local/bin/dl-sentiwords.py", line 4, in <module>
        __import__('pkg_resources').run_script('deepnl==1.3.18', 'dl-sentiwords.py')
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 742, in run_script
        self.require(requires)[0].run_script(script_name, ns)
      File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1510, in run_script
        exec(script_code, namespace, namespace)
      File "/usr/local/lib/python2.7/dist-packages/deepnl-1.3.18-py2.7-linux-x86_64.egg/EGG-INFO/scripts/dl-sentiwords.py", line 218, in <module>
      File "deepnl/sentiwords.pyx", line 53, in itertrie
      File "deepnl/sentiwords.pyx", line 126, in deepnl.sentiwords.SentimentTrainer._train_pair_s
      File "deepnl/extractors.pyx", line 153, in deepnl.extractors.Converter.lookup
      File "deepnl/extractors.pyx", line 236, in deepnl.extractors.Extractor.__getitem__
    IndexError: index 1 is out of bounds for axis 0 with size 1

trained1.tsv is a file with the following format:

  <SID><tab><UID><tab><positive|negative|neutral|objective><tab><TWITTER_MESSAGE>

I obtained the TSV file by transforming a huge dataset of tweets into a TSV and making some transformations to the columns so that it fits the format mentioned above.
For further details, here is my conversion script: https://github.com/AzizCode92/text_mining_project/blob/master/csv_tsv.py
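For anyone preparing data in this format, a minimal sketch of such a conversion might look like the following (the input column order `sid, uid, polarity, message` is an assumption for illustration; the linked csv_tsv.py is the actual script used):

```python
import csv
import io

def csv_to_tsv(csv_text):
    """Convert CSV rows of (sid, uid, polarity, message) into the
    <SID><tab><UID><tab><polarity><tab><message> layout expected by
    dl-sentiwords.py. Column order is a hypothetical example."""
    out = io.StringIO()
    reader = csv.reader(io.StringIO(csv_text))
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for sid, uid, polarity, message in reader:
        # Tabs or newlines inside the tweet text would break the
        # 4-column layout, so replace them with spaces.
        message = message.replace("\t", " ").replace("\n", " ")
        writer.writerow([sid, uid, polarity, message])
    return out.getvalue()
```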

The issue is solved: it occurred because I was working on a very large TSV file, so I split it into parts, and the error went away.
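The splitting step can be sketched as follows (a minimal version, assuming a plain line-based split is enough since each TSV row is one line; the chunk size is an arbitrary choice):

```python
def split_tsv(path, lines_per_chunk=100000):
    """Split a large TSV into numbered part files (path.part0,
    path.part1, ...) so each part stays small enough to train on.
    Returns the number of parts written."""
    chunk, count = 0, 0
    out = open("%s.part%d" % (path, chunk), "w", encoding="utf-8")
    with open(path, encoding="utf-8") as src:
        for line in src:
            # Start a new part file every lines_per_chunk lines.
            if count and count % lines_per_chunk == 0:
                out.close()
                chunk += 1
                out = open("%s.part%d" % (path, chunk), "w", encoding="utf-8")
            out.write(line)
            count += 1
    out.close()
    return chunk + 1
```

Each resulting part keeps the same four-column format, so it can be fed to dl-sentiwords.py unchanged.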

Hi!
Sorry to bother you; I saw that you have recently used this library.

I am training the sentiment-specific embeddings. At the end of each epoch, I get a message like this:

    23 epochs Examples: 7818897 Error: 326588146461.176880 Accuracy: 0.000000 23589 corrections skipped

The accuracy always remains zero, no matter the number of epochs.
Is that OK? Did you get the same accuracy?

Thank you! :)

Hi!
I saw Mr. Attardi's comment about the meaning of both accuracy and error, and he said:

    Don't worry about those numbers.
    You should get usable embeddings anyway.

Thank you!!
I have just found the comment you were referring to: #32

@AzizCode92
I got the same issue; I'm also using a big file. Sorry, but dl-sentiwords.py takes a single input file (according to the example), so how did you manage to feed it several files?
Thanks.

Hello! Have you successfully installed deepnl? Where did you install it: Windows, Linux, or Mac?
I have some problems on Windows. Can you help me?