nlpnet giving error when loading own embedding in english language
sharadlnx opened this issue · 8 comments
When loading my own embeddings with nlpnet-load-embeddings.py, the data gets loaded, but when I call nlpnet-train.py, I get the following error:
sharad@sharad-Precision-WorkStation-T3500:~/Desktop/Experiments/AE_CNN$ nlpnet-train.py pos --gold test_data.txt --load-features
Reading training data...
Loading vocabulary
Creating new network...
Loading word type features...
Number of types in feature table and dictionary differ.
Generating features for 3 new types.
Traceback (most recent call last):
File "/usr/local/bin/nlpnet-train.py", line 251, in
feature_tables = utils.create_feature_tables(args, md, text_reader)
File "/usr/local/lib/python2.7/dist-packages/nlpnet/utils.py", line 274, in create_feature_tables
num_features = len(types_table[0])
TypeError: object of type 'numpy.float64' has no len()
Thanks
The error message indicates that a float was found where an array was expected. This would happen if the embeddings table is actually a single vector. Are you sure your embeddings file is correct?
Hello Erick,
I used trained embeddings of 300 dimensions. They look like:
nycsamsung 0.0170073 -0.0109652 0.0344387 -0.003403 0.0016694 -0.0167248 -0.0093864 -0.0141794 0.00638278 0.000625751 -0.00270205 0.0108427 ... -0.00100275 0.0117493 -0.00439042 0.0126542 (one word followed by 300 float values; truncated here)
Then I call nlpnet-load-embeddings.py:
nlpnet-load-embeddings.py single data_file.txt -o ./
It creates a vocabulary file and types_features.npy in the current directory.
Then I convert my data into the first format mentioned in the documentation. My data looks like:
i_FW recently_RB purchased_VBD the_DT canon_NN powershot_NN g3_NN and_CC am_VBP extremely_RB satisfied_VBN with_IN the_DT purchase_NN .
I checked in nlpnet-train: types_table is a one-dimensional array of about 40000 float entries.
I don't understand what I did wrong to end up with a 1-D array instead of a 2-D one.
Thanks
This is strange. As a workaround, you can load the array in an interactive session, call x.reshape((num_words, embedding_size)),
and then save it. I'll check the load-embeddings script.
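The reshape workaround might look like this in an interactive session. This is a sketch with synthetic data and example sizes taken from this thread; in practice you would start from np.load on the file that nlpnet-load-embeddings.py produced, and the output file name here is just an illustration:

```python
import numpy as np

# Example sizes from this thread; adjust to your own vocabulary and model.
embedding_size = 300
num_words = 40000 // embedding_size

# A flat 1-D array standing in for the wrongly shaped embeddings table;
# in practice: x = np.load("types_features.npy")
x = np.random.rand(num_words * embedding_size)

# The workaround: reshape into one row per word type, then save it back.
x = x.reshape((num_words, embedding_size))
np.save("types_features_fixed.npy", x)
print(x.shape)
```

After saving, the table has the 2-D shape that create_feature_tables expects, so len(types_table[0]) returns the embedding size instead of failing.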
Hello Erick,
I tried running nlpnet-train.py on the test data without the
--load-features parameter and it works fine.
I am sending you the test data and the trained embeddings file (it is
actually a big file, so I am sending only part of it).
Please look into it and try to find the bug.
Thanks
@sharadlnx (I edited your comment because the attached embeddings were huge!)
Anyway, you should not provide the embeddings to nlpnet as text. Instead, it should be a numpy file and a vocabulary one.
If you do not use --load-embeddings, it will generate random embeddings for words in the training data. It will work, but performance will be lower.
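For reference, generating random embeddings for the words in the training data might be sketched as follows. This is an illustration only, not nlpnet's actual initialization code; the vocabulary and sizes are made up:

```python
import numpy as np

# Illustrative only: build a random embedding table for the word types
# seen in the training data, one row per type.
vocab = ["i", "recently", "purchased", "the", "canon"]
embedding_size = 50

rng = np.random.default_rng(0)
# Small uniform values are a common choice for embedding initialization.
table = rng.uniform(-0.1, 0.1, size=(len(vocab), embedding_size))
print(table.shape)
```

Random vectors give the trainer something to start from, but without pre-trained embeddings the model has no prior knowledge of word similarity, which is why performance is lower.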
Hello Erick
When I call the load-embeddings script as documented:
nlpnet-load-embeddings.py [FORMAT] [EMBEDDINGS_FILE] -v [VOCABULARY_FILE] -o [OUTPUT_DIRECTORY]
the parameters I pass are:
nlpnet-load-embeddings.py single test_data.txt -o ./
Here test_data.txt contains embeddings in the format:
word followed by 300 float values
If I use the single format, do I still need to pass the VOCABULARY_FILE along with the EMBEDDINGS_FILE? Also, does the EMBEDDINGS_FILE contain only the word vectors and not the words? When I tried to convert it into a numpy array, it failed because of the heterogeneous data types (the first entry on each line is a string and the remaining 300 entries are floats).
Thanks
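The heterogeneous-dtype problem described above can be avoided by splitting each line into the word and its vector before building the array. A minimal sketch, with sample data standing in for the real file:

```python
import numpy as np

# Sample lines in the "word v1 v2 ... vN" format; with a real file you
# would iterate over open("test_data.txt") instead.
lines = [
    "good 0.1 0.2 0.3",
    "bad -0.1 -0.2 -0.3",
]

words = []
vectors = []
for line in lines:
    parts = line.split()
    words.append(parts[0])                         # first token: the word
    vectors.append([float(v) for v in parts[1:]])  # rest: float values

matrix = np.array(vectors)  # homogeneous (num_words, embedding_size) table
print(words, matrix.shape)
```

The words list becomes the vocabulary file and the float matrix becomes the numpy file, which matches the pair of files nlpnet expects.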
With this format, you don't need the -v option, since the vocabulary is already inside the file. Anyway, I found a bug in the load-embeddings script which I will fix later. You could use this other script to load embeddings in your format. Just make sure to rename the generated numpy file to types-features.npy.
It worked.
Thanks for the help