jlsuarezdiaz/pyDML

I can't use the method fit_transform

Closed this issue · 6 comments

I tried using the fit_transform method but it gives me an error.

First I use TF-IDF to transform the text I am working with, which gives me an array of shape (136, 15063). Then I call the fit_transform method that the library provides on it, but it raises the following error:

  File "C:\Users\luismiguel\Documents\Papers\2020_paper\08_ANMM_training.py", line 161, in <module>
    train_anmm = ANMM_c.fit_transform(train_tfidf, labels_train)

  File "C:\Users\luismiguel\Anaconda3\lib\site-packages\sklearn\base.py", line 574, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)

  File "dml\anmm.pyx", line 107, in dml.anmm.ANMM.fit

  File "dml\anmm.pyx", line 175, in dml.anmm.ANMM._compute_matrices

IndexError: index 1847620713 is out of bounds for axis 0 with size 136

Hi, can you provide a minimal working example to reproduce this issue?

I have a file with lines of text. I call fit_transform from sklearn's TfidfVectorizer, which gives me a sparse matrix. I convert this matrix to a NumPy array and pass it to fit_transform of ANMM, which then raises the error. In the image, data_train is the file with text.

[Attached image: 1]

I have not been able to get the error with the code you provided. Can you link me to a file with the output of train_tfidf.toarray() before applying ANMM? And if you could add the output of labels_train it would be good too.

labels_train is a one-dimensional array containing [0, 0, 0, 0, 1, 1, 1, ...]. I have attached both files.

Samples.zip

Thanks for pointing out this issue. There is indeed a bug in ANMM. The dataset provided has only one sample of class 3, and the search for same-class neighbors returns wrong values in that case. I will have this fixed soon.
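
For anyone on a release that still has this bug, a quick check of the labels can show whether a dataset is affected. The snippet below is only a sketch and is not part of pyDML: the helper name singleton_classes and the sample labels are made up for illustration. It lists the classes that appear exactly once, which is the situation described above.

```python
from collections import Counter


def singleton_classes(labels):
    """Return the sorted class labels that occur exactly once in `labels`."""
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n < 2)


# Hypothetical labels: class 3 appears only once, as in the reported dataset.
labels_train = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
print(singleton_classes(labels_train))
```

If labels_train is a NumPy array, passing labels_train.tolist() keeps the result as plain Python values. Any class the helper reports has no same-class neighbor to search for, so such samples may need to be dropped or merged before fitting on an affected version.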

I think it is fixed now. The fix will be available in the next release on PyPI. In the meantime, you can use the unreleased version by cloning the GitHub repository and installing it with python setup.py install. If there is any problem, let me know.