I can't use the method fit_transform
Closed this issue · 6 comments
I tried using the fit_transform method, but it gives me an error.
First I use the TF-IDF method to transform the text I am working with. This gives me an array of shape (136, 15063). Then I call the fit_transform method that the library includes, but it gives me the following error:
File "C:\Users\luismiguel\Documents\Papers\2020_paper\08_ANMM_training.py", line 161, in <module>
train_anmm = ANMM_c.fit_transform(train_tfidf, labels_train)
File "C:\Users\luismiguel\Anaconda3\lib\site-packages\sklearn\base.py", line 574, in fit_transform
return self.fit(X, y, **fit_params).transform(X)
File "dml\anmm.pyx", line 107, in dml.anmm.ANMM.fit
File "dml\anmm.pyx", line 175, in dml.anmm.ANMM._compute_matrices
IndexError: index 1847620713 is out of bounds for axis 0 with size 136
Hi, can you provide a minimal working example to reproduce this issue?
I have a file with lines of text. I use fit_transform from TfidfVectorizer in sklearn, and the result of this operation is a sparse matrix. I convert this matrix to a numpy array, then pass the array to fit_transform of ANMM, which gives me the error. In the image, data_train is a file with text.
I have not been able to get the error with the code you provided. Can you link me to a file with the output of train_tfidf.toarray() before applying ANMM? And if you could add the output of labels_train, that would be good too.
labels_train is a one-dimensional array that contains [0, 0, 0, 0, 1, 1, 1, ...]. I attached both files.
Thanks for pointing out this issue. There is indeed a bug in ANMM. The dataset provided has only one sample with class 3, and when ANMM searches for same-class neighbors of that sample it gets wrong values. I will have this error fixed soon.
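Until a fixed release is installed, it may help to check the labels for classes that have only one sample before calling fit_transform. The following is a minimal sketch of such a check in plain Python; it is not part of the dml library, the helper names classes_with_too_few_samples and drop_rare_classes are hypothetical, and the label values are illustrative rather than taken from the attached files.

```python
from collections import Counter


def classes_with_too_few_samples(labels, min_samples=2):
    """Return the sorted class labels that occur fewer than min_samples times."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n < min_samples)


def drop_rare_classes(rows, labels, min_samples=2):
    """Return (rows, labels) without the samples whose class is too rare.

    This is a workaround sketch, not the fix applied in the library.
    """
    rare = set(classes_with_too_few_samples(labels, min_samples))
    kept = [(r, y) for r, y in zip(rows, labels) if y not in rare]
    return [r for r, _ in kept], [y for _, y in kept]


# Illustrative labels: class 3 appears only once.
labels_train = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
print(classes_with_too_few_samples(labels_train))
```

If the check reports any class, those samples have no same-class neighbor in the training set. Whether to drop them or collect more samples for that class depends on the task; dropping is only one option.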
I think it is already fixed. The fix will be available in the next release on PyPI. Meanwhile, you can use the unreleased version by cloning the GitHub repository and installing it via python setup.py install. If there is any problem, let me know.