Error when using .items_pool
JackMack21 opened this issue · 12 comments
I am trying to restrict the set of items ctpfrec recommends. Each of my items is uniquely identified by a string, e.g. '48069855'.
I have tried the following, but each attempt throws an error:
Using either recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=user_counts_test.ItemId.unique()),
or
recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array(['48069855', '47994812', '47994813', '47811334', '47809545','47770950']) )
I'm presented with:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/research/jackmck/.local/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
57 try:
---> 58 return bound(*args, **kwds)
59 except TypeError:
TypeError: Partition index must be integer
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-94-c4d8742971d3> in <module>()
5 # new_user_count = pd.DataFrame({'UserId': -1,'ItemId': ['48028651','48065053','48057353'],'Count': [1,1,1]})
6 # recommender.add_users(new_user_count)
----> 7 recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array(['48069855', '47994812', '47994813', '47811334', '47809545','47770950']) ) # think about excluding seen
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in topN(self, user, n, exclude_seen, items_pool)
1300 raise Exception("Can only exclude seen items when passing 'keep_data=True' to .fit")
1301
-> 1302 return self._topN(self._M1[user], n, exclude_seen, items_pool, user)
1303
1304 def topN_cold(self, user_df, n=10, items_pool=None, maxiter=10, ncores=1, random_seed=1, stop_thr=1e-3):
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in _topN(self, user_vec, n, exclude_seen, items_pool, user)
1245 if exclude_seen:
1246 n_ext = np.min([n + self._n_seen_by_user[user], items_pool.shape[0]])
-> 1247 rec = np.argpartition(allpreds, n_ext-1)[:n_ext]
1248 seen = self.seen[self._st_ix_user[user] : self._st_ix_user[user] + self._n_seen_by_user[user]]
1249 if self.reindex:
<__array_function__ internals> in argpartition(*args, **kwargs)
/home/research/jackmck/.local/lib/python3.7/site-packages/numpy/core/fromnumeric.py in argpartition(a, kth, axis, kind, order)
830
831 """
--> 832 return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
833
834
/home/research/jackmck/.local/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
65 # Call _wrapit from within the except clause to ensure a potential
66 # exception has a traceback chain.
---> 67 return _wrapit(obj, method, *args, **kwds)
68
69
/home/research/jackmck/.local/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapit(obj, method, *args, **kwds)
42 except AttributeError:
43 wrap = None
---> 44 result = getattr(asarray(obj), method)(*args, **kwds)
45 if wrap:
46 if not isinstance(result, mu.ndarray):
TypeError: Partition index must be integer
If I pass the ItemIds as integers rather than in their original string format:
recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array([48069855, 47994812, 47994813, 4781133, 47809545, 47770950]) )
I am presented with the error:
ValueError Traceback (most recent call last)
<ipython-input-97-6594e242d050> in <module>()
5 # new_user_count = pd.DataFrame({'UserId': -1,'ItemId': ['48028651','48065053','48057353'],'Count': [1,1,1]})
6 # recommender.add_users(new_user_count)
----> 7 recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array([48069855, 47994812, 47994813, 4781133, 47809545, 47770950]) ) # think about excluding seen
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in topN(self, user, n, exclude_seen, items_pool)
1300 raise Exception("Can only exclude seen items when passing 'keep_data=True' to .fit")
1301
-> 1302 return self._topN(self._M1[user], n, exclude_seen, items_pool, user)
1303
1304 def topN_cold(self, user_df, n=10, items_pool=None, maxiter=10, ncores=1, random_seed=1, stop_thr=1e-3):
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in _topN(self, user_vec, n, exclude_seen, items_pool, user)
1230 del nan_ix
1231 if items_pool_reind.shape[0] == 0:
-> 1232 raise ValueError("No items to recommend.")
1233 elif items_pool_reind.shape[0] == 1:
1234 raise ValueError("Only 1 item to recommend.")
ValueError: No items to recommend.
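The "No items to recommend." error is consistent with an ID type mismatch: if the model was fit with string ItemIds, integer values in items_pool would match none of the known items, and the pool would be empty after filtering. Below is a minimal sketch of that effect using a plain dictionary lookup. The names fitted_item_ids, id_to_index and reindex_pool are hypothetical and are not ctpfrec internals; the library's actual lookup may work differently.

```python
import numpy as np

# Hypothetical stand-in for the item IDs seen during fitting (strings).
fitted_item_ids = ['48069855', '47994812', '47994813', '47809545']
id_to_index = {item_id: ix for ix, item_id in enumerate(fitted_item_ids)}

def reindex_pool(items_pool):
    """Map external IDs to internal positions, dropping unknown IDs."""
    known = [id_to_index[i] for i in items_pool.tolist() if i in id_to_index]
    return np.array(known, dtype=int)

int_pool = np.array([48069855, 47994812, 47994813])

# Integer IDs never equal the string keys, so nothing survives the lookup.
print(reindex_pool(int_pool).shape[0])

# Casting the pool to strings restores the match.
print(reindex_pool(int_pool.astype(str)).shape[0])
```

If this is what is happening, keeping items_pool in the same dtype as the ItemId column passed to .fit avoids the empty pool.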
I've pushed a small update. Please try again from the master branch.
I have reinstalled ctpfrec through pip yet I am presented with the same issue; is this what you meant by 'try again from the master branch'?
I mean like this: pip install git+https://www.github.com:david-cortes/ctpfrec.git
Or download the repository and install it with python setup.py install, or with pip from the downloaded repository.
I am presented with an error when attempting this:
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://www.github.com:david-cortes/ctpfrec.git
Cloning https://www.github.com:david-cortes/ctpfrec.git to /home/tmp/pip-req-build-oti082g1
Running command git clone -q https://www.github.com:david-cortes/ctpfrec.git /home/tmp/pip-req-build-oti082g1
fatal: unable to access 'https://www.github.com:david-cortes/ctpfrec.git/': URL using bad/illegal format or missing URL
WARNING: Discarding git+https://www.github.com:david-cortes/ctpfrec.git. Command errored out with exit status 128: git clone -q https://www.github.com:david-cortes/ctpfrec.git /home/tmp/pip-req-build-oti082g1 Check the logs for full command output.
ERROR: Command errored out with exit status 128: git clone -q https://www.github.com:david-cortes/ctpfrec.git /home/tmp/pip-req-build-oti082g1 Check the logs for full command output.
Am I right in thinking this is an error from your end?
Yes sorry, the command should be:
pip install git+https://www.github.com/david-cortes/ctpfrec.git
Ah yes, sorry I probably should have spotted that myself!
I am now presented with another error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-00f50f2b635d> in <module>()
5 # new_user_count = pd.DataFrame({'UserId': -1,'ItemId': ['48028651','48065053','48057353'],'Count': [1,1,1]})
6 # recommender.add_users(new_user_count)
----> 7 recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array(['48069855','47994812','47994813','47809545'])) # think about excluding seen
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in topN(self, user, n, exclude_seen, items_pool)
1300 raise Exception("Can only exclude seen items when passing 'keep_data=True' to .fit")
1301
-> 1302 return self._topN(self._M1[user], n, exclude_seen, items_pool, user)
1303
1304 def topN_cold(self, user_df, n=10, items_pool=None, maxiter=10, ncores=1, random_seed=1, stop_thr=1e-3):
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in _topN(self, user_vec, n, exclude_seen, items_pool, user)
1246 n_ext = int(np.min([n + self._n_seen_by_user[user], items_pool.shape[0]]))
1247 rec = np.argpartition(allpreds, n_ext-1)[:n_ext]
-> 1248 seen = self.seen[self._st_ix_user[user] : self._st_ix_user[user] + self._n_seen_by_user[user]]
1249 if self.reindex:
1250 rec = np.setdiff1d(items_pool_reind[rec], seen)
TypeError: slice indices must be integers or None or have an __index__ method
I've pushed another update which should fix this new issue, please try again. If you encounter some other issue, a runnable small example with randomly generated data would be helpful.
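Both TypeErrors above are consistent with a floating-point value reaching a place where NumPy requires an integer. The thread does not confirm this as the root cause, but the int(...) wrapper added on line 1246 of the updated traceback points in that direction. Below is a minimal sketch that reproduces the two error types with made-up data. The names scores and n_seen are hypothetical stand-ins, not ctpfrec attributes.

```python
import numpy as np

scores = np.array([0.3, 0.9, 0.1, 0.7, 0.5])
n_seen = np.array([2.0, 3.0])  # hypothetical per-user counts stored as floats

# np.min over a list containing a NumPy float returns a NumPy float.
n_ext = np.min([2 + n_seen[0], scores.shape[0]])

argpartition_error = None
try:
    np.argpartition(scores, n_ext - 1)
except TypeError as err:
    argpartition_error = err
print("argpartition with float kth:", argpartition_error)

# Casting to int first makes the same call valid.
top = np.argpartition(scores, int(n_ext) - 1)[: int(n_ext)]

slice_error = None
start = n_seen[0]
try:
    scores[start : start + 2]
except TypeError as err:
    slice_error = err
print("slice with float bounds:", slice_error)

# Integer bounds work as expected.
window = scores[int(start) : int(start) + 2]
```

Casting at the point of use, as in the last lines, is one way to guard against this. Storing the counts and offsets with an integer dtype in the first place would be another.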
I am still encountering issues. I have generated a small example to show you the problem:
import numpy as np, pandas as pd
from ctpfrec import CTPF
counts_df = pd.DataFrame({
'UserId' : [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3],
'ItemId' : [0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7],
'Count' : [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]})
counts_df = counts_df.loc[~counts_df[['UserId', 'ItemId']].duplicated()].reset_index(drop=True)
words_df = pd.DataFrame({
'ItemId' : [0,0,0,0,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7],
'WordId' : np.random.randint(8, size=50),
'Count' : [1]*50})
words_df = words_df.loc[~words_df[['ItemId', 'WordId']].duplicated()].reset_index(drop=True)
recommender = CTPF(k = 15, reindex=True)
recommender.fit(counts_df=counts_df, words_df=words_df)
new_user_count = pd.DataFrame({'UserId': -1,'ItemId': [1,2,3],'Count': [1,1,1]})
recommender.add_users(new_user_count)
recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array([5,6,7]))
Upon running the code above I am presented with the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-25-b96e663176e9> in <module>()
1 new_user_count = pd.DataFrame({'UserId': -1,'ItemId': [1,2,3],'Count': [1,1,1]})
2 recommender.add_users(new_user_count)
----> 3 recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array([5,6,7])) # think about excluding seen
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in topN(self, user, n, exclude_seen, items_pool)
1300 raise Exception("Can only exclude seen items when passing 'keep_data=True' to .fit")
1301
-> 1302 return self._topN(self._M1[user], n, exclude_seen, items_pool, user)
1303
1304 def topN_cold(self, user_df, n=10, items_pool=None, maxiter=10, ncores=1, random_seed=1, stop_thr=1e-3):
/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in _topN(self, user_vec, n, exclude_seen, items_pool, user)
1246 n_ext = int(np.min([n + self._n_seen_by_user[user], items_pool.shape[0]]))
1247 rec = np.argpartition(allpreds, n_ext-1)[:n_ext]
-> 1248 seen = self.seen[self._st_ix_user[user] : self._st_ix_user[user] + self._n_seen_by_user[user]]
1249 if self.reindex:
1250 rec = np.setdiff1d(items_pool_reind[rec], seen)
TypeError: slice indices must be integers or None or have an __index__ method
I am unable to reproduce it. That code snippet executes without errors on my setup.
Which versions of the following software are you using?
- Python
- NumPy
- Pandas
Do you still experience the issue if you update to NumPy>=1.20.1 and Pandas 1.2.3?
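A short script along these lines can collect the requested versions in one go. It is only a sketch: it assumes Python 3.8 or later (for importlib.metadata), and report_versions is a hypothetical helper name, not part of ctpfrec.

```python
import sys
from importlib import metadata

def report_versions(packages=("numpy", "pandas", "ctpfrec")):
    """Return installed versions of Python and the given packages."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

for name, version in report_versions().items():
    print(f"{name}: {version}")
```

Including ctpfrec in the report also shows whether the interpreter is picking up the PyPI release or the build installed from GitHub.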
Also, judging from the error message, the version of ctpfrec that you are using doesn't have the latest modifications that are in the GitHub version.
Never mind, I've tracked down the issue. It should be fixed now, along with the other bug about IDs. Please try updating to version 0.1.12.