david-cortes/ctpfrec

Error when using .items_pool

JackMack21 opened this issue · 12 comments

I am trying to restrict the set of items that ctpfrec recommends. Each of my items is uniquely identified by a string, e.g. '48069855'.

I have tried the following, but both calls raise an error:

recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=user_counts_test.ItemId.unique())
or
recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array(['48069855', '47994812', '47994813', '47811334', '47809545', '47770950']))

I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/research/jackmck/.local/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     57     try:
---> 58         return bound(*args, **kwds)
     59     except TypeError:

TypeError: Partition index must be integer

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-94-c4d8742971d3> in <module>()
      5 # new_user_count = pd.DataFrame({'UserId': -1,'ItemId': ['48028651','48065053','48057353'],'Count': [1,1,1]})
      6 # recommender.add_users(new_user_count)
----> 7 recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array(['48069855', '47994812', '47994813', '47811334', '47809545','47770950']) ) # think about excluding seen

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in topN(self, user, n, exclude_seen, items_pool)
   1300                         raise Exception("Can only exclude seen items when passing 'keep_data=True' to .fit")
   1301 
-> 1302                 return self._topN(self._M1[user], n, exclude_seen, items_pool, user)
   1303 
   1304         def topN_cold(self, user_df, n=10, items_pool=None, maxiter=10, ncores=1, random_seed=1, stop_thr=1e-3):

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in _topN(self, user_vec, n, exclude_seen, items_pool, user)
   1245                         if exclude_seen:
   1246                                 n_ext = np.min([n + self._n_seen_by_user[user], items_pool.shape[0]])
-> 1247                                 rec = np.argpartition(allpreds, n_ext-1)[:n_ext]
   1248                                 seen = self.seen[self._st_ix_user[user] : self._st_ix_user[user] + self._n_seen_by_user[user]]
   1249                                 if self.reindex:

<__array_function__ internals> in argpartition(*args, **kwargs)

/home/research/jackmck/.local/lib/python3.7/site-packages/numpy/core/fromnumeric.py in argpartition(a, kth, axis, kind, order)
    830 
    831     """
--> 832     return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
    833 
    834 

/home/research/jackmck/.local/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     65         # Call _wrapit from within the except clause to ensure a potential
     66         # exception has a traceback chain.
---> 67         return _wrapit(obj, method, *args, **kwds)
     68 
     69 

/home/research/jackmck/.local/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapit(obj, method, *args, **kwds)
     42     except AttributeError:
     43         wrap = None
---> 44     result = getattr(asarray(obj), method)(*args, **kwds)
     45     if wrap:
     46         if not isinstance(result, mu.ndarray):

TypeError: Partition index must be integer
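The error comes from np.argpartition, which only accepts an integer kth. The sketch below reproduces it under one assumption inferred from the traceback, not confirmed from ctpfrec's source: the user's seen count was a float, so np.min([...]) returned a np.float64. The scores and counts are made up for illustration.

```python
import numpy as np

allpreds = np.array([0.3, 0.9, 0.1, 0.7, 0.5])  # hypothetical scores for a 5-item pool
n = 2
n_seen = np.float64(1.0)  # assumption: the user's seen count was stored as a float

# np.min over a list that contains a float returns np.float64, not an int
n_ext = np.min([n + n_seen, allpreds.shape[0]])

failed = False
try:
    np.argpartition(-allpreds, n_ext - 1)
except TypeError as err:
    failed = True
    print("fails:", err)

# Casting to a plain int makes the call valid
n_ext = int(n_ext)
# Negating the scores puts the highest ones in the first n_ext slots (unordered)
rec = np.argpartition(-allpreds, n_ext - 1)[:n_ext]
print(sorted(rec.tolist()))  # [1, 3, 4]
```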

If I pass the ItemIds as integers rather than in their original string format:
recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array([48069855, 47994812, 47994813, 4781133, 47809545, 47770950]) )
I get a different error:

ValueError                                Traceback (most recent call last)
<ipython-input-97-6594e242d050> in <module>()
      5 # new_user_count = pd.DataFrame({'UserId': -1,'ItemId': ['48028651','48065053','48057353'],'Count': [1,1,1]})
      6 # recommender.add_users(new_user_count)
----> 7 recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array([48069855, 47994812, 47994813, 4781133, 47809545, 47770950]) ) # think about excluding seen

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in topN(self, user, n, exclude_seen, items_pool)
   1300                         raise Exception("Can only exclude seen items when passing 'keep_data=True' to .fit")
   1301 
-> 1302                 return self._topN(self._M1[user], n, exclude_seen, items_pool, user)
   1303 
   1304         def topN_cold(self, user_df, n=10, items_pool=None, maxiter=10, ncores=1, random_seed=1, stop_thr=1e-3):

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in _topN(self, user_vec, n, exclude_seen, items_pool, user)
   1230                                         del nan_ix
   1231                                         if items_pool_reind.shape[0] == 0:
-> 1232                                                 raise ValueError("No items to recommend.")
   1233                                         elif items_pool_reind.shape[0] == 1:
   1234                                                 raise ValueError("Only 1 item to recommend.")

ValueError: No items to recommend.
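This second error is consistent with the model having been fit on string IDs: integers never compare equal to strings, so every entry of the pool is treated as unknown and nothing is left to recommend. A minimal sketch of that kind of lookup, assuming (not confirmed from ctpfrec's source) a plain dictionary from original ID to internal index; map_pool is a hypothetical helper:

```python
import numpy as np

# Hypothetical mapping built at fit time from the original (string) item IDs
item_ids_in_fit = np.array(['48069855', '47994812', '47994813', '47809545'])
id_to_ix = {item_id: ix for ix, item_id in enumerate(item_ids_in_fit)}

def map_pool(items_pool, id_to_ix):
    """Translate user-facing item IDs to internal indices, dropping unknown IDs."""
    found = [id_to_ix.get(item) for item in items_pool]
    ix = np.array([i for i in found if i is not None], dtype=np.int64)
    if ix.shape[0] == 0:
        raise ValueError("No items to recommend.")
    return ix

print(map_pool(np.array(['48069855', '47809545']), id_to_ix))  # [0 3]

try:
    # Integer IDs never match the string keys, so the pool ends up empty
    map_pool(np.array([48069855, 47809545]), id_to_ix)
except ValueError as err:
    print(err)
```

In other words, items_pool should use the same ID type that was passed to fit.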

I've pushed a small update. Please try again from the master branch.

I reinstalled ctpfrec through pip, but I still get the same error. Is this what you meant by 'try again from the master branch'?

I mean like this: pip install git+https://www.github.com:david-cortes/ctpfrec.git

Or download the repository and install it with python setup.py install, or with pip from the downloaded copy.

I get an error when trying this:

Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://www.github.com:david-cortes/ctpfrec.git
  Cloning https://www.github.com:david-cortes/ctpfrec.git to /home/tmp/pip-req-build-oti082g1
  Running command git clone -q https://www.github.com:david-cortes/ctpfrec.git /home/tmp/pip-req-build-oti082g1
  fatal: unable to access 'https://www.github.com:david-cortes/ctpfrec.git/': URL using bad/illegal format or missing URL
WARNING: Discarding git+https://www.github.com:david-cortes/ctpfrec.git. Command errored out with exit status 128: git clone -q https://www.github.com:david-cortes/ctpfrec.git /home/tmp/pip-req-build-oti082g1 Check the logs for full command output.
ERROR: Command errored out with exit status 128: git clone -q https://www.github.com:david-cortes/ctpfrec.git /home/tmp/pip-req-build-oti082g1 Check the logs for full command output.

Am I right in thinking this is an error from your end?

Yes, sorry, the command should be:

pip install git+https://www.github.com/david-cortes/ctpfrec.git
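When working from a notebook, one way to make sure the GitHub version lands in the same environment the kernel uses is to call pip through sys.executable. This is a minimal sketch; the --force-reinstall --no-deps choice is an assumption aimed at replacing an already-installed copy whose version number has not changed, without reinstalling NumPy or pandas, and pip_install_cmd is a hypothetical helper.

```python
import sys

REPO_URL = "git+https://www.github.com/david-cortes/ctpfrec.git"

def pip_install_cmd(url=REPO_URL):
    """Build a pip command bound to the running interpreter (e.g. the Jupyter kernel)."""
    # --force-reinstall: replace the installed ctpfrec even if the version is unchanged
    # --no-deps: leave NumPy/pandas as they are
    return [sys.executable, "-m", "pip", "install", "--force-reinstall", "--no-deps", url]

cmd = pip_install_cmd()
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.check_call(cmd)
```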

Ah yes, sorry, I probably should have spotted that myself!

Now I get another error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-00f50f2b635d> in <module>()
      5 # new_user_count = pd.DataFrame({'UserId': -1,'ItemId': ['48028651','48065053','48057353'],'Count': [1,1,1]})
      6 # recommender.add_users(new_user_count)
----> 7 recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array(['48069855','47994812','47994813','47809545'])) # think about excluding seen

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in topN(self, user, n, exclude_seen, items_pool)
   1300                         raise Exception("Can only exclude seen items when passing 'keep_data=True' to .fit")
   1301 
-> 1302                 return self._topN(self._M1[user], n, exclude_seen, items_pool, user)
   1303 
   1304         def topN_cold(self, user_df, n=10, items_pool=None, maxiter=10, ncores=1, random_seed=1, stop_thr=1e-3):

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in _topN(self, user_vec, n, exclude_seen, items_pool, user)
   1246                                 n_ext = int(np.min([n + self._n_seen_by_user[user], items_pool.shape[0]]))
   1247                                 rec = np.argpartition(allpreds, n_ext-1)[:n_ext]
-> 1248                                 seen = self.seen[self._st_ix_user[user] : self._st_ix_user[user] + self._n_seen_by_user[user]]
   1249                                 if self.reindex:
   1250                                         rec = np.setdiff1d(items_pool_reind[rec], seen)

TypeError: slice indices must be integers or None or have an __index__ method
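This is the same class of problem one line later: the slice bounds built from self._st_ix_user[user] and self._n_seen_by_user[user] are not integers. Judging only from those names, seen items appear to be stored as one flat array plus per-user start offsets and counts (a CSR-like layout). The sketch below uses hypothetical arrays in that layout to show that float bounds fail and integer bounds work; it is not ctpfrec's actual code.

```python
import numpy as np

# Hypothetical flat storage: every user's seen item indices, back to back
seen = np.array([0, 2, 5, 1, 3, 4, 6], dtype=np.int64)
n_seen_by_user = np.array([3, 4], dtype=np.int64)  # user 0 saw 3 items, user 1 saw 4
st_ix_user = np.concatenate([[0], np.cumsum(n_seen_by_user)[:-1]])  # start offset per user

def seen_items(user, seen, st_ix_user, n_seen_by_user):
    """Return the slice of `seen` that belongs to `user`."""
    st = int(st_ix_user[user])  # explicit int() guards against float-typed offsets
    n = int(n_seen_by_user[user])
    return seen[st : st + n]

print(seen_items(1, seen, st_ix_user, n_seen_by_user))  # [1 3 4 6]

# With float-typed counts, the slice bounds are floats and NumPy rejects them
n_float = n_seen_by_user.astype(np.float64)
slice_failed = False
try:
    seen[st_ix_user[1] : st_ix_user[1] + n_float[1]]
except TypeError as err:
    slice_failed = True
    print("fails:", err)
```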

I've pushed another update which should fix this new issue, please try again. If you encounter some other issue, a runnable small example with randomly generated data would be helpful.
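A small generator for that kind of random data might look like the sketch below. It assumes ctpfrec's long-format inputs: counts_df with UserId/ItemId/Count and words_df with ItemId/WordId/Count. make_random_data and its parameters are hypothetical, and there is no guarantee that a given draw triggers a particular bug.

```python
import numpy as np
import pandas as pd

def make_random_data(n_users=10, n_items=15, n_words=40, n_counts=60, words_per_item=5, seed=1):
    """Random counts_df / words_df pair with duplicated ID pairs removed."""
    rng = np.random.default_rng(seed)
    counts_df = pd.DataFrame({
        "UserId": rng.integers(n_users, size=n_counts),
        "ItemId": rng.integers(n_items, size=n_counts),
        "Count": rng.integers(1, 5, size=n_counts),
    }).drop_duplicates(subset=["UserId", "ItemId"]).reset_index(drop=True)
    # Every item gets `words_per_item` random words, so each item keeps at least one word
    n_pairs = n_items * words_per_item
    words_df = pd.DataFrame({
        "ItemId": np.repeat(np.arange(n_items), words_per_item),
        "WordId": rng.integers(n_words, size=n_pairs),
        "Count": rng.integers(1, 5, size=n_pairs),
    }).drop_duplicates(subset=["ItemId", "WordId"]).reset_index(drop=True)
    return counts_df, words_df

counts_df, words_df = make_random_data()
print(counts_df.shape, words_df.shape)
```

The two frames can then be passed to CTPF(...).fit(counts_df=counts_df, words_df=words_df), followed by the add_users/topN calls that fail.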

I am still running into the issue. Here is a small example that reproduces it:

import numpy as np, pandas as pd
from ctpfrec import CTPF

# 4 users (0-3), each with a count of 1 for all 8 items (0-7)
counts_df = pd.DataFrame({
'UserId' : [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3],
'ItemId' : [0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7],
'Count'  : [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]})
counts_df = counts_df.loc[~counts_df[['UserId', 'ItemId']].duplicated()].reset_index(drop=True)

# 50 item-word pairs with random WordIds; duplicated (ItemId, WordId) pairs are dropped below
words_df = pd.DataFrame({
'ItemId' : [0,0,0,0,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7],
'WordId' : np.random.randint(8, size=50),
'Count'  : [1]*50})
words_df = words_df.loc[~words_df[['ItemId', 'WordId']].duplicated()].reset_index(drop=True)

recommender = CTPF(k = 15, reindex=True)
recommender.fit(counts_df=counts_df, words_df=words_df)

# New user -1 has seen items 1, 2 and 3; recommend only from items 5, 6 and 7
new_user_count = pd.DataFrame({'UserId': -1,'ItemId': [1,2,3],'Count': [1,1,1]})
recommender.add_users(new_user_count)
recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array([5,6,7]))

Running the code above gives the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-b96e663176e9> in <module>()
      1 new_user_count = pd.DataFrame({'UserId': -1,'ItemId': [1,2,3],'Count': [1,1,1]})
      2 recommender.add_users(new_user_count)
----> 3 recommender.topN(user = -1, n=5, exclude_seen = True, items_pool=np.array([5,6,7])) # think about excluding seen

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in topN(self, user, n, exclude_seen, items_pool)
   1300                         raise Exception("Can only exclude seen items when passing 'keep_data=True' to .fit")
   1301 
-> 1302                 return self._topN(self._M1[user], n, exclude_seen, items_pool, user)
   1303 
   1304         def topN_cold(self, user_df, n=10, items_pool=None, maxiter=10, ncores=1, random_seed=1, stop_thr=1e-3):

/home/research/jackmck/.local/lib/python3.7/site-packages/ctpfrec/__init__.py in _topN(self, user_vec, n, exclude_seen, items_pool, user)
   1246                                 n_ext = int(np.min([n + self._n_seen_by_user[user], items_pool.shape[0]]))
   1247                                 rec = np.argpartition(allpreds, n_ext-1)[:n_ext]
-> 1248                                 seen = self.seen[self._st_ix_user[user] : self._st_ix_user[user] + self._n_seen_by_user[user]]
   1249                                 if self.reindex:
   1250                                         rec = np.setdiff1d(items_pool_reind[rec], seen)

TypeError: slice indices must be integers or None or have an __index__ method

I am unable to reproduce it. That code snippet executes without errors on my setup.

Which versions of the following software are you using?

  • Python
  • NumPy
  • Pandas

Do you still experience the issue if you update to NumPy>=1.20.1 and Pandas 1.2.3?

Also, judging from the error message, the version of ctpfrec you are using doesn't have the latest changes from the GitHub version.
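The versions asked about above can be collected from inside the same notebook, which also shows which ctpfrec release the kernel actually imports. A minimal sketch using the standard importlib.metadata module (Python 3.8+); on Python 3.7, as in the tracebacks, it assumes the importlib_metadata backport is installed. report_versions is a hypothetical helper.

```python
import platform

try:
    from importlib.metadata import PackageNotFoundError, version  # Python >= 3.8
except ImportError:  # Python 3.7: requires the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

def report_versions(packages=("numpy", "pandas", "ctpfrec")):
    """Return installed versions, with None for packages that are not installed."""
    out = {"python": platform.python_version()}
    for name in packages:
        try:
            out[name] = version(name)
        except PackageNotFoundError:
            out[name] = None
    return out

print(report_versions())
```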

Never mind, I've tracked down the issue. This should now be fixed, along with the other bug about IDs. Please try updating to version 0.1.12.
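For anyone hitting similar errors on older versions, the logic these tracebacks walk through (score a restricted item pool, drop seen items, keep the best n) can be sketched in plain NumPy. This is a hypothetical top_n helper, not ctpfrec's implementation. It assumes item IDs have already been mapped to internal 0-based indices and that scores are dot products of user and item factor vectors.

```python
import numpy as np

def top_n(user_vec, item_mat, n, items_pool=None, seen=None):
    """Top-n internal item indices by dot-product score, optionally restricted
    to `items_pool` and excluding items in `seen`."""
    if items_pool is None:
        items_pool = np.arange(item_mat.shape[0])
    items_pool = np.asarray(items_pool, dtype=np.int64)
    if seen is not None:
        items_pool = np.setdiff1d(items_pool, seen)  # drop seen items before scoring
    if items_pool.shape[0] == 0:
        raise ValueError("No items to recommend.")
    scores = item_mat[items_pool].dot(user_vec)
    n = int(min(n, items_pool.shape[0]))  # plain int, so argpartition accepts it
    best = np.argpartition(-scores, n - 1)[:n]  # unordered top-n
    best = best[np.argsort(-scores[best])]  # order by descending score
    return items_pool[best]

# Mirrors the repro: user saw items 1-3, pool is 5-7, n=5 is capped at the pool size
user_vec = np.arange(8, dtype=np.float64)
item_mat = np.eye(8)  # item i scores user_vec[i]
print(top_n(user_vec, item_mat, 5, items_pool=[5, 6, 7], seen=[1, 2, 3]))  # [7 6 5]
```

Removing seen items before scoring avoids over-fetching n + n_seen candidates and filtering afterwards, as the traceback code does. The trade-off is a set difference over the whole pool on every call.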