khuangaf/PyTorch-Geometric-YooChoose

Valid_Session

ashwinvasudevan opened this issue · 1 comment

What does this particular line do in the context of the dataset?

clicks_df['valid_session'] = clicks_df.session_id.map(clicks_df.groupby('session_id')['item_id'].size() > 2)

As far as I understand, it groups by 'session_id' and returns True if the number of items in a session is greater than 2, and False otherwise. How is this boolean pd.Series mapped onto clicks_df.session_id? From the pandas documentation, I understood that map is used to replace values.
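A minimal sketch of what the groupby expression on its own produces, using a small hypothetical click log (the data and the names session_sizes and is_long_session are illustrative, not from the repo). The result is a boolean Series indexed by session_id, with one entry per session rather than one per row:

```python
import pandas as pd

# Hypothetical toy click log: one row per click
clicks_df = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "item_id":    [10, 11, 10, 12, 13, 14, 15, 16, 14],
})

# Number of rows (clicks) in each session, indexed by session_id
session_sizes = clicks_df.groupby("session_id")["item_id"].size()

# Boolean Series: True where a session has more than 2 clicks
is_long_session = session_sizes > 2
print(is_long_session)
```

Session 1 counts as 3 even though it contains only 2 distinct items, because size() counts rows. If distinct items were the intended criterion, nunique() would be the call to use instead.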

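Solved.
When map is given a Series instead of a function, it uses that Series as a lookup table: each value in clicks_df.session_id is replaced by the entry with the same session_id label in the boolean Series. As a result, every row (click) receives the True/False flag of its session.
In the next line, all rows flagged False are dropped.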
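A minimal sketch of the map step and the subsequent filter, on hypothetical toy data. The repo's actual "next line" is not shown in this thread, so the boolean-indexing filter below is an assumed equivalent, and filtered_df is an illustrative name:

```python
import pandas as pd

# Hypothetical toy click log: one row per click
clicks_df = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "item_id":    [10, 11, 10, 12, 13, 14, 15, 16, 14],
})

# Per-session flag, indexed by session_id (same expression as in the repo)
is_long_session = clicks_df.groupby("session_id")["item_id"].size() > 2

# Series.map with a Series argument acts as a lookup:
# each row's session_id is replaced by is_long_session[session_id]
clicks_df["valid_session"] = clicks_df.session_id.map(is_long_session)

# Assumed filter step: keep rows of valid sessions, drop the helper column
filtered_df = clicks_df[clicks_df["valid_session"]].drop(columns="valid_session")
print(filtered_df)
```

The per-session flag (3 entries) is expanded to a per-row column (9 entries), which is what makes it usable as a row mask.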

In the context of the dataset, this line and the next one drop every session that has fewer than 3 rows (clicks).
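For comparison, a sketch of two other pandas idioms that drop short sessions without a helper column; these are alternatives, not the repo's code, and the toy data is hypothetical. transform("size") broadcasts each group's row count back onto its rows, and groupby().filter keeps whole groups for which the function returns True:

```python
import pandas as pd

# Hypothetical toy click log: one row per click
clicks_df = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "item_id":    [10, 11, 10, 12, 13, 14, 15, 16, 14],
})

# Option A: per-row session size, aligned to the original index
sizes = clicks_df.groupby("session_id")["item_id"].transform("size")
kept_a = clicks_df[sizes > 2]

# Option B: keep only groups (sessions) with more than 2 rows
kept_b = clicks_df.groupby("session_id").filter(lambda g: len(g) > 2)

print(kept_a.equals(kept_b))
```

Both keep the original row order and index. The map-based version in the repo is equivalent; it simply makes the per-session lookup explicit.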