Valid_Session
ashwinvasudevan opened this issue · 1 comments
ashwinvasudevan commented
What does this particular line do in the context of the dataset?
clicks_df['valid_session'] = clicks_df.session_id.map(clicks_df.groupby('session_id')['item_id'].size() > 2)
As far as I can tell, it groups by 'session_id' and returns True if the number of items in a session is > 2, else False. How is this boolean pd.Series mapped onto clicks_df.session_id? As far as I understand from the pandas documentation, map is used to replace values.
ashwinvasudevan commented
Solved.
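The groupby result is a boolean Series indexed by session_id. map looks up each row's session_id in that index, so every row gets the True/False value of its session.
In the next line, all rows marked False are dropped.
In the context of the dataset, this line and the next one together drop the sessions that contain fewer than 3 items.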
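For anyone landing here later, a minimal sketch of the lookup on a toy frame. The data is made up, the name `session_is_valid` is only for illustration, and the filtering step is an assumption about what "the next line" does, since that line is not quoted in this thread.

```python
import pandas as pd

# Toy stand-in for clicks_df; the values are invented for illustration.
clicks_df = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "item_id":    [10, 11, 12, 10, 13, 11, 12, 14, 15],
})

# One boolean per session, indexed by session_id.
session_is_valid = clicks_df.groupby("session_id")["item_id"].size() > 2

# map() looks up each row's session_id in that index, much like a dict lookup,
# so the per-session flag is broadcast back to every row of the session.
clicks_df["valid_session"] = clicks_df.session_id.map(session_is_valid)

# Assumed form of "the next line": keep only rows that belong to valid sessions.
filtered = clicks_df[clicks_df.valid_session].drop(columns="valid_session")

print(clicks_df)
print(filtered)
```

Here session 2 has only two rows, so its rows are flagged False and removed, while sessions 1 and 3 are kept. So map is not only for replacing values with a dict: given a Series, it uses the Series index as the lookup key.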