NVIDIA-Merlin/NVTabular

[REA] How to remove tags?

AresDan opened this issue · 4 comments

Hello,

I would like to ask how I can remove tags from the schema while building a preprocessing workflow with NVTabular. I want to extract the last element from a list that was sliced; however, the LIST tag is carried along, and I couldn't find any function that would remove it.
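To make the request concrete, here is a minimal plain-Python sketch of what "removing a tag" from a column schema means. The `ColumnSchema` class, its `without_tags` method and the tag strings are hypothetical stand-ins written for illustration. They are not the NVTabular/Merlin classes or API.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


# Hypothetical stand-in for a column schema; this is NOT the NVTabular/Merlin class.
@dataclass(frozen=True)
class ColumnSchema:
    name: str
    tags: FrozenSet[str] = frozenset()

    def without_tags(self, *tags: str) -> "ColumnSchema":
        """Return a copy of this column schema with the given tags removed."""
        return replace(self, tags=self.tags - frozenset(tags))


# Tag strings below are illustrative only.
col = ColumnSchema("item_id-last", frozenset({"list", "categorical", "item_id"}))
scalar_col = col.without_tags("list")
print(sorted(scalar_col.tags))  # ['categorical', 'item_id']
```

Because the schema is frozen, the original column keeps its tags and a new schema object is returned, which mirrors how a tag-adding operator would derive an output schema rather than mutate its input.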

Thank you in advance.

@jperez999 @radekosmulski @rnyak Since we have operators that add tags, it seems we should also have operators that remove them. This may also be a case where one of the operators (ListSlice?) should remove the LIST tag when the result is a scalar column. Would one (or more) of you be up for tackling this issue?
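The ListSlice idea can be sketched in plain Python: slicing that keeps a sub-list per row leaves the column list-valued, while taking a single element per row produces a scalar column whose schema no longer needs the LIST tag. `slice_lists`, `take_element`, `ColumnSchema` and `LIST_TAG` are hypothetical names for illustration, not NVTabular's `ListSlice` implementation.

```python
from dataclasses import dataclass, replace
from typing import Any, FrozenSet, List, Optional, Tuple

LIST_TAG = "list"  # illustrative tag string


# Hypothetical stand-in for a column schema; NOT the NVTabular/Merlin class.
@dataclass(frozen=True)
class ColumnSchema:
    name: str
    tags: FrozenSet[str] = frozenset()


def slice_lists(
    values: List[List[Any]], schema: ColumnSchema, start: int, end: Optional[int] = None
) -> Tuple[List[List[Any]], ColumnSchema]:
    """Keep a sub-list per row; the output is still list-valued, so tags are unchanged."""
    return [row[start:end] for row in values], schema


def take_element(
    values: List[List[Any]], schema: ColumnSchema, index: int
) -> Tuple[List[Any], ColumnSchema]:
    """Take one element per row; the output is scalar, so the LIST tag is dropped."""
    scalars = [row[index] for row in values]
    return scalars, replace(schema, tags=schema.tags - {LIST_TAG})


sessions = [[1, 2, 3], [4, 5]]
schema = ColumnSchema("item_id-list", frozenset({LIST_TAG, "item_id"}))

tails, tails_schema = slice_lists(sessions, schema, -2)
last, last_schema = take_element(sessions, schema, -1)
print(tails, sorted(tails_schema.tags))  # [[2, 3], [4, 5]] ['item_id', 'list']
print(last, sorted(last_schema.tags))    # [3, 5] ['item_id']
```

The design choice sketched here is that the op decides the output tags from the shape of its output, so callers never have to strip LIST by hand after extracting a scalar.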

rnyak commented

@karlhigley Agreed, this can be a useful feature; let's sync.

rnyak commented

@AresDan Maybe as a workaround you can do something like this:

  • If you are using the Groupby op to generate the item-id-list column, you can add "last" to the aggregations, e.g. {"item_id": ["list", "count", "last"]}, so that it automatically creates a column with the last item ID, and it won't tag item_id-last as LIST.
  • Then remove the last item from the item-id-list column.
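The data flow of this workaround can be modelled in plain Python: group interactions by session, collect the list, count and last item, then drop the last item from each list. This is an illustration of what the aggregations produce, not NVTabular's Groupby implementation; the function names are hypothetical, the column names mirror the ones used in this thread (NVTabular's actual output names may differ), and the rows are toy data.

```python
from typing import Any, Dict, List


def groupby_list_count_last(
    rows: List[Dict[str, Any]], key: str, col: str
) -> List[Dict[str, Any]]:
    """Plain-Python model of grouping by `key` with list/count/last aggregations on `col`.

    Assumes `rows` are already sorted (e.g. by timestamp) within each group,
    so that "last" means the most recent interaction.
    """
    groups: Dict[Any, List[Any]] = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[col])
    return [
        {
            key: k,
            f"{col}-list": items,
            f"{col}-count": len(items),
            f"{col}-last": items[-1],  # a scalar, so no LIST tag would apply to it
        }
        for k, items in groups.items()
    ]


def drop_last_item(rows: List[Dict[str, Any]], list_col: str) -> List[Dict[str, Any]]:
    """Second step of the workaround: remove the last item from each list."""
    return [{**row, list_col: row[list_col][:-1]} for row in rows]


interactions = [
    {"session_id": 1, "item_id": 10},
    {"session_id": 1, "item_id": 11},
    {"session_id": 1, "item_id": 12},
    {"session_id": 2, "item_id": 20},
]
grouped = drop_last_item(
    groupby_list_count_last(interactions, "session_id", "item_id"), "item_id-list"
)
for row in grouped:
    print(row)
```

Note that the count is computed before the last item is dropped, and a single-item session ends up with an empty list, which is why a length filter is usually needed afterwards.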

> @AresDan Maybe as a workaround you can do something like this:
>
>   • If you are using the Groupby op to generate the item-id-list column, you can add "last" to the aggregations, e.g. {"item_id": ["list", "count", "last"]}, so that it automatically creates a column with the last item ID, and it won't tag item_id-last as LIST.
>   • Then remove the last item from the item-id-list column.

This is what I tried as well, and it worked. However, when I filter item-id-list to keep only lists with 2 or more elements, I also need to filter item-id-last accordingly, and I can't do that because this feature has length 1. If I don't filter item-id-last at all, then when I join item-id-list and item-id-last they end up with different shapes, and NaN values appear in the final table.
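One way to avoid the shape mismatch is to filter whole rows, so item-id-list and item-id-last are kept or dropped together instead of being filtered separately and joined back. The sketch below shows this in plain Python with toy rows and a hypothetical `filter_rows_by_list_length` helper. In NVTabular terms the analogous approach would presumably be applying a row filter (such as NVTabular's `Filter` op, which takes a function over the dataframe) to a selection containing both columns rather than item-id-list alone; that mapping is an assumption, not something verified against this workflow.

```python
from typing import Any, Dict, List


def filter_rows_by_list_length(
    rows: List[Dict[str, Any]], list_col: str, min_len: int
) -> List[Dict[str, Any]]:
    """Keep or drop whole rows, so every column in a row stays aligned."""
    return [row for row in rows if len(row[list_col]) >= min_len]


rows = [
    {"session_id": 1, "item_id-list": [10, 11], "item_id-last": 12},
    {"session_id": 2, "item_id-list": [20], "item_id-last": 21},
    {"session_id": 3, "item_id-list": [30, 31, 32], "item_id-last": 33},
]

kept = filter_rows_by_list_length(rows, "item_id-list", min_len=2)
print([(r["session_id"], r["item_id-last"]) for r in kept])  # [(1, 12), (3, 33)]

# For contrast: filtering only the list column and leaving the "last" column
# untouched produces columns of different lengths (the mismatch described above).
lists_only = [r["item_id-list"] for r in rows if len(r["item_id-list"]) >= 2]
lasts_all = [r["item_id-last"] for r in rows]
print(len(lists_only), len(lasts_all))  # 2 3
```

When the two columns have different lengths, any positional or index-based join has rows with no partner, which is where the NaN values come from.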