NVIDIA-Merlin/NVTabular

[QST] How to construct schema based on a single "items" list feature?

ardulat opened this issue · 3 comments

What is your question?

Hello! First, a bit of context: I am using NVTabular to preprocess data for Transformers4Rec, so I am working on session-based recommendations. Currently, I have only one feature, an "items" list of product IDs (strings). How do I construct the Schema required by transformers4rec.torch.TabularSequenceFeatures?

More context: I went through some of the example notebooks in the Transformers4Rec documentation, but my main issue is with NVTabular preprocessing. I tried using nvt.Workflow to create a schema from a pandas DataFrame with an "items" list feature (as in the example), but I get the following:

[Image: Screenshot 2023-05-16 at 10 00 14 PM]

In contrast, I am trying to get something like this:

[Image: Screenshot 2023-05-16 at 10 00 00 PM]

The item_id-list column has tags marking it as a categorical feature (which TabularSequenceFeatures requires). How do I get the same tags if I already have an "items" list in my data frame?

When NVT infers a schema, it can figure out the dtypes and so forth, but it can't tell what the semantics of the fields are, so it leaves the tags blank. You can add any additional tags with the AddTags operator. It accepts a list of plain strings and will auto-convert them to Tags if needed.

["item_id-list"] >> AddTags(tags=["ITEM","LIST","ITEM_ID","ID"])
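
To picture what "tags left blank, then added and auto-converted" means, here is a minimal standalone sketch in plain Python. It does not use NVTabular or Merlin at all: the `Tag` enum, `normalize_tags`, and `add_tags` are hypothetical stand-ins invented for illustration, and the case-insensitive string matching is an assumption of this sketch rather than a statement about how the library behaves.

```python
from enum import Enum
from typing import Dict, Iterable, Set, Union


class Tag(Enum):
    """Hypothetical stand-in for a schema tag enum (not Merlin's own class)."""

    ITEM = "item"
    ITEM_ID = "item_id"
    ID = "id"
    LIST = "list"
    CATEGORICAL = "categorical"


TagLike = Union[str, Tag]
Schema = Dict[str, Set[TagLike]]


def normalize_tags(tags: Iterable[TagLike]) -> Set[TagLike]:
    """Turn strings that name a known Tag into Tag members; keep others as-is."""
    known = {t.value: t for t in Tag}
    result: Set[TagLike] = set()
    for tag in tags:
        if isinstance(tag, Tag):
            result.add(tag)
        else:
            # Assumption of this sketch: matching ignores case.
            result.add(known.get(tag.lower(), tag))
    return result


def add_tags(schema: Schema, column: str, tags: Iterable[TagLike]) -> Schema:
    """Return a copy of the schema with the given tags merged into one column."""
    updated = {name: set(existing) for name, existing in schema.items()}
    updated[column] = updated[column] | normalize_tags(tags)
    return updated


# An inferred schema knows the column exists but carries no semantic tags.
inferred: Schema = {"item_id-list": set()}
tagged = add_tags(inferred, "item_id-list", ["ITEM", "LIST", "ITEM_ID", "ID"])
print(sorted(t.name for t in tagged["item_id-list"]))
```

In the real workflow the tagging happens inside the operator graph and shows up in the workflow's output schema; the sketch only mirrors the idea that tags are extra metadata layered on top of what inference can detect.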

rnyak commented

@ardulat I think this issue was resolved already?

@rnyak, yes, the issue was solved at NVIDIA-Merlin/Transformers4Rec#703.