NVIDIA-Merlin/NVTabular

[BUG] Throw warning if reserved column is used

bschifferer opened this issue · 0 comments

Describe the bug
It seems that a column cannot be named 'labels' when it is passed to the Categorify op: the op silently leaves that column unencoded.

Steps/Code to reproduce bug

import cudf
import nvtabular as nvt
from nvtabular.ops import *


df = cudf.DataFrame({'labels': [10,11,12]})

feat = ['labels'] >> nvt.ops.Categorify()

workflow = nvt.Workflow(feat)

dataset = nvt.Dataset(df, cpu=False)
workflow.fit(dataset)
workflow.transform(dataset).compute()

The output is the original input, [10, 11, 12], unchanged. No warning or error is raised.

Expected behavior
The output should be the categorified column, i.e. the values replaced by integer category ids.
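
To make "categorified" concrete, here is a minimal pure-Python sketch of the kind of mapping expected. toy_categorify is a hypothetical helper written for this issue, not NVTabular's implementation; the ids NVTabular actually assigns may differ (for example, if some indices are reserved for null or unseen values).

```python
def toy_categorify(values):
    """Map each distinct value to a contiguous integer id (sorted order).

    Hypothetical stand-in for illustration only; not NVTabular's encoding.
    """
    vocab = {value: idx for idx, value in enumerate(sorted(set(values)))}
    encoded = [vocab[value] for value in values]
    return encoded, vocab


encoded, vocab = toy_categorify([10, 11, 12])
print(encoded)  # [0, 1, 2]
```

The point is only that the transformed column should contain category ids rather than the raw values 10, 11, 12.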

At a minimum, Categorify should emit a warning (or, better, raise an error) when 'labels' is used as an input column name, instead of silently skipping the column.
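
A rough sketch of what such a check could look like is below. Everything here is an assumption for illustration: check_reserved_columns and RESERVED_COLUMN_NAMES are hypothetical names that do not exist in NVTabular, and the reserved set contains only 'labels' because that is the one name observed in this report.

```python
import warnings

# Assumption: "labels" is the only reserved name; taken from this report,
# not from an authoritative list in NVTabular.
RESERVED_COLUMN_NAMES = frozenset({"labels"})


def check_reserved_columns(column_names, reserved=RESERVED_COLUMN_NAMES, strict=False):
    """Warn (or raise, if strict=True) when a column name clashes with a reserved one.

    Hypothetical helper; returns the sorted list of clashing names.
    """
    clashes = sorted(set(column_names) & set(reserved))
    if not clashes:
        return clashes
    message = (
        f"Column name(s) {clashes} are reserved and may be skipped by Categorify; "
        "rename them before fitting the workflow."
    )
    if strict:
        raise ValueError(message)
    # stacklevel=2 points the warning at the caller rather than this helper.
    warnings.warn(message, UserWarning, stacklevel=2)
    return clashes
```

Called as check_reserved_columns(['labels']) this would emit a UserWarning; with strict=True it would raise a ValueError instead. Where such a check belongs (op construction, fit, or transform) is left open.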

Environment details (please complete the following information):
I tested this in the pytorch:22.12 container. From reading the NVT code, it seems that 'labels' is treated as a special column name:

https://github.com/NVIDIA-Merlin/NVTabular/blob/main/nvtabular/ops/categorify.py#L1645

Additional context