NVIDIA-Merlin/NVTabular

[BUG] Categorify `start_index` not handled by Inference Op `CategorifyTransform`

oliverholworthy opened this issue · 4 comments

Describe the bug
A clear and concise description of what the bug is.

When using Categorify with a non-default start_index value, the inference version of the operator returns a different result.

Steps/Code to reproduce bug
Follow this guide http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports to craft a minimal bug report. This helps us reproduce the issue you're having and resolve the issue more quickly.

import numpy as np
import cudf
import nvtabular as nvt


input_tensors = {
    "a": np.array(["x", "y", "z"])
}
df = cudf.DataFrame(input_tensors)
cat_names = df.columns
cats = cat_names >> nvt.ops.Categorify(start_index=1)
workflow = nvt.Workflow(cats)
workflow.fit(nvt.Dataset(df))

feature_transformed = workflow.transform(df)["a"]
# => [2, 3, 4]

model_config = {}
inference_op = cats.op.inference_initialize(cats.input_columns, model_config)
output_tensors = inference_op.transform(cats.input_columns, input_tensors)

feature_transformed_inference = output_tensors["a"]
# => [1, 2, 3]

Expected behavior
A clear and concise description of what you expected to happen.

Expect the inference version to return the same value as the regular operator transform

Seems like something we can probably fix in the cpp version of the op for now, but in the longer term I'm wondering if we can replace the cpp versions of the ops with NumPy/CuPy versions using TensorTable. We'll soon be able to mix and match ops that process DFs vs TTs in the same graph, so I also wonder if we could just use those as the single implementation of these transforms to avoid this kind of skew/mismatch between different versions.

One thing we could do is raise an exception at export time in systems to detect if there's a Categorify op being used with non-default args. Which would be preferable to returning results that are wrong and hard to debug.

rnyak commented

@oliverholworthy we no longer have start_index. we can check how systems work with new categorify op.

Yes, we removed start_index in #1692 . So I think we can close this issue now. We also disabled the use of the 'inference' version of ops in NVIDIA-Merlin/systems#354 (at least for now), which reduces the chances of a mismatch between inference-time transform and workflow-time