[BUG] Dtype discrepancy with pandas and groupby on CPU
oliverholworthy opened this issue · 1 comment
oliverholworthy commented
Describe the bug
Steps/Code to reproduce bug
- Run notebook https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/v23.02.00/examples/getting-started-session-based/01-ETL-with-NVTabular.ipynb
- In a CPU-only environment
TypeError: Dtype discrepancy detected for column age_days-list: operator Groupby reported dtype `DType(name='float32', element_type=<ElementType.Float: 'float'>, element_size=32, element_unit=None, signed=True, shape=Shape(dims=None))` but returned dtype `DType(name='float64', element_type=<ElementType.Float: 'float'>, element_size=64, element_unit=None, signed=True, shape=Shape(dims=None))`.
Expected behavior
No exception raised, and output matching equivalent result when running on GPU with cudf
Environment details:
- Environment location: Docker
- Method of NVTabular install: from source
Additional context
A similar issue was reported recently in #1767. However, that particular example now works following a change in core (NVIDIA-Merlin/core#226).
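One possible mechanism for the float32 vs. float64 mismatch, sketched with plain pandas and NumPy only (no NVTabular calls): when a float32 column is aggregated into Python lists on CPU, the elements can become Python floats, and NumPy then infers float64 for them. This is an assumption about the cause and has not been confirmed against the Groupby operator. The `session_id` column, the toy values, and the `cast_list_elements` helper are hypothetical; only the `age_days` name comes from the error above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name "age_days" comes from the report.
df = pd.DataFrame(
    {
        "session_id": [1, 1, 2],
        "age_days": np.array([0.5, 1.5, 2.5], dtype="float32"),
    }
)

# Collect the values of each session into a Python list.
# The result is an object-dtype Series whose entries are lists.
lists = df.groupby("session_id")["age_days"].agg(list)

# Inspect which element dtype NumPy infers for the first list.
# If the elements are plain Python floats, this is float64, not float32.
inferred = np.asarray(lists.iloc[0]).dtype
print("inferred element dtype:", inferred)


def cast_list_elements(series, dtype):
    """Hypothetical helper: cast every list in a list column to `dtype`."""
    return series.map(lambda values: np.asarray(values, dtype=dtype))


# Restore the dtype that the input column had before the aggregation.
fixed = cast_list_elements(lists, np.float32)
print("element dtype after cast:", fixed.iloc[0].dtype)
```

If this is what happens, casting the list elements back to the input column's dtype after the aggregation would make the reported and returned dtypes agree.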
angmc commented
@oliverholworthy I ran the notebook on the 23.04 pytorch container without a GPU and it completed without error. Does this error only appear when NVTabular is installed from source, or was it also fixed by the changes in core?