NVIDIA-Merlin/NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate terabyte-scale datasets used to train deep-learning-based recommender systems.
Python · Apache-2.0
Issues
[BUG] ops.GroupBy after ops.Filter fails to group correctly, and produces unexpected NaNs
#1886 opened by matib99 - 0
[BUG] Multi-GPU training failing during data loading: tabulate: failed to synchronize: cudaErrorIllegalAddress
#1879 opened by srastatter - 0
Slow performance of Categorify operation on Triton Inference Server
#1885 opened by rahuljantwal-8451 - 9
[BUG] NVtabular.dataset.to_parquet(...) Improperly matched output dtypes detected in time, object and datetime64[ns]
#1883 opened by Zachacy - 27
[QST] How can I fit a Workflow to a large dataset ?
#1761 opened by Azilyss - 4
How I Can Apply Inverse Transform
#1858 opened by mustfkeskin - 1
[BUG] Distributed Training With (NVTabular + Pytorch DDP), I got this error: `RuntimeError: parallel_for: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered`
#1876 opened by SunnyGhj - 0
[BUG] Add "upper pinning" for dependencies
#1875 opened by rjzamora - 2
[BUG] Reading parquet dataset on GPU throws "cudf engine doesn't support the following keyword arguments: ['strings_to_categorical']" error
#1873 opened by orlev2 - 3
[BUG] IndexError: list index out of range
#1872 opened by Oussamakhammassi - 2
[QST] Is it possible to extract indices and continuous features rules from NVTabular workflow?
#1866 opened by Nepherhotep - 4
[QST] TypeError: unhashable type: 'numpy.ndarray'
#1871 opened by dking21st - 7
[QST]how can i change int64 to float64
#1768 opened by gukejun1 - 9
[QST] How to omit the `Dataset.shuffle_by_keys` step when exporting data from BigQuery to parquet
#1862 opened by piojanu - 3
[QST] Additional GPU mem reservation when creating a `Dataset` causes OOM when allocating all GPU mem to the LocalCUDACluster
#1863 opened by piojanu - 3
[BUG] Exception: "IndexError('list index out of range')"
#1860 opened by vs385 - 0
[BUG] ops.Categorify frequency hashing raises RuntimeError when the dataset is shuffled by keys
#1864 opened by piojanu - 1
how to Categorify or other feature Operation to same content of different columns examples of list and simple columns[BUG]
#1828 opened by ChasingStar95 - 1
[QST] Why does nvt.ops.Categorify in 23.06 add 3 to the cardinality of a dataset's column?
#1856 opened by bogdan-radu-nechita - 4
[REA] How to remove tags?
#1855 opened by AresDan - 11
[BUG] workflow_fit_transform func not saving workflow output under the specified output_path
#1814 opened by Tselmeg-C - 6
[BUG] `Groupby` tags output columns (even if they are counts) as `Tags.CATEGORICAL`
#1841 opened by radekosmulski - 0
[BUG] Throw warning if reserved column is used
#1845 opened by bschifferer - 1
[BUG] TargetEncoding requires the target columns to exist in a dataset in transform()
#1840 opened by gabrielspmoreira - 2
[BUG] Saving workflows with Categorify or TargetEncoding fails to write stats files
#1837 opened by nv-alaiacano - 4
[BUG] Categorify `start_index` not handled by Inference Op `CategorifyTransform`
#1800 opened by oliverholworthy - 0
[BUG] TargetEncoding with multiple target columns causes targets to be switched
#1839 opened by gabrielspmoreira - 1
[DOC]examples/01-Getting-started.ipynb
#1806 opened by Pelps12 - 3
[BUG] NVTabular Dataset constructor cannot process cudf.StructType values.
#1808 opened by drobison00 - 1
[BUG] NVTabular ListSlice Op fails on CPU
#1816 opened by bschifferer - 3
[QST] pls clarify 'Categorify' behavior
#1809 opened by Tselmeg-C - 10
[QST] BigQuery data types
#1773 opened by ldane - 8
[BUG] Unable to access cuDF due to RuntimeError: cuDF failure : Unsupported type_id conversion to cudf
#1803 opened by mtnt-2022 - 0
NVTabular: CUDA 12 Conda Packages
#1772 opened by jakirkham - 10
[FEA] Provide a way to supply the Categorify encoding manually without fitting
#1797 opened by karlhigley - 1
[Task] Delete unnecessary unittests
#1781 opened by bschifferer - 3
[QST]how can i solve 'Unmanaged memory use is high' problem for large criteo dataset
#1769 opened by gukejun1 - 38
[QST] Incompatible CUDA Version
#1764 opened by ldane - 19
[QST]How Do I Solve the Problem that Missing Values Cannot Be Converted to Int Values?
#1770 opened by gukejun1 - 0
[BUG] GroupBy schema shapes are incorrect when using `list` agg on list column
#1763 opened by karlhigley - 0
[BUG] fitting XGBoost killed by OOM error
#1750 opened by bilzard