[QST] Is it possible to extract indices and continuous features rules from NVTabular workflow?
Nepherhotep commented
We are trying to optimize the feature preprocessing step for real-time inference, where latency is critical. We can cache some intermediate data to build tensors more efficiently, but for that purpose we need a way to extract the categorical feature mappings, as well as the continuous feature conversion rules, from the trained NVTabular workflow. Is there a way to do this?
Thanks!
shoyasaxa commented
Hello team, just to add more information:
- We are setting up online inference where features need to be preprocessed in real time. We only need to preprocess one to a few rows of data, and passing them through the NVT `transform()` function takes too long.
- Instead, we are looking to extract the categorical feature mappings that the NVT workflow was fitted on, as well as the statistics NVT collected in the `Normalize` operator for each of the continuous variables (please assume all the continuous variables are simply passed through the `Normalize` operator).
- We are aware that the index mapping for categorical features can be retrieved by looking at the parquet files in the `categories/` folder of the saved workflow. However, the difficulty lies in extracting the statistics learned for the continuous variables. From a quick look around, these statistics don't seem to be saved in a separate file, and I'm guessing they are pickled together with the workflow. We would like to be able to do something like the following with an already fitted workflow:
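Once the mappings and statistics are extracted, the per-row fast path described above could look roughly like the sketch below. It is plain Python, not an NVTabular API. Assumptions to verify against your NVTabular version: the encoded index of a category equals its row position in that column's parquet file under `categories/` (the list of unique values could be loaded with pandas, e.g. `pd.read_parquet(path)[col].tolist()`), and `Normalize` computes `(x - mean) / std`. `build_category_lookup`, `encode_row` and the `unknown_index` fallback are hypothetical names; the index NVTabular actually reserves for unseen or null values differs between releases.

```python
from typing import Any, Dict, Hashable, Mapping, Sequence


def build_category_lookup(unique_values: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Map each raw category value to its encoded index.

    Assumption: the encoded index is the value's row position in the
    column's parquet file under categories/ (verify for your version).
    """
    return {value: idx for idx, value in enumerate(unique_values)}


def encode_row(
    row: Mapping[str, Any],
    lookups: Mapping[str, Mapping[Hashable, int]],
    stats: Mapping[str, Mapping[str, float]],
    unknown_index: int = 0,  # hypothetical fallback for unseen values
) -> Dict[str, Any]:
    """Encode one row using cached category lookups and normalization stats."""
    out: Dict[str, Any] = {}
    for col, value in row.items():
        if col in lookups:
            # Categorical: dictionary lookup instead of a full transform() call.
            out[col] = lookups[col].get(value, unknown_index)
        elif col in stats:
            mean, std = stats[col]["mean"], stats[col]["std"]
            # Continuous: standard score; the zero-std guard is a defensive
            # choice here, not necessarily what NVTabular's operator does.
            out[col] = (value - mean) / std if std > 0 else value - mean
        else:
            # Columns with no cached rule are passed through unchanged.
            out[col] = value
    return out
```

For example, with `lookups = {"city": build_category_lookup([None, "paris", "tokyo"])}` and `stats = {"price": {"mean": 0.85, "std": 1.2}}`, encoding `{"city": "tokyo", "price": 2.05}` yields an index of 2 for `city` and roughly 1.0 for `price`. Whether the results match `transform()` exactly should be checked by comparing both paths on a sample of rows.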
```python
print(workflow.learned_statistics["my_continuous_variable1"])
# >> {"mean": 0.85, "std": 1.2}
```
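A dictionary shaped like the hypothetical `workflow.learned_statistics` could possibly be assembled by walking the fitted workflow's graph and reading the statistics off each normalization operator. This is a sketch under stated assumptions, not a documented NVTabular feature: it assumes the workflow exposes `output_node`, that each graph node exposes `op` and `parents_with_dependencies`, and that a fitted `Normalize` op keeps per-column dicts named `means` and `stds`. These attribute names may differ between releases, so the code only reads them via `getattr`. The function name `learned_statistics` is likewise hypothetical.

```python
from typing import Any, Dict, List, Set


def learned_statistics(workflow: Any) -> Dict[str, Dict[str, float]]:
    """Collect per-column mean/std from Normalize-like ops in a fitted workflow.

    Assumed (verify for your version): workflow.output_node, node.op,
    node.parents_with_dependencies, and op.means / op.stds dicts.
    """
    stats: Dict[str, Dict[str, float]] = {}
    seen: Set[int] = set()
    stack: List[Any] = [workflow.output_node]
    while stack:
        node = stack.pop()
        if id(node) in seen:  # the graph is a DAG; visit each node once
            continue
        seen.add(id(node))
        op = getattr(node, "op", None)
        means = getattr(op, "means", None)
        stds = getattr(op, "stds", None)
        if isinstance(means, dict) and isinstance(stds, dict):
            for col, mean in means.items():
                if col in stds:
                    # float() converts numpy-style scalars to plain floats
                    stats[col] = {"mean": float(mean), "std": float(stds[col])}
        stack.extend(getattr(node, "parents_with_dependencies", None) or [])
    return stats
```

Usage might look like `wf = nvt.Workflow.load("/path/to/saved/workflow")` followed by `print(learned_statistics(wf)["my_continuous_variable1"])`. The resulting dict could then be cached alongside the category lookups so the online path never touches the workflow object.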
Is something like this possible? Please let us know!
We are on NVT version 23.04.00, using `merlin-pytorch:23.04` from here.
Thank you for your help!
sibadakesi commented
We encountered the same problem.