NVIDIA-Merlin/core

[FEA] Create an easy way to generate a dict of tensors: a standard way to move array data across frameworks

rnyak opened this issue · 1 comment

rnyak commented

When we want to trace a PyTorch model, we call `torch.jit.trace(model, train_dict, strict=True)`, where `train_dict` is a dictionary of torch tensors. In the PyTorch documentation, this argument is called `example_inputs`.
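A minimal sketch of what tracing needs, assuming PyTorch is installed. `DictModel` is a hypothetical toy model (not part of Merlin) whose `forward()` takes a dict of named tensors, and the example inputs are built by hand instead of being pulled from a data loader. Wrapping the dict in a 1-tuple passes it as the single positional argument of `forward()`.

```python
from typing import Dict

import torch
from torch import nn


class DictModel(nn.Module):
    """Hypothetical toy model whose forward() takes a dict of named feature tensors."""

    def __init__(self) -> None:
        super().__init__()
        self.item_emb = nn.Embedding(num_embeddings=100, embedding_dim=8)
        self.head = nn.Linear(8 + 1, 1)

    def forward(self, inputs: Dict[str, torch.Tensor]) -> torch.Tensor:
        # Embed the categorical column and append the continuous column as one extra feature.
        item = self.item_emb(inputs["item_id"])  # shape (batch, 8)
        price = inputs["price"].unsqueeze(-1)    # shape (batch, 1)
        return self.head(torch.cat([item, price], dim=-1)).squeeze(-1)


model = DictModel().eval()

# Example inputs: a plain dict of tensors (column name -> tensor), created directly.
train_dict = {
    "item_id": torch.randint(0, 100, (4,)),
    "price": torch.rand(4),
}

# The 1-tuple makes the whole dict the single positional argument to forward().
traced = torch.jit.trace(model, (train_dict,), strict=True)
print(traced(train_dict).shape)  # torch.Size([4])
```

The traced module only depends on the dict's keys, dtypes, and shapes, so any utility that can produce such a dict (from a dataframe, a file, or a batch) is enough to supply `example_inputs`.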

Currently we get the dict of tensors as follows, but I don't think this is the pattern we want users to follow:

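```python
from merlin.io import Dataset

# `trainer` and `train_paths` are defined earlier in the training script.
dataset = Dataset(train_paths[0])        # wrap the first training file as a Dataset
trainer.train_dataset_or_path = dataset  # point the trainer at that dataset
loader = trainer.get_train_dataloader()  # build the trainer's training data loader
train_dict = next(iter(loader))          # pull one batch: a dict of torch tensors
```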

Based on discussions with Karl, it looks like this is related to Columns and MerlinArray. We need a standard solution for this.

We now have this via TensorTable and the related utility functions for converting back and forth between TensorTables, dataframes, and dictionaries.
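To illustrate the dataframe/dictionary round trip in isolation, here is a minimal framework-neutral sketch using pandas and NumPy. The helpers `df_to_array_dict` and `array_dict_to_df` are hypothetical stand-ins written for this example; they are not the TensorTable API or its utility functions.

```python
from typing import Dict

import numpy as np
import pandas as pd


def df_to_array_dict(df: pd.DataFrame) -> Dict[str, np.ndarray]:
    """Hypothetical helper: one contiguous 1-D NumPy array per column, keyed by column name."""
    return {name: np.ascontiguousarray(df[name].to_numpy()) for name in df.columns}


def array_dict_to_df(arrays: Dict[str, np.ndarray]) -> pd.DataFrame:
    """Hypothetical inverse: rebuild a DataFrame from equal-length 1-D arrays."""
    lengths = {len(a) for a in arrays.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    return pd.DataFrame(arrays)


df = pd.DataFrame({"item_id": [3, 7, 7, 1], "price": [1.5, 2.0, 2.0, 0.5]})
arrays = df_to_array_dict(df)
# Each array can then be handed to a framework, e.g. torch.from_numpy(arrays["price"]).
round_trip = array_dict_to_df(arrays)
assert round_trip.equals(df)
```

Keeping the intermediate form as a plain dict of arrays means each framework only needs a per-column conversion, rather than its own data loader, to obtain example inputs.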