pytorch/torcharrow

Support More Operations for Recommendation Systems

Ash-Zheng opened this issue · 1 comments

Hi,

I noticed that some data preprocessing operations used in recommendation systems, such as bucketize, sigridHash, and firstX, are implemented in: torcharrow/tree/main/csrc/velox/functions/rec

Will other preprocessing operations for recommendation systems be supported in the future?
For example, a recent paper from Meta [1] lists 16 common preprocessing operations in Table 11, including bucketize, sigridHash, firstX, Cartesian, IdListTransform, BoxCox, MapId, and NGram.
Most of them are not supported yet. Are there plans to add them to torcharrow?
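
To make the request concrete, here is a rough plain-Python sketch of how I understand a few of these operations. This is not the torcharrow API: the function names (`bucketize`, `first_x`, `ngram`, `box_cox`), their signatures, and the edge-case behavior (for example, which bucket a value equal to a border falls into) are my own assumptions for illustration, and the definitions in the paper or in the existing Velox functions may differ.

```python
import bisect
import math


def bucketize(value, borders):
    """Return the index of the bucket `value` falls into.

    Assumes `borders` is sorted ascending and that a value equal to a
    border belongs to the lower bucket (an assumption, not a spec).
    """
    return bisect.bisect_left(borders, value)


def first_x(ids, x):
    """Truncate an id list to its first `x` entries."""
    return ids[:x]


def ngram(ids, n):
    """Return all contiguous n-grams of an id list as tuples."""
    return [tuple(ids[i:i + n]) for i in range(len(ids) - n + 1)]


def box_cox(x, lmbda):
    """Standard Box-Cox transform for x > 0."""
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1) / lmbda


if __name__ == "__main__":
    print(bucketize(6, [1, 5, 10]))
    print(first_x([7, 8, 9], 2))
    print(ngram([1, 2, 3, 4], 2))
    print(box_cox(4.0, 0.5))
```

Ideally each of these would be available as a vectorized column operation, like the existing functions in the rec directory, rather than a per-row Python function.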

[1] Zhao, Mark, et al. "Understanding data storage and ingestion for large-scale deep recommendation model training: industrial product." Proceedings of the 49th Annual International Symposium on Computer Architecture. 2022.