HKUDS/MMSSL

Raw dataset processing details


Can you detail how you preprocess the raw data into the V/T/A features stored as *.npy files? Only the textual features are mentioned in your paper.

The details are as described in the article; part of the data comes from the original competition. We have separately processed two multimodal recommendation datasets, which will be released with visual posters/pictures, the original textual information, and preprocessed interaction and feature data in the current pipeline format, ready for direct use. The textual information in the article is inherent to the dataset itself, while the visual information was crawled from web pages. The feature data includes both a regular-extractor version and a ChatGPT version. The new data containing the original modality information will be released in our future work. Please stay tuned for updates.
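For reference, a minimal sketch of how per-item features can be stacked into a matrix and stored as the kind of .npy file such a pipeline loads. The file name `image_feat.npy`, the feature dimension, and the placeholder extractor are all assumptions for illustration, not the repo's actual preprocessing code:

```python
import numpy as np

def build_feature_matrix(item_ids, extract_fn, dim):
    """Stack one feature vector per item into an (n_items, dim) matrix,
    with row order matching the item id order used by the recommender."""
    feats = np.zeros((len(item_ids), dim), dtype=np.float32)
    for row, item in enumerate(item_ids):
        feats[row] = extract_fn(item)
    return feats

# Placeholder extractor standing in for a real encoder
# (e.g. a CNN over posters, or a text encoder over titles).
rng = np.random.default_rng(0)
def fake_extract(item):
    return rng.standard_normal(64).astype(np.float32)

item_ids = list(range(10))
image_feat = build_feature_matrix(item_ids, fake_extract, 64)

# Hypothetical file name; the pipeline would np.load() this directly.
np.save("image_feat.npy", image_feat)
loaded = np.load("image_feat.npy")
print(loaded.shape)  # (10, 64)
```

The key design point is that row index equals item id, so the downstream model can look up a modality feature by item index without any extra mapping.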