Other datasets?

Question

Thank you very much for your work and making the codebase public; it is very inspiring :)
I am planning to implement something similar and had the following questions:

Is there a specific reason why MATCH was not tested on other popular hierarchical datasets such as Web of Science (WOS), NYTimes, or RCV1-V2?
Also, the README says that the experiments were run on an NVIDIA GTX 1080. Could you please share how long training took?

Answer 1 · 2022-07-18T23:44:29.000Z

Thank you very much for your interest in our work!

MATCH is specifically designed for metadata-aware text classification, which means documents in the dataset should carry some metadata. As far as we know, many popular datasets such as NYTimes and RCV1-V2 do not have rich metadata, so we use two academic paper datasets instead. If your dataset has metadata other than venue/author/reference (e.g., a product review dataset with reviewer/product IDs), please refer to https://github.com/yuzhimanhua/MATCH#running-on-new-datasets for how to use our code on your dataset. If your dataset has no metadata at all, I would recommend these other repositories:
https://github.com/yourh/AttentionXML
http://manikvarma.org/downloads/XC/XMLRepository.html
With two NVIDIA GTX 1080 GPUs, training should take about 15-20 hours.
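To make the product-review example from the answer more concrete, here is a minimal sketch of how such a record might be mapped into a metadata-aware format, where reviewer and product IDs play the role that venue and author play for papers. The input field names (`review_text`, `categories`, `reviewer_id`, `product_id`), the output field names, and the `REVIEWER_`/`PRODUCT_` token prefixes are all hypothetical; the schema MATCH actually expects is documented in the "Running on New Datasets" section of its README and should be checked there. Prepending metadata IDs as extra tokens is just one simple way to expose metadata to a text encoder, not necessarily what MATCH does internally.

```python
import json
from typing import Dict, List


def review_to_record(review: Dict) -> Dict:
    """Map a product review onto a metadata-aware record.

    Hypothetical input fields: review_text, categories, reviewer_id, product_id.
    Output field names are illustrative, not MATCH's required schema.
    """
    return {
        "text": review["review_text"],
        # Deduplicate and sort labels so each record has a stable label list.
        "label": sorted(set(review["categories"])),
        # Metadata slots, analogous to venue/author for academic papers.
        "reviewer": [f"REVIEWER_{review['reviewer_id']}"],
        "product": [f"PRODUCT_{review['product_id']}"],
    }


def to_token_sequence(record: Dict) -> List[str]:
    """Prepend metadata IDs as extra tokens in front of the whitespace-split text."""
    meta = record["reviewer"] + record["product"]
    return meta + record["text"].split()


if __name__ == "__main__":
    sample = {
        "review_text": "great sound quality",
        "categories": ["Audio", "Electronics", "Audio"],
        "reviewer_id": "u42",
        "product_id": "p7",
    }
    record = review_to_record(sample)
    # One JSON object per line is a common on-disk format for such datasets.
    print(json.dumps(record))
    print(to_token_sequence(record))
```

Keeping metadata in separate fields (rather than baking it into the text up front) leaves the choice of how to encode it to the model-side preprocessing.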

Answer 2 · 2022-07-19T22:18:13.000Z

Okay, makes sense; thank you very much!