yuzhimanhua/MATCH

Other datasets?

Closed this issue · 2 comments

Thank you very much for your work and making the codebase public; it is very inspiring :)
I am planning to implement something similar and had the following questions:

  1. Is there a specific reason why MATCH was not tested on other popular hierarchical datasets such as Web of Science (WOS), NYTimes, or RCV1-V2?
  2. Also, the README says that the experiments were run on NVIDIA GTX 1080. Could you please share how long training took?

Thank you very much for your interest in our work!

  1. MATCH is specifically designed for metadata-aware text classification, which means that documents in the dataset should come with metadata. As far as we know, many popular datasets such as NYTimes and RCV1-V2 do not have rich metadata, so we used two academic paper datasets instead. If your dataset has metadata other than venue/author/reference (e.g., a product review dataset with reviewer/product IDs), please refer to https://github.com/yuzhimanhua/MATCH#running-on-new-datasets for how to use our code with your dataset. If your dataset does not have any metadata, I would recommend some other repositories:
    https://github.com/yourh/AttentionXML
    http://manikvarma.org/downloads/XC/XMLRepository.html

  2. With two NVIDIA GTX 1080 GPUs, training should take about 15-20 hours.
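
As an illustration of item 1, metadata-aware input for a product review dataset could be pictured roughly as follows. This is only a sketch under stated assumptions: the field names `reviewer` and `product`, the helper `to_metadata_aware_text`, and the `FIELD_value` token scheme are all hypothetical and are not taken from the MATCH codebase. The README section linked above describes the format the code actually expects.

```python
# Hypothetical sketch: flatten metadata fields into tokens placed before the text.
# Nothing here is MATCH's real input format; see the README for that.

def to_metadata_aware_text(record, metadata_fields):
    """Prepend one token per metadata value to the document text."""
    tokens = []
    for field in metadata_fields:
        values = record.get(field, [])
        # Accept either a single string or a list of strings per field.
        if isinstance(values, str):
            values = [values]
        tokens.extend(f"{field.upper()}_{value}" for value in values)
    return " ".join(tokens + [record["text"]])


# Made-up example record for a review dataset.
review = {
    "reviewer": "r123",
    "product": ["p456"],
    "text": "battery lasts two days",
    "label": ["electronics"],
}

print(to_metadata_aware_text(review, ["reviewer", "product"]))
# prints: REVIEWER_r123 PRODUCT_p456 battery lasts two days
```

A record with no metadata fields simply yields its text unchanged, which is the case where a text-only classifier such as the repositories listed above may be the better fit.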

Okay, that makes sense; thank you very much!