Alibaba-NLP/HiAGM

RCV1 sample replaced with the original data, but not getting results.

Opened this issue · 6 comments

Hi. I obtained the original RCV1-V2 dataset, ran it through your preprocessing functions (clean_str and clean_stopwords), and converted it to the required format {'token': List[str], 'label': List[str]}. I built the train, validation, and test sets according to the benchmark data split.
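For reference, the conversion described above might look roughly like the sketch below. It assumes one JSON object per line in the output file. `clean_text` is a hypothetical stand-in for the repository's own cleaning functions, not a copy of them, and `to_record` and `write_records` are likewise illustrative names.

```python
import json
import re


def clean_text(text):
    # Hypothetical stand-in for the repo's cleaning step (assumption):
    # lowercase and collapse anything that is not a letter or digit to a space.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def to_record(text, labels, stopwords):
    # Build one sample in the {'token': List[str], 'label': List[str]} format.
    tokens = [tok for tok in clean_text(text).split() if tok not in stopwords]
    return {"token": tokens, "label": list(labels)}


def write_records(records, path):
    # Assumption: the loader expects one JSON object per line.
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


example = to_record("Hello, World! the end", ["CCAT", "C15"], {"the"})
```

Documents that end up with an empty token list or an empty label list after cleaning are probably worth inspecting before training.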

Then I ran helper.hierarchical_tree_statistic.py to get the rcv1-prob.json file.

Then I used your taxonomy file and the "gcn-rcv1-v2.json" config file, in which I only replaced "hierarchy": "sample_rcv1.taxonomy" with "hierarchy": "rcv1.taxonomy", but I am not getting a usable result. Precision, recall, Micro-F1, and Macro-F1 are all zero, and the loss is NaN.

Do I need to perform any additional steps when running your code on other data (not the sample rcv1 dataset)?
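One check that may be worth running when swapping in a different taxonomy file is whether every label in the converted data actually appears in that file. The thread does not confirm this as the cause of the NaN loss, so treat it as a diagnostic sketch only. It assumes the taxonomy file is tab-separated with a parent followed by its children on each line and a top node named "Root", and that the data is one JSON object per line. Both function names are hypothetical.

```python
import json


def taxonomy_labels(taxonomy_lines):
    # Collect every label named in the taxonomy (assumed tab-separated:
    # parent, then its children, one parent per line).
    labels = set()
    for line in taxonomy_lines:
        labels.update(part for part in line.rstrip("\n").split("\t") if part)
    labels.discard("Root")  # assumed name of the artificial top node
    return labels


def unknown_labels(record_lines, known):
    # Return labels that occur in the data but not in the taxonomy.
    missing = set()
    for line in record_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing.update(lab for lab in record["label"] if lab not in known)
    return missing


taxonomy = ["Root\tCCAT\tECAT\n", "CCAT\tC15\n"]
records = [
    '{"token": ["a"], "label": ["CCAT", "C15"]}',
    '{"token": ["b"], "label": ["GCAT"]}',
]
known = taxonomy_labels(taxonomy)
missing = unknown_labels(records, known)
```

With real files, the two lists would be replaced by open file handles for the taxonomy and for each data split.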

Hi, did you try running this model directly on their preprocessed RCV1-V2 data, which is stored in HiAGM/data? I ran it and got a very bad result, far from the results reported in the paper.

Yes. Same here!
The sample data gives a result, but obviously not a good one, because it is only sample data.
However, when I took the original RCV1-V2 dataset and used the preprocessing file they provided, precision, recall, Micro-F1, and Macro-F1 were all zero and the loss was NaN.
I also tried the WoS dataset with their preprocessing file and still could not get good results!

Hi, I tried this code on another dataset and have the same problem: precision, recall, Micro-F1, and Macro-F1 are all zero, and the loss is NaN. Did you ever solve this problem?

May I ask how you got this to run? I keep getting errors whenever I try.

Me too. Precision, recall, and F1-score are all 0.

You should use the complete version of the dataset. The dataset currently available in this repository is incomplete, which is why you cannot get good results.
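A quick way to see whether a split looks complete is to count its records and distinct labels and compare those numbers with what the benchmark split is supposed to contain. The sketch below assumes one JSON object per line in the {'token': ..., 'label': ...} format from the original post. `summarize` is a hypothetical helper, not something from the repository.

```python
import json
from collections import Counter


def summarize(record_lines):
    # Count records, records with no tokens or no labels, and label frequencies.
    n_records = 0
    n_empty = 0
    label_counts = Counter()
    for line in record_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        n_records += 1
        if not record["token"] or not record["label"]:
            n_empty += 1
        label_counts.update(record["label"])
    return {
        "records": n_records,
        "empty": n_empty,
        "distinct_labels": len(label_counts),
        "label_counts": label_counts,
    }


sample = [
    '{"token": ["oil", "price"], "label": ["CCAT", "C15"]}',
    '{"token": [], "label": ["CCAT"]}',
    "",
]
stats = summarize(sample)
```

A split with far fewer records or labels than expected would be consistent with the incomplete data described above.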