HXX97/GMT-KBQA

Cannot reproduce the results on CWQ

pengyxhack opened this issue · 13 comments

Hi, I read your paper, and it is great work! However, I can't reproduce the reported F1 of 77 on the CWQ dataset. I am using the provided data along with the CWQ_GMT_KBQA model, and I only ran the last step (evaluation):
python3 eval_topk_prediction_final.py --split test --pred_file exps/CWQ_GMT_KBQA/beam_50_test_4_top_k_predictions.json --test_batch_size 4 --dataset CWQ --beam_size 50
The output was:
Total: 3531, ACC: 0.6978193146417445, AVGP: 0.7499605933112178, AVGR: 0.7684635982032751, AVGF: 0.7445226124529617
Could you please tell me what might be the problem?
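
For reference when comparing these numbers: AVGP, AVGR and AVGF look like per-question precision, recall and F1 averaged over the 3531 test questions. This is an assumption about eval_topk_prediction_final.py, not something confirmed in this thread, but it is consistent with the log: the harmonic mean of the reported AVGP and AVGR is about 0.759 rather than 0.7445, which is what one would expect if F1 is computed per question and then averaged. A minimal sketch of that kind of averaging, with hypothetical function names and toy data:

```python
def prf1(pred, gold):
    """Precision, recall and F1 for one question, treating answers as sets."""
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        # Assumption: an empty prediction for an empty gold set counts as correct.
        return 1.0, 1.0, 1.0
    hits = len(pred & gold)
    if hits == 0:
        # Covers empty predictions, empty gold sets, and disjoint answers.
        return 0.0, 0.0, 0.0
    p = hits / len(pred)
    r = hits / len(gold)
    return p, r, 2 * p * r / (p + r)


def macro_average(pairs):
    """Average per-question (P, R, F1) over (predicted, gold) answer pairs."""
    scores = [prf1(pred, gold) for pred, gold in pairs]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


if __name__ == "__main__":
    # Toy data, not taken from CWQ.
    toy = [(["a", "b"], ["a"]), (["x"], ["x", "y", "z"])]
    print(macro_average(toy))
```

Under this scheme, every question whose query returns nothing contributes a zero to all three averages, so a modest number of failed or differently answered queries could plausibly account for a gap of a few points.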

Hi, regarding your concerns:

  1. I just re-ran the last step using the same command and our checkpoint uploaded to Google Drive. Screenshots of the log follow; the result is F1=77.

[Screenshot 2023-01-05, 7:15:21 AM]

[Screenshot 2023-01-05, 7:21:36 AM]

  2. Possible problems I can think of:
    • Please check that everything in the checkpoint folder is unzipped into the right place (e.g., the candidate entity map CWQ_test_4_beam_50_candidate_entity_map.json, which is the output of our multi-task model).
    • Was there any network interruption during evaluation? If so, some SPARQL queries may not have been executed successfully.
    • The Freebase version may differ; I think we use the 2015 dump.
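
The first two checks in the list above can be scripted. The sketch below is only an illustration: the folder layout and the endpoint URL http://localhost:8890/sparql are assumptions, so adjust them to wherever your checkpoint files and Virtuoso instance actually live. The helper names are hypothetical.

```python
import os
import urllib.error
import urllib.parse
import urllib.request


def missing_files(folder, names):
    """Return the names that are not present as regular files in folder."""
    return [n for n in names if not os.path.isfile(os.path.join(folder, n))]


def sparql_probe_url(endpoint, query="ASK { ?s ?p ?o }"):
    """Build a GET URL that sends a trivial SPARQL query to the endpoint."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urllib.parse.urlencode(params)


def endpoint_alive(endpoint, timeout=10):
    """True if the endpoint answers the probe query with HTTP 200."""
    try:
        with urllib.request.urlopen(sparql_probe_url(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Assumed locations; the file names are the ones mentioned in this thread.
    folder = "exps/CWQ_GMT_KBQA"
    expected = [
        "beam_50_test_4_top_k_predictions.json",
        "CWQ_test_4_beam_50_candidate_entity_map.json",
    ]
    print("missing:", missing_files(folder, expected))
    print("endpoint alive:", endpoint_alive("http://localhost:8890/sparql"))
```

Running the probe before and after the evaluation gives a rough indication of whether the SPARQL service stayed reachable, although it cannot prove that every query during the run succeeded.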

Hope this helps.

Thanks for the reply! I suspect it is a problem with Freebase. Did you download the Freebase dump from here: "Our processed Virtuoso DB file can be downloaded from [here](https://www.dropbox.com/s/q38g0fwx1a3lz8q/virtuoso_db.zip) or via wget (WARNING: 53G+ disk space is needed)" (https://github.com/dki-lab/Freebase-Setup)?

Hi, sorry for the ambiguity about the Freebase dump in our README.

We use the official dump; I think you can download it at https://developers.google.com/freebase.

We will make this clear in our README. Hope this helps.

Ok, I will try this dump. I'll get back to you when I get new results. Thank you.

You're welcome.

Hello, I've downloaded the freebase-rdf-latest.gz file from https://developers.google.com/freebase and decompressed it. Could you please tell me what I should do next? It does not seem to be a .db file as described in https://developers.google.com/freebase.

HXX97 commented

The .db file is Virtuoso's data file. After downloading the .gz file, you need to load it into your Virtuoso instance.
Here is a setup tutorial, FYI:
https://github.com/sameersingh/nlp_serde/wiki/Virtuoso-Freebase-Setup

Sorry to bother you, but when I tried to start Virtuoso using ../bin/virtuoso-t -df, I got an error: ERROR: Failed HTTP listen at 8890. I thought the port was already in use, but when I checked it using netstat -tunlp | grep 8890, no owning process was shown:
tcp 6 0 127.0.0.1:8890 0.0.0.0:* LISTEN -
Could you please tell me how to solve this problem? Thank you very much.

HXX97 commented

It seems that port 8890 is occupied by a previously started Virtuoso instance. You can use ps -ef | grep 'virtuoso' to find the process ID and stop it, or you can change Virtuoso's configuration file so that it starts on another port. FYI: https://blog.csdn.net/lft_happiness/article/details/124469414
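
A quick way to confirm whether something is already accepting connections on the port, independent of netstat, is to try connecting to it. Note that the netstat line quoted above does show a socket in LISTEN state on 127.0.0.1:8890; a "-" in the last column typically means netstat could not identify the owning process, for example when it is run without root privileges. The sketch below uses only the Python standard library; the function name is hypothetical, and the host and port are the ones from this thread.

```python
import socket


def port_in_use(host, port, timeout=1.0):
    """True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("8890 in use:", port_in_use("127.0.0.1", 8890))
```

If this reports the port as in use before you start Virtuoso, an older instance (or another service) is still bound to it, and either stopping that process or choosing a different port should resolve the listen error.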

Thanks for your kind reply. I have solved this problem by changing the port. I am now loading the RDF into Virtuoso using SQL> ld_dir('.', 'FilterFreebase', 'http://freebase.com'); rdf_loader_run();. By the way, how long will this step take? It seems to take quite a while.

HXX97 commented

It depends on your machine. It usually takes several days.
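
Since the load can run for days, it may help to check its status from a second isql session instead of waiting on the first one. As far as I know, Virtuoso's bulk loader records each file registered by ld_dir in the table DB.DBA.load_list, with ll_state set to 0 (registered), 1 (loading) or 2 (finished) and any failure stored in ll_error; treat those column meanings as an assumption to verify against the Virtuoso documentation. The sketch below only summarises rows you have already fetched; the function name and the sample rows are hypothetical.

```python
from collections import Counter

# Assumed meaning of ll_state in DB.DBA.load_list (verify against the Virtuoso docs).
LL_STATE = {0: "waiting", 1: "loading", 2: "done"}

# Query that could be run from a second isql session while the load is running.
PROGRESS_QUERY = "SELECT ll_file, ll_state, ll_error FROM DB.DBA.load_list;"


def summarize(rows):
    """Count files per state and list files with a recorded error.

    rows: iterable of (ll_file, ll_state, ll_error) tuples.
    """
    rows = list(rows)
    counts = Counter(LL_STATE.get(state, "unknown") for _, state, _ in rows)
    failed = [name for name, _, err in rows if err]
    return dict(counts), failed


if __name__ == "__main__":
    # Made-up rows, only to show the shape of the output.
    sample = [("FilterFreebase", 1, None)]
    print(PROGRESS_QUERY)
    print(summarize(sample))
```

With a single input file, as in the ld_dir call above, this only tells you whether the load is still running, has finished, or stopped with an error; it does not give a percentage.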

Bad news... the loading process was interrupted. Error 08S01: [Virtuoso Driver]CL065: Lost connection to server at line 11 of Top-Level: rdf_loader_run(). What could be the problem? Could you please help me?

HXX97 commented

That's strange; I have never run into this problem. Maybe you can try deploying it on another machine. Make sure that you have enough RAM (>= 128 GB) and disk space (>= 500 GB).
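
Before retrying on another machine, the memory and storage figures mentioned above can be checked with a few lines of standard-library Python. This is a minimal sketch that assumes a Linux-like system (os.sysconf is not available on Windows); the thresholds are the ones from this thread and the function names are hypothetical. A lost connection during rdf_loader_run() may mean the server process itself stopped, for example after running out of memory, but that is a guess and not a diagnosis.

```python
import os
import shutil

GIB = 1024 ** 3


def total_ram_gib():
    """Physical memory in GiB (POSIX systems only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB


def free_disk_gib(path="."):
    """Free space in GiB on the filesystem that contains path."""
    return shutil.disk_usage(path).free / GIB


def shortfalls(ram_gib, disk_gib, min_ram_gib=128, min_disk_gib=500):
    """Return human-readable messages for every requirement that is not met."""
    problems = []
    if ram_gib < min_ram_gib:
        problems.append(f"RAM: {ram_gib:.0f} GiB < {min_ram_gib} GiB")
    if disk_gib < min_disk_gib:
        problems.append(f"disk: {disk_gib:.0f} GiB free < {min_disk_gib} GiB")
    return problems


if __name__ == "__main__":
    # Pass the directory that holds the Virtuoso database files instead of ".".
    print(shortfalls(total_ram_gib(), free_disk_gib(".")) or "requirements met")
```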