HKUST-KnowComp/ASER

It takes time to fetch related concepts

tkyk853isp opened this issue · 8 comments

I tried to run the following code with KG.db and concept.db loaded from the ASER 2.0 data on OneDrive.

from aser.client import ASERClient
client = ASERClient(port=8000, port_out=8001)
e1 = client.extract_eventualities("He fears her.")[0][0]
c1 = client.conceptualize_eventuality(e1)[0][0]
print(client.fetch_related_concepts(c1))

The line print(client.fetch_related_concepts(c1)) then ran for more than 30 minutes without finishing. Is this normal? If not, what are the possible causes? I ran it in a Docker container on an EC2 instance (OS: Ubuntu 20.04 LTS, vCPU: 48, Memory: 384 GiB). With the tutorial versions of KG.db and concept.db, it finished instantly.

How about now? I think it would take several hours to load the concept DB of the released version.

Sorry, I didn't mention that I had run

aser-server -n_workers 1 -n_concurrent_back_socks 10 \
    -port 8000 -port_out 8001 \
    -corenlp_path "stanford-corenlp-3.9.2" -base_corenlp_port 9000 \
    -aser_kg_dir "core" -concept_kg_dir "concept" -probase_path "probase.txt"

and seen “Loading Server Finished in xx s” in the log before running the Python code. “Loading Server Finished” implies loading the concept DB finished, doesn't it?

In fact, it does not load the DB completely.
See L181: the server only uses "cache" mode to load the DB metadata. When the server receives a query, it reads the DB again to check whether "He fears her." is stored there (which is quick). However, the fetch_related_concepts method needs to load the whole DB into memory (which is slow and requires a lot of memory).

Has fetch_related_concepts still not finished?

Actually, I stopped fetch_related_concepts about 45 minutes after it started because it costs money to use an EC2 instance and I suspected I had made a mistake.
Is there a way to get the related concepts quickly?

There is no general way to do that. You can manually (1) get the concept id, (2) find the connected concept ids, and (3) fetch the corresponding concepts with sqlite. Although fetch_related_concepts is implemented as described above, you could do the same from the command line or in C++ to save memory or speed things up.
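
For reference, the three manual steps could look roughly like the sketch below, using Python's built-in sqlite3 module instead of the ASER client. The table names (Concepts, Relations) and column names (_id, hid, tid) are assumptions about the layout of concept.db, not confirmed in this thread, so check them against your own file (for example with .schema in the sqlite3 shell) before relying on this. The function name is made up for illustration.

```python
import sqlite3


def fetch_related_concepts_sqlite(conn, concept_id,
                                  concept_table="Concepts",
                                  relation_table="Relations",
                                  limit=None):
    """Return rows of concepts connected to `concept_id`.

    ASSUMPTIONS (not confirmed by the ASER maintainers in this thread):
      - relation rows store the head concept id in `hid` and the tail in `tid`
      - concept rows are keyed by `_id`
    Table names are interpolated into the SQL, so pass only trusted values.
    """
    # Step (1) is assumed to be done already: `concept_id` is the id of the
    # queried concept.

    # Step (2): find the ids of concepts connected to the queried concept.
    sql = f"SELECT tid FROM {relation_table} WHERE hid = ?"
    params = [concept_id]
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    tail_ids = [row[0] for row in conn.execute(sql, params)]

    # Step (3): fetch the corresponding concept rows one by one, so only the
    # matching rows are held in memory rather than the whole DB.
    related = []
    for tid in tail_ids:
        row = conn.execute(
            f"SELECT * FROM {concept_table} WHERE _id = ?", (tid,)
        ).fetchone()
        if row is not None:
            related.append(row)
    return related
```

Usage would be something like conn = sqlite3.connect("concept/concept.db") followed by fetch_related_concepts_sqlite(conn, some_id, limit=100); the path is a guess based on the -concept_kg_dir "concept" flag above. If the relation table has no index on the head id column, SQLite still has to scan it, but it does so on disk without loading everything into Python.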

Another reason is the number of related concepts. If the concept you query has hundreds of thousands of connected concepts, it takes some time to process and wrap them.
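
One way to check whether this is the cause is to count the connections before fetching anything. The sketch below again assumes a Relations table with the head concept id in a hid column, which is a guess about the concept.db schema rather than something stated in this thread; the function name is hypothetical.

```python
import sqlite3


def count_connected_concepts(conn, concept_id, relation_table="Relations"):
    """Count relation rows whose head is `concept_id`.

    ASSUMPTION: the relation table has a `hid` column holding the head
    concept id. The table name is interpolated into the SQL, so pass only
    trusted values.
    """
    row = conn.execute(
        f"SELECT COUNT(*) FROM {relation_table} WHERE hid = ?",
        (concept_id,),
    ).fetchone()
    return row[0]
```

If the count is very large, fetching in batches or with a LIMIT may be more practical than retrieving every related concept at once.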

I see. Thank you for answering my question.