Some questions about the KG extraction process
cd86254081 opened this issue · 2 comments
I followed the KB4Rec process and extracted a movie-domain subgraph for MovieLens-1M, keeping only the 10 most important relations. However, although the dataset contains only 3000+ items, the number of entities in my extracted first-order subgraph reached 100000+. In most papers that use MovieLens, the number of entities in the subgraph is relatively small, maybe about 20000.
Could you please share how you process the subgraph so that the number of entities stays small?
Hi, based on my experience, a 10-core (or 5-core) setting is usually adopted to guarantee data quality, i.e., retaining only entities with at least ten (or five) triples. Moreover, when extracting KG entities from the original KG, you can consider only the neighbors within at most k hops (say, 2 hops) of the seed items, i.e., retain only entities that are at most two hops away from the MovieLens seed items.
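The two steps above could be sketched roughly as follows. This is only a minimal illustration in Python, not the exact KB4Rec pipeline or the preprocessing used in any particular paper. It assumes the KG is a list of (head, relation, tail) tuples. The function names `k_hop_subgraph` and `filter_min_triples`, the `protected` parameter (used to keep seed items from being dropped), treating edges as undirected, and repeating the filter until nothing changes are all assumptions made for illustration.

```python
from collections import defaultdict


def k_hop_subgraph(triples, seeds, k=2):
    """Keep triples whose endpoints are both within k hops of a seed item.

    Assumption: edges are treated as undirected when expanding.
    """
    # Map each entity to the (triple index, neighbor) pairs it takes part in.
    adj = defaultdict(list)
    for idx, (h, _, t) in enumerate(triples):
        adj[h].append((idx, t))
        adj[t].append((idx, h))

    visited = set(seeds)
    frontier = set(seeds)
    kept = set()
    for _ in range(k):
        next_frontier = set()
        for entity in frontier:
            for idx, neighbor in adj.get(entity, ()):
                kept.add(idx)
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    # Preserve the original triple order.
    return [triples[i] for i in sorted(kept)]


def filter_min_triples(triples, min_triples=10, protected=()):
    """Drop entities that appear in fewer than min_triples triples.

    Assumption: the filter is repeated until stable, because removing one
    entity lowers the triple count of its neighbors. Entities in
    `protected` (e.g. the seed items) are never dropped.
    """
    protected = set(protected)
    current = list(triples)
    while True:
        degree = defaultdict(int)
        for h, _, t in current:
            degree[h] += 1
            degree[t] += 1

        def keep(entity):
            return entity in protected or degree[entity] >= min_triples

        kept = [(h, r, t) for h, r, t in current if keep(h) and keep(t)]
        if len(kept) == len(current):
            return kept
        current = kept


def count_entities(triples):
    """Number of distinct entities appearing as head or tail."""
    return len({e for h, _, t in triples for e in (h, t)})
```

A typical use would be `sub = k_hop_subgraph(all_triples, seed_items, k=2)` followed by `sub = filter_min_triples(sub, min_triples=10, protected=seed_items)`, then checking `count_entities(sub)` to see how far the entity count has dropped. Lowering k or raising `min_triples` shrinks the subgraph further.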
Hope this helps.
It helps a lot, thank you so much.