ttrouill/complex

ConceptNet

robclouth opened this issue · 2 comments

Hi Theo,
Sorry for contacting you through here, but the email address you've given doesn't work. Basically I'm a knowledge graph embedding hobbyist and I'm trying to experiment with your ComplEx code on a 'real' dataset that's based on ConceptNet instead of FB15k. It's much larger than FB15k, and has 1345609 entities, 1995411 triplets but only 36 relations. The number of entities is much larger than FB15k, but there are much fewer relation types.
ComplEx doesn't seem to be as effective on this dataset: the results improve, but very slowly. Could you please offer some insight into how I could improve them? My guess is that, because there are so many entities and relatively few triples, the algorithm doesn't have enough triples to learn from. Perhaps increasing the negative triplet factor would help.
But training is so slow on my computer (3 days for 50 iterations) that I'd like to hear your input before spending another 3 days.

Thank you, and keep up the great work!

Hi!

No worries. Indeed, that's a large number of parameters to learn from relatively few data points.
Increasing the number of negative triplets might help, but it also means more computation. I think the main parameter to play with in this case is the embedding size: 200 is probably too large for such a sparse dataset. Regularization (the lambda value) is probably important there too, for the same reason.
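To see why the embedding size matters so much here, a quick back-of-the-envelope sketch (plain NumPy; `complex_score` and `n_params` are illustrative names, not functions from the repo) of the ComplEx scoring function Re(<e_s, w_r, conj(e_o)>) and of how the parameter count scales:

```python
import numpy as np

# ComplEx scores a triple (s, r, o) as Re(<e_s, w_r, conj(e_o)>),
# with one complex vector per entity and per relation.
def complex_score(e_s, w_r, e_o):
    """e_s, w_r, e_o: complex vectors of the same dimension k."""
    return np.real(np.sum(e_s * w_r * np.conj(e_o)))

# Each complex vector of dimension k holds 2*k real parameters, so:
def n_params(n_entities, n_relations, k):
    return 2 * k * (n_entities + n_relations)

# With 1,345,609 entities, 36 relations and k=200, that is about
# 538M real parameters for ~2M training triples; k=50 cuts it to ~135M.
print(n_params(1345609, 36, 200))  # 538258000
print(n_params(1345609, 36, 50))   # 134564500
```

Since almost all parameters sit in the entity embeddings, shrinking k is by far the biggest lever on model capacity for a graph with this entity/triple ratio.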

The problem is that you don't know which values are good before you try them, which means grid searching over all combinations :/ I have no magic trick to offer, sorry; it just requires a lot of electricity, as usual.

That said, 3 days for 50 iterations seems quite long: you should check that your Python install is linked against a decent BLAS library. MKL is now free with Anaconda and recommended in the Theano documentation ( http://deeplearning.net/software/theano/install_ubuntu.html ); OpenBLAS is another good, free alternative. You can check your BLAS library with this script: https://github.com/Theano/Theano/blob/master/theano/misc/check_blas.py . You can also get decent speed-ups on a GPU if you have one around.
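To make the grid search concrete, a minimal driver could look like the following (pure Python; `train_and_eval` is a hypothetical stand-in for one full training run that returns a validation metric such as filtered MRR, and the candidate values are only examples):

```python
import itertools

# Candidate values for the two hyperparameters discussed above
# (illustrative; prune to fit your compute budget).
embedding_sizes = [25, 50, 100, 200]
lambdas = [0.0, 0.001, 0.01, 0.1]

def run_grid(train_and_eval):
    """Try every (k, lambda) pair and keep the best validation score."""
    best = None
    for k, lam in itertools.product(embedding_sizes, lambdas):
        score = train_and_eval(k, lam)
        if best is None or score > best[0]:
            best = (score, k, lam)
    return best
```

Each call to `train_and_eval` is one full training run, so with 3-day runs you'd want to thin this grid aggressively (or evaluate on a subsampled graph first).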

If you really want a shot in the dark, from what you've told me, I'd try an embedding size of around 50 and a lambda of 0.001.

Good luck!

Thanks for the reply, and sorry for not responding sooner. I've moved on for now, but I'll come back to this eventually!