tangjianpku/LINE

Embeddings of all nodes are not obtained

ayushidalmia opened this issue · 16 comments

Hi, I was trying to run this on a graph. However, the output files vec_1st.txt, vec_2nd.txt, and vec_all.txt do not contain embeddings for all the nodes in the original input graph.

Can you tell me where I might be going wrong, or what causes this behavior?

Same issue with me. Some nodes are missing.

If you read the code, you will find that the training instances are sampled from the graph, so the edges of low-degree vertices may never be sampled during training. This is why some nodes are missing from the final embedding result.

@zhujiangang The embeddings are all initialized up front in InitVector(), so even if some edges are never sampled, the nodes still have embeddings. I didn't have this issue when I used LINE. I wonder what caused your problem.

@gooeyforms Could you help me use this LINE model? I have run into a problem: I followed the train_youtube command and set the binary parameter to 0, but the first column of the result file contains floating-point numbers that differ from the original vertex IDs. I am very confused.

@gooeyforms Can you provide your email to me? I have some questions to ask you. I am a student at Beijing University of Posts and Telecommunications. I am looking forward to getting your help. Thank you very much.

Have you solved it? I ran into the same problem. Thanks!

I have run into the same problem. With the BlogCatalog data there should be 10312 embeddings, but LINE only returns 10263.

@pickou Could you run the code again with -binary 1 and count the lines in the binary embedding file, for example with wc -l *.embedding?
I have run LINE on the BlogCatalog dataset with -binary 1 and this issue didn't occur. But I'm having trouble with -binary 0.

In my case, the same issue occurs. When I run wc -l line.emb, I get 27420 and 27501 in two runs of LINE with the same parameters.
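Line counts alone don't show which nodes are missing. A minimal Python sketch for listing them is given here. It assumes the text embedding file starts with a "<count> <dim>" header line and that every later line begins with the node ID; the function names are hypothetical and not part of LINE. If that header assumption holds, wc -l reports one more line than the number of embeddings.

```python
def edge_list_nodes(lines):
    """Collect node IDs from the first two columns of an edge list."""
    nodes = set()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            nodes.update(parts[:2])
    return nodes


def embedding_nodes(lines):
    """Collect node IDs from a text embedding file.

    Assumption: the first non-empty line is a "<count> <dim>" header and
    every later line starts with the node ID.
    """
    rows = [line.split() for line in lines if line.strip()]
    return {row[0] for row in rows[1:]}


def missing_nodes(edge_lines, embedding_lines):
    """Return node IDs that appear in the edge list but have no embedding."""
    return sorted(edge_list_nodes(edge_lines) - embedding_nodes(embedding_lines))


if __name__ == "__main__":
    # Toy data; in practice, pass open("graph.txt") and open("line.emb").
    edges = ["1 2 1", "2 1 1", "3 5 1", "5 3 1"]
    embeddings = ["3 2", "1 0.1 0.2", "2 0.3 0.4", "3 0.5 0.6"]
    print(missing_nodes(edges, embeddings))
```

Running it on the real input and output files from two different runs would show whether the same nodes go missing each time.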

@mongooma Have you converted the graph to an undirected one? I made that change, like this.

1 2
3 5

becomes:

1 2
2 1
3 5
5 3

@pickou I did. I don't know what caused the issue. I suggest you set breakpoints or print lines to debug the code. Please let me know when you locate the problem.

@mongooma I have found what caused the issue.
When the edges are read from the file, every line must include a weight:

fscanf(fin, "%s %s %lf", name_v1, name_v2, &weight);

For example, this input:

1 2
3 5

should become:

1 2 1
2 1 1
3 5 1
5 3 1

It would be better to warn people about this, or to add a parameter, such as weighted, so that both weighted and unweighted graphs are handled.
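To illustrate why a missing weight column loses nodes, here is a small Python simulation of repeated calls with the "%s %s %lf" format. It is only a sketch of the scanning behavior, not LINE's ReadData(): the scanf family treats newlines as ordinary whitespace, so each call consumes the next three tokens regardless of line boundaries. The function name is hypothetical, and the sketch assumes numeric node IDs so that every third token parses as a number.

```python
def scan_edges(text):
    """Mimic repeated fscanf(fin, "%s %s %lf", ...) calls on `text`.

    Each simulated call takes the next three whitespace-separated tokens,
    ignoring line boundaries. Leftover tokens that cannot fill a complete
    call are dropped.
    """
    tokens = text.split()
    edges = []
    for i in range(0, len(tokens) - 2, 3):
        v1, v2, w = tokens[i], tokens[i + 1], tokens[i + 2]
        edges.append((v1, v2, float(w)))
    return edges


unweighted = "1 2\n2 1\n3 5\n5 3\n"
weighted = "1 2 1\n2 1 1\n3 5 1\n5 3 1\n"

print(scan_edges(unweighted))  # [('1', '2', 2.0), ('1', '3', 5.0)]
print(scan_edges(weighted))    # four edges, each with weight 1.0
```

With the unweighted input, the node ID at the start of the next line is consumed as the weight, so only two edges are read, one of them is spurious, and node 5 never appears as an endpoint.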

@pickou I'm glad you located the problem. However, I still don't understand why this would cause the results to vary between runs, as you described. Even though I think the original input format is explicit enough for all types of graphs, I do think a separate script to handle different input formats is a good idea.
At this point, you could submit a pull request to add a warning line to the Readme file.
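A separate preprocessing script along these lines could look like the following Python sketch. It is not part of LINE; the function name and the default_weight parameter are hypothetical. It assumes whitespace-separated columns, fills in a weight when the third column is absent, and emits each edge in both directions without duplicating pairs that are already listed both ways.

```python
def to_weighted_undirected(lines, default_weight=1.0):
    """Convert "u v" or "u v w" lines into "u v w" lines in both directions."""
    seen = set()
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        u, v = parts[0], parts[1]
        w = float(parts[2]) if len(parts) > 2 else default_weight
        for a, b in ((u, v), (v, u)):
            if (a, b) not in seen:
                seen.add((a, b))
                out.append(f"{a} {b} {w:g}")
    return out


if __name__ == "__main__":
    for row in to_weighted_undirected(["1 2", "3 5"]):
        print(row)
```

For the two-line example from this thread, it prints the four weighted lines 1 2 1, 2 1 1, 3 5 1, and 5 3 1.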

@mongooma I don't know either, but I traced the ReadData() function. When I use an unweighted graph as input, like

1 2
2 1
3 1
1 3

and print name_v1 and name_v2,
I sometimes get "1\100\066" instead of "1". I think the issue comes from here.

I suppose the nodes are missing because their degree is zero.