Embeddings of all nodes are not obtained

Question

ayushidalmia opened this issue 7 years ago · 16 comments

Hi, I was trying to run this on a graph. However, the output files vec_1st.txt, vec_2nd.txt, and vec_all.txt do not contain embeddings for all the nodes in the original input graph.

Can you tell me where I might be going wrong, or what causes this behavior?

Answer 1 · 2017-06-29T16:26:16.000Z

Same issue with me. Some nodes are missing.

Answer 2 · 2017-07-17T06:41:45.000Z

If you read the code, you will find that the training instances are sampled from the graph, so the edges of low-degree vertices may not be sampled in the training stage. This is the reason that some nodes are missing from the final embedding result.

Answer 3 · 2017-08-29T07:25:35.000Z

@zhujiangang The embeddings are initialized up front in InitVector(), so even if some edges are not sampled, the nodes still have embeddings. I didn't have this issue when using LINE. I wonder what caused your problem.

Answer 4 · 2017-09-15T01:44:28.000Z

@gooeyforms Could you help me use this LINE model? I have run into a problem: I followed the train_youtube command and set the binary parameter to 0, but the first column of the result file contains floating-point numbers that differ from the original vertex ids. I am very confused.

Answer 5 · 2017-09-15T06:23:38.000Z

I did have problems when I set binary to 0. So currently I'm setting binary to 1, then reading the binary file and writing it out as readable text (using Python, which I'm more familiar with). This is just a makeshift workaround. Please let me know if you figure this out.
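
For anyone who wants to try the same workaround, here is a minimal sketch of such a converter. It is not the script mentioned above. It assumes the binary output follows the word2vec-style layout (a text header line with the vertex count and dimension, then for each vertex its name, a space, the raw floats, and a newline) and that the values are little-endian 4-byte floats. The function names and the float_size parameter are made up for illustration, so check them against the Output() code in your copy before relying on this.

```python
import struct


def read_binary_embeddings(path, float_size=4):
    """Parse an embedding file in the ASSUMED word2vec-style binary layout."""
    fmt_char = "f" if float_size == 4 else "d"
    vectors = {}
    with open(path, "rb") as f:
        # Assumed header line: "<num_vertices> <dim>\n"
        num_vertices, dim = map(int, f.readline().split())
        for _ in range(num_vertices):
            name = bytearray()
            while True:
                ch = f.read(1)
                if not ch:
                    raise EOFError("file ended inside a vertex name")
                if ch == b" ":
                    break  # a single space ends the vertex name
                if ch != b"\n":  # ignore the newline that ends the previous record
                    name.extend(ch)
            raw = f.read(dim * float_size)
            if len(raw) != dim * float_size:
                raise EOFError("file ended inside a vector")
            # Little-endian floats are an assumption about the machine that wrote the file.
            vectors[name.decode("utf-8")] = list(struct.unpack(f"<{dim}{fmt_char}", raw))
    return vectors


def write_text_embeddings(vectors, path):
    """Write the vectors as text: a header line, then 'name v1 v2 ...' per vertex."""
    dim = len(next(iter(vectors.values()))) if vectors else 0
    with open(path, "w") as f:
        f.write(f"{len(vectors)} {dim}\n")
        for name, values in vectors.items():
            f.write(name + " " + " ".join(f"{v:.6f}" for v in values) + "\n")
```

Calling write_text_embeddings(read_binary_embeddings("vec_all.bin"), "vec_all.txt") would then produce a text file whose first column is the vertex name (the file names here are placeholders).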

Answer 6 · 2017-09-18T02:13:36.000Z

@gooeyforms Can you provide your email to me? I have some questions to ask you. I am a student at Beijing University of Posts and Telecommunications. I am looking forward to getting your help. Thank you very much.

Answer 7 · 2017-10-17T06:06:56.000Z

Have you solved it? I ran into the same problem. Thanks!

Answer 8 · 2018-03-13T06:29:00.000Z

I have run into the same problem. When I use the BlogCatalog data, there should be 10312 embeddings, but LINE only returns 10263.

Answer 9 · 2018-03-13T06:38:41.000Z

@pickou Could you run the code again with -binary 1 and count the lines in the binary embedding file, for example using wc -l *.embedding?
I have run LINE on the BlogCatalog dataset with -binary 1 and this issue didn't occur. But I'm having trouble with -binary 0.

Answer 10 · 2018-03-13T08:25:57.000Z

In my case, the same issue occurs. When I use wc -l line.emb, I get 27420 and 27501 in two runs of LINE with the same parameters.
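
One caveat about counting this way: wc -l counts newline bytes, and raw float data in a binary file can happen to contain such bytes, so the count on a binary embedding file may differ between runs even when the number of vertices does not. A possibly more reliable check, sketched below, reads the vertex count from the header and compares it with the distinct node ids in the edge list. It assumes the embedding file starts with a "<num_vertices> <dim>" line and that the edge list has the two node ids in the first two columns; both function names are hypothetical.

```python
def declared_vertex_count(embedding_path):
    """Return (num_vertices, dim) from the ASSUMED header line of an embedding file."""
    with open(embedding_path, "rb") as f:
        header = f.readline().split()
    return int(header[0]), int(header[1])


def nodes_in_edge_list(edge_path):
    """Collect the distinct node ids found in the first two columns of an edge list."""
    nodes = set()
    with open(edge_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:  # skip blank or malformed lines
                nodes.update(parts[:2])
    return nodes
```

If declared_vertex_count(...)[0] is smaller than len(nodes_in_edge_list(...)), some nodes really were dropped while the graph was being read.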

Answer 11 · 2018-03-14T04:43:52.000Z

@mongooma Have you converted the graph to an undirected one? I have made that change, like this.

1 2
3 5

then,

Answer 12 · 2018-03-14T05:23:42.000Z

@pickou I did. I don't know what caused the issue. I suggest you set breakpoints or print lines to debug the code. Please let me know when you locate the problem.

Answer 13 · 2018-03-15T08:56:56.000Z

@mongooma I have found what caused the issue.
When you read the edges from the file, every edge must have a weight.

fscanf(fin, "%s %s %lf", name_v1, name_v2, &weight);

see,

1 2
3 5

then,

It would be better to warn people about that, or you could add a parameter, such as weighted, and handle both weighted and unweighted graphs.
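
Until something like that exists, the input can be fixed before training. Below is a small sketch (not part of this repository) that appends a default weight to every two-column edge line so that each line matches the "%s %s %lf" pattern. The function name and the default weight of 1.0 are my own choices.

```python
def add_default_weights(lines, default_weight=1.0):
    """Yield 'src dst weight' lines, appending default_weight where it is missing."""
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # drop blank lines
        if len(parts) == 2:
            parts.append(str(default_weight))  # unweighted edge: add the default
        elif len(parts) == 3:
            float(parts[2])  # raises ValueError if the weight is not numeric
        else:
            raise ValueError(f"line {lineno}: expected 2 or 3 columns, got {len(parts)}")
        yield " ".join(parts)
```

To use it on a file, open the unweighted edge list, pass the file object to add_default_weights, and write each yielded line plus a newline to a new file that is then given to LINE.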

Answer 14 · 2018-03-15T09:19:36.000Z

@pickou I'm glad you located the problem. However, I still don't understand why this would cause different results across runs, as you described. And although I think the original input format is explicit enough for all types of graphs, I definitely think a separate script to deal with different input formats is a good idea.
At this point, you could open a pull request to add a warning line to the Readme file.

Answer 15 · 2018-03-15T09:33:14.000Z

@mongooma I don't know either, but I have traced through the ReadData() function. When I use an unweighted graph as input, like

and print name_v1 and name_v2,
I sometimes get "1\100\066" instead of "1". I think the issue comes from here.
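
To illustrate one way nodes could go missing here, the sketch below is a rough Python emulation (not the C code itself) of how a whitespace-delimited "%s %s %lf" scan would consume tokens. It assumes all node ids are numeric, so a node id can be parsed as a weight. It does not model what fscanf leaves in the buffers once the input runs out, so it does not explain the garbage string above.

```python
def scan_triples(text):
    """Group whitespace-separated tokens in threes, roughly as '%s %s %lf' would."""
    tokens = text.split()
    edges = []
    # Stop when fewer than three tokens remain; leftovers are simply ignored here.
    for i in range(0, len(tokens) - 2, 3):
        src, dst = tokens[i], tokens[i + 1]
        weight = float(tokens[i + 2])  # on unweighted input this is really a node id
        edges.append((src, dst, weight))
    return edges


def nodes_seen(edges):
    """Return the set of node ids that ended up in a source or target position."""
    return {node for src, dst, _ in edges for node in (src, dst)}
```

With the unweighted input "1 2\n3 5\n", this emulation yields a single edge from 1 to 2 with weight 3.0, so nodes 3 and 5 never appear as vertices. With the weighted input "1 2 1\n3 5 1\n", all four nodes are kept.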

Answer 16 · 2019-01-16T07:38:39.000Z

I suppose the nodes are missing because their degree is zero.