Dru-Mara/EvalNE

Testing TADW from the OpenNE library with simple-example.py produces the following errors

liujinxin1 opened this issue · 8 comments

Hello, I'm testing the TADW method from the OpenNE library with the simple-example.py file and get the following errors. I hope you can help me solve them when you have time.

D:\software\Python3.5\python3.exe "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 5345 --file C:/Users/liujinxin/Desktop/EvalNE-master/examples/simple-example.py
pydev debugger: process 1788 is connecting

Connected to pydev debugger (build 183.5912.18)
Running command...
python3 -m openne --method tadw --input C:/Users/liujinxin/Desktop/xiugai/OpenNE-master/dwata/cora/cora_edgelist.txt --graph-format edgelist --output vec_all.txt --q 0.25 --p 0.25 --input ./edgelist.tmp --output ./emb.tmp --representation-size 128
Reading...
Traceback (most recent call last):
File "D:\software\Python3.5\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "D:\software\Python3.5\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\__main__.py", line 182, in <module>
main(parse_args())
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\__main__.py", line 137, in main
g.read_node_label(args.label_file)
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\graph.py", line 89, in read_node_label
self.G.nodes[vec[0]]['label'] = vec[1:]
File "D:\software\Python3.5\lib\site-packages\networkx\classes\reportviews.py", line 178, in __getitem__
return self._nodes[n]
KeyError: '703'
I/O error(2): No such file or directory while evaluating method tadw
Traceback (most recent call last):
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1741, in <module>
main()
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/liujinxin/Desktop/EvalNE-master/examples/simple-example.py", line 43, in <module>
edge_embedding_methods=edge_emb, input_delim=' ', output_delim=' ')
File "D:\software\Python3.5\lib\site-packages\evalne\evaluation\evaluator.py", line 695, in evaluate_cmd
input_delim, output_delim, write_weights, write_dir, verbose)
File "D:\software\Python3.5\lib\site-packages\evalne\evaluation\evaluator.py", line 744, in _evaluate_ne_cmd
num_vectors = sum(1 for _ in open(tmpemb))
FileNotFoundError: [Errno 2] No such file or directory: './emb.tmp'
Backend TkAgg is interactive backend. Turning interactive mode on.
Failed to enable GUI event loop integration for 'tk'
Traceback (most recent call last):
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydev_ipython\matplotlibtools.py", line 31, in do_enable_gui
enable_gui(guiname)
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydev_ipython\inputhook.py", line 536, in enable_gui
return gui_hook(app)
File "D:\software\pycharm\PyCharm Community Edition 2018.3.5\helpers\pydev\pydev_ipython\inputhook.py", line 285, in enable_tk
app = TK.Tk()
File "D:\software\Python3.5\lib\tkinter\__init__.py", line 1877, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: Can't find a usable init.tcl in the following directories:
D:/software/Python3.5/lib/tcl8.6 D:/software/lib/tcl8.6 D:/lib/tcl8.6 D:/software/library D:/library D:/tcl8.6.4/library D:/tcl8.6.4/library

This probably means that Tcl wasn't installed properly.

Hello,

The error you are encountering does not seem to be related to EvalNE, but rather to the TADW implementation in OpenNE. Please remember that this library is basically an interface to the methods you want to evaluate. Therefore, you need to be able to successfully run these methods directly from the command line before pasting that command line call into EvalNE and running the full LP evaluation pipeline.

With all that said, you can try the following:

  1. make sure that the python3-tk package is installed (sudo apt-get install python3-tk).

  2. in order to prevent some other errors I'd recommend that you go to the __main__.py of OpenNE and comment out line 136 in the TADW section (g.read_node_label(args.label_file)). Then recompile the library.

  3. now, if you take a look at the TADW implementation in OpenNE you will see that it requires an additional file containing node features as input. This file should contain one line per graph node (node_id feature_0 feature_1 ...) with values separated by blanks.

You should then try running on the command line:
python -m openne --method tadw --graph-format edgelist --feature-file ./node_features.txt --input ./network.edgelist --output ./output.txt --representation-size 128

where ./network.edgelist contains an edgelist representation of a graph e.g.
0 1
1 2
1 3

and node_features.txt contains the features associated with every node in the input network e.g.
0 0 1 1 0
1 0 0 0 1
2 0 1 0 1
3 1 1 1 1
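Before calling OpenNE, it can help to confirm that the two files agree on node IDs, since a feature (or label) row for a node that is not in the graph is what produces a KeyError like the one above. The following is only a minimal sketch: the helper names `read_ids` and `check_features` are hypothetical and not part of EvalNE or OpenNE, and IDs are compared as strings because that is how they appear in the traceback.

```python
def read_ids(lines, first_n_columns):
    """Collect the node IDs found in the first `first_n_columns` fields of each line."""
    ids = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            ids.update(fields[:first_n_columns])
    return ids


def check_features(edgelist_lines, feature_lines):
    """Return (graph nodes without features, feature rows whose node is not in the graph)."""
    graph_nodes = read_ids(edgelist_lines, 2)    # both endpoints of every edge
    feature_nodes = read_ids(feature_lines, 1)   # only the leading node ID
    return sorted(graph_nodes - feature_nodes), sorted(feature_nodes - graph_nodes)


# The example files shown above, given here as lists of lines.
edges = ["0 1", "1 2", "1 3"]
features = ["0 0 1 1 0", "1 0 0 0 1", "2 0 1 0 1", "3 1 1 1 1"]
missing, unknown = check_features(edges, features)
print(missing, unknown)
```

With real files, the same functions can be fed `open(path)` objects instead of lists. Two empty lists mean every node has a feature row and no feature row refers to an unknown node.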

If you can run the previous command without issues, then you can include in the simple_example.py the following command in order to evaluate TADW on LP:
'python -m openne --method tadw --graph-format edgelist --feature-file ./node_features.txt'
The library will automatically fill in the input, output and representation_size parameters for you.
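To make the "fill in" step concrete, here is a rough sketch of how the partial command ends up as the full command seen in the log earlier in this thread. This is not EvalNE's actual code: `complete_command` is a hypothetical helper, and the temporary file names and the dimension 128 are simply copied from that log.

```python
def complete_command(partial_cmd, input_path="./edgelist.tmp",
                     output_path="./emb.tmp", dim=128):
    """Append the options that the log shows being added to the partial command."""
    extra = ["--input", input_path, "--output", output_path,
             "--representation-size", str(dim)]
    return partial_cmd + " " + " ".join(extra)


partial = ("python -m openne --method tadw --graph-format edgelist "
           "--feature-file ./node_features.txt")
full = complete_command(partial)
print(full)

# Passing --input or --output yourself would duplicate them, as happened in the
# first command of this thread, so it can help to check for them up front.
duplicated = [opt for opt in ("--input", "--output") if opt in partial.split()]
print(duplicated)
```

The practical consequence is that the command string given to the evaluator should contain only the method-specific options, such as --feature-file.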

Let me know if there are any more issues,
Alex

Thank you very much for answering my question in such detail! I was away from school yesterday, so I could not thank you for your reply in time. I will first debug the program according to your suggestions.

Hello,
According to your suggestions, I have revised the following aspects:

  1. Made sure that the python3-tk package is installed.
    C:\Users\liujinxin\Desktop\BANE-master\src>python3
    Python 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tkinter as tk

  2. Ran TADW directly in OpenNE without issues.
C:\Users\liujinxin\Desktop\EvalNE-master\examples>python3 -m openne --method tadw --label-file data/cora/cora_labels.txt --input data/cora/cora_edgelist.txt --graph-format edgelist --feature-file data/cora/cora.features --output vec_all.txt --clf-ratio 0.5
Reading...
Iteration 0
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
Iteration 6
Iteration 7
Iteration 8
Iteration 9
Iteration 10
Iteration 11
Iteration 12
Iteration 13
Iteration 14
Iteration 15
Iteration 16
Iteration 17
Iteration 18
Iteration 19
50.68146753311157
Saving embeddings...
Training classifier using 50.00% nodes...
{'samples': 0.8515509601181684, 'macro': 0.8398027976481857, 'micro': 0.8515509601181683, 'weighted': 0.8507733062659176}

I think both of the above are complete. When running the simple-example.py file, I changed the input to
G = pp.load_graph(r'C:\Users\liujinxin\Desktop\EvalNE-master\examples\data\cora\cora_edgelist.txt'), and the following error still occurs when I run it again.

Running command...
python3 -m openne --method tadw --graph-format edgelist --feature-file data/cora/cora.features --input ./edgelist.tmp --output ./emb.tmp --representation-size 128
Reading...
Traceback (most recent call last):
File "D:\software\Python3.5\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "D:\software\Python3.5\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\__main__.py", line 185, in <module>
main(parse_args())
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\__main__.py", line 141, in main
g.read_node_features(args.feature_file)
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\graph.py", line 97, in read_node_features
[float(x) for x in vec[1:]])
File "D:\software\Python3.5\lib\site-packages\networkx\classes\reportviews.py", line 178, in __getitem__
return self._nodes[n]
KeyError: '2485'
I/O error(2): No such file or directory while evaluating method tadw
Traceback (most recent call last):
File "C:/Users/liujinxin/Desktop/EvalNE-master/examples/simple-example.py", line 44, in <module>
edge_embedding_methods=edge_emb, input_delim=' ', output_delim=' ')
File "D:\software\Python3.5\lib\site-packages\evalne\evaluation\evaluator.py", line 695, in evaluate_cmd
input_delim, output_delim, write_weights, write_dir, verbose)
File "D:\software\Python3.5\lib\site-packages\evalne\evaluation\evaluator.py", line 744, in _evaluate_ne_cmd
num_vectors = sum(1 for _ in open(tmpemb))
FileNotFoundError: [Errno 2] No such file or directory: './emb.tmp'

Process finished with exit code 1

Hi,

Steps 1 and 2 are looking good :)

About the error you are getting now: it is due to the fact that the library by default relabels the nodes in the input graph to sequential integers 0...N (the reason for this is that some implementations of NE methods expect nodes to be sequential integers which can be used to index rows in a matrix).
So, the nodes in the input graph are relabeled to sequential integers (now there is no node with key 2485), but the elements in the feature file data/cora/cora.features still have the old node IDs.
In order to solve this, you can modify line 14 in the simple_example.py to:

G, _ = pp.prep_graph(G, relabel=False)
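To illustrate why the old IDs stop working after relabeling, here is a small self-contained sketch in plain Python. It is not how EvalNE implements the relabeling: `relabel_sequential` is a hypothetical helper, the order in which new IDs are assigned is an assumption, and the three node IDs are made up for the example.

```python
def relabel_sequential(edges):
    """Map node IDs to 0..N-1 in order of first appearance; return new edges and the mapping."""
    mapping = {}
    new_edges = []
    for u, v in edges:
        for node in (u, v):
            if node not in mapping:
                mapping[node] = len(mapping)
        new_edges.append((mapping[u], mapping[v]))
    return new_edges, mapping


# Made-up edges using original (string) node IDs.
edges = [("35", "1033"), ("35", "2485"), ("1033", "2485")]
new_edges, mapping = relabel_sequential(edges)
print(new_edges)
print(mapping)

# After relabeling, the old ID no longer names any node in the graph,
# so a feature row that still starts with "2485" cannot be matched.
new_nodes = {n for edge in new_edges for n in edge}
print("2485" in new_nodes)
```

Either the graph keeps its original IDs (relabel=False) or the feature file has to be rewritten with the new ones.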

The simple solution above should work just fine. However, another option would be to change the keys (or node IDs) in the data/cora/cora.features file to match those assigned to the nodes of the graph by EvalNE. You can modify the simple_example.py and do something like:

G, ids = pp.prep_graph(G)
# ids contains a list of (oldNodeID, newNodeID)
# make it a dictionary
d = dict(ids)
newfeat = list()
for elem in open('data/cora/cora.features'):
    line = elem.split(' ')
    newfeat = [d[line[0]], line[1:]]
    # write the newfeat to a file and give this file as --feature-file.
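A slightly more defensive variant of the same idea is sketched below. The function name `relabel_feature_rows` is hypothetical, and the sketch assumes the pairs are ordered (oldNodeID, newNodeID) as described in the comment above. Since it is not confirmed here whether the old IDs come back as strings or integers, the keys are normalised to strings so they can be compared with tokens read from a text file.

```python
def relabel_feature_rows(rows, id_pairs):
    """Rewrite the leading node ID of each feature row using (old_id, new_id) pairs.

    Keys are normalised to strings because tokens read from a text file are strings.
    Rows whose node is not in the mapping are skipped.
    """
    old_to_new = {str(old): new for old, new in id_pairs}
    out = []
    for row in rows:
        fields = row.split()
        if fields and fields[0] in old_to_new:
            out.append(" ".join([str(old_to_new[fields[0]])] + fields[1:]))
    return out


pairs = [(35, 0), (1033, 1)]            # hypothetical (oldNodeID, newNodeID) pairs
rows = ["35 0 1 1", "1033 1 0 0", "9999 1 1 1"]
print(relabel_feature_rows(rows, pairs))
```

Skipping unknown rows instead of raising avoids a KeyError for nodes that were removed during preprocessing, at the cost of hiding genuine ID mismatches, so it is worth comparing the number of rows before and after.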

I hope this helps and let me know if you need anything else :)

Alex

Hello,
These problems are caused by my poor programming ability. I really appreciate your patience in responding to me. I am really touched.
I have tried both of the approaches you suggested, and the results are below. The most important problem I have found is that the cora.features file is not in the .txt format you used in your first reply to me: its first column is not 0, 1, 2, ..., n as in this example:
0 0 1 1 0
1 0 0 0 1
2 0 1 0 1
3 1 1 1 1

Traceback (most recent call last):
File "D:\software\Python3.5\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "D:\software\Python3.5\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\__main__.py", line 185, in <module>
main(parse_args())
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\__main__.py", line 141, in main
g.read_node_features(args.feature_file)
File "D:\software\Python3.5\lib\site-packages\openne-0.0.0-py3.5.egg\openne\graph.py", line 97, in read_node_features
[float(x) for x in vec[1:]])
File "D:\software\Python3.5\lib\site-packages\networkx\classes\reportviews.py", line 178, in __getitem__
return self._nodes[n]
KeyError: '61'

D:\software\Python3.5\python3.exe C:/Users/liujinxin/Desktop/EvalNE-master/examples/simple-example.py
Traceback (most recent call last):
File "C:/Users/liujinxin/Desktop/EvalNE-master/examples/simple-example.py", line 21, in <module>
newfeat = [d[line[0]], line[1:]]
KeyError: '0'

Hi,

I just downloaded the original cora dataset from this page and successfully evaluated tadw for LP on it. Here are the steps you need to follow in order to get it running as well:

  1. Go to EvalNE/evalne/evaluation/score.py and change line 65 from:

    if ((train_pred == 0) | (train_pred == 1)).all():

    to:

    if ((train_pred == 0) | (train_pred == 1)).all() and ((test_pred == 0) | (test_pred == 1)).all():

  2. Run the following code to convert the cora features file to a format suitable for OpenNE. Remember to update the paths before running the code and make sure the output cora.features is correct:

from evalne.evaluation import evaluator
from evalne.preprocessing import preprocess as pp

# Load and preprocess the network
G = pp.load_graph('/home/user/Downloads/cora/cora.cites', delimiter='\t', directed=True)
print len(G.nodes)
# The preprocessed G is restricted to the main connected component,
# thus some nodes in the original G might have been removed
G, ids = pp.prep_graph(G)	
print len(G.nodes)

# Load and preprocess the feature file
d = dict(ids)
newfile = open('/home/user/Downloads/cora/cora.features', 'a+')
for line in open('/home/user/Downloads/cora/cora.content'):
    l = line.split()
    ll = map(int, l[:-1]) 	# remove last element which is string and map the rest to int
    if ll[0] in d.keys():	# we only keep the features of those nodes in the main con. comp.
        s = str(d[ll[0]]) + ' ' + str(ll[1:]).strip('[]') + '\n'
        newfile.write(s.replace(',', ''))
newfile.close()
  3. Make the following changes to simple_example.py and run it:
G = pp.load_graph('/home/user/Downloads/cora/cora.cites', delimiter='\t', directed=True)
# Set embedding methods from OpenNE
methods = ['tadw', 'deepwalk']
commands = ['python -m openne --method tadw --graph-format edgelist --feature-file /home/user/Downloads/cora/cora.features',
            'python -m openne --method deepWalk --graph-format edgelist --number-walks 40']
edge_emb = ['average', 'hadamard']

Notes: Step 1 works around a bug in the library which I'll fix in the upcoming version of EvalNE. The code in step 2 is not the nicest but gets the job done. Also, that code is Python 2; if you want to run the script with Python 3 you will need to wrap the map call in a list (list(map(int, l[:-1]))) so that it can be indexed, and change the print calls to print(...). Step 3 shows the minor changes needed for correctly evaluating the cora dataset, which is directed and contains tabs as delimiters between values.
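For Python 3, the feature conversion from step 2 could look roughly like the sketch below. It is not a drop-in replacement taken from EvalNE: `convert_content` and `write_features` are hypothetical helpers, the mapping is assumed to have integer old IDs as keys (as the `ll[0] in d.keys()` test in step 2 implies), and the sample rows are made up to mimic the "id, features, class label" layout of cora.content.

```python
def convert_content(content_lines, old_to_new):
    """Turn cora.content-style rows (id, features..., class label) into feature rows."""
    out = []
    for line in content_lines:
        fields = line.split()
        if len(fields) < 2:
            continue                                  # skip blank or malformed lines
        node, feats = int(fields[0]), fields[1:-1]    # drop the trailing class label
        if node in old_to_new:                        # keep only nodes still in the graph
            out.append(" ".join([str(old_to_new[node])] + feats))
    return out


def write_features(content_path, features_path, old_to_new):
    """Read a .content file and write the converted rows, overwriting any old output."""
    with open(content_path) as src:
        rows = convert_content(src, old_to_new)
    with open(features_path, "w") as dst:             # text mode, so plain str is written
        dst.write("\n".join(rows) + "\n")


# Made-up rows: node 11 is in the mapping, node 12 was removed by preprocessing.
sample = ["11\t0\t1\t0\tClassA", "12\t1\t0\t0\tClassB"]
print(convert_content(sample, {11: 0}))
```

In the real script, `old_to_new` would be `dict(ids)` from `pp.prep_graph(G)`. Opening the output with "w" rather than "a+" means that running the script twice does not append duplicate rows.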

Alex

Thank you very much for your patience in answering my questions. Your help really gives me great encouragement and confidence.

No problem! Have fun playing around with the code, and if you want, you can give the repo a star :)