ofirnachum/tree_rnn

Is the Dependency (child sum tree) option working?

Opened this issue · 19 comments

Hi, thanks for the great port.

I tried python sentiment.py after getting the data. When I set DEPENDENCY = True, I get an assertion error.

tree_rnn $ python sentiment.py
Traceback (most recent call last):
  File "sentiment.py", line 117, in <module>
    train()
  File "sentiment.py", line 53, in train
    vocab, data = data_utils.read_sentiment_dataset(DIR, FINE_GRAINED, DEPENDENCY)
  File "~/tree_rnn/data_utils.py", line 26, in read_sentiment_dataset
    os.path.join(sub_dir, 'dlabels.txt'))
  File "~/tree_rnn/data_utils.py", line 106, in read_trees
    trees.append(read_tree(cur_parents, cur_labels))
  File "~/tree_rnn/data_utils.py", line 128, in read_tree
    assert len(nodes[parent].children) < 2
AssertionError

The dependency RNN can have a variable number of children, so should this check be turned off?
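Something like this sketch is what I have in mind (MAX_DEGREE is just a hypothetical cap, and check_degree is an illustrative helper I haven't tested against the repo's read_tree):

```
# Illustrative only: relax the binary-tree check when reading dependency trees.
MAX_DEGREE = 32  # hypothetical cap; the real bound would come from the data

def check_degree(node, dependency):
    limit = MAX_DEGREE if dependency else 2
    assert len(node.children) < limit
```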

You may be right. I haven't tried the DEPENDENCY = True option on the sentiment demo, but from looking at the dparents.txt file, there are indeed nodes with > 2 children.

Does the demo seem to work properly when you turn this check off?

I tried DEPENDENCY = True with FINE_GRAINED = False, turned off the relevant asserts in data_utils and tree_rnn, and got the following error:

tree_rnn $ python sentiment.py
train 6920
dev 0
test 0
num emb 21701
num labels 3
epoch 0
Traceback (most recent call last):
  File "sentiment.py", line 117, in <module>
    train()
  File "sentiment.py", line 85, in train
    avg_loss = train_dataset(model, train_set)
  File "sentiment.py", line 100, in train_dataset
    loss, pred_y = model.train_step(tree, None)  # labels will be determined by model
  File "sentiment.py", line 35, in train_step
    with_labels=True)
  File "/Users/apewu/smartannotations/tree_rnn/tree_rnn.py", line 99, in gen_nn_inputs
    np.array(tree, dtype='int32'),
ValueError: setting an array element with a sequence.

Probably because the array is not rectangular, e.g.:

In [4]: np.array([[1,2], [1,2,3]], np.int32)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-2243c44de53d> in <module>()
----> 1 np.array([[1,2], [1,2,3]], np.int32)

ValueError: setting an array element with a sequence.
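
For what it's worth, one generic way around the ragged-array error is to pad each node's child list to the maximum degree with a sentinel index before building the array. This is just a sketch of the idea (pad_children is an illustrative helper, not the repo's code):

```
import numpy as np

def pad_children(child_lists, max_degree, pad_value=-1):
    """Pad variable-length child index lists into a rectangular int32 array."""
    padded = np.full((len(child_lists), max_degree), pad_value, dtype='int32')
    for i, children in enumerate(child_lists):
        padded[i, :len(children)] = children
    return padded

pad_children([[1, 2], [1, 2, 3]], max_degree=3)
# array([[ 1,  2, -1],
#        [ 1,  2,  3]], dtype=int32)
```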

OK - there are probably several things I need to alter to get it to work. I'll try to get around to it this week. Thank you for raising this issue.

Hey - I just pushed a commit that I think should fix it. I don't have access to a high-performance machine so I only did some basic checks.

Can you try pulling and let me know if it works for you now?

It seems to be working, but memory increases steadily after every epoch. The run below dies after 14 epochs.

ubuntu@ip-172-30-0-234:~/tree_rnn$ python sentiment.py
train 6920
dev 872
test 1821
max degree 21
num emb 21701
num labels 3
WARNING (theano.tensor.blas): We did not found a dynamic library into the library_dir of the library we use for blas. If you use ATLAS, make sure to compile it with dynamics library.
epoch 0
avg loss 9.69434 example 6919 of 6920
dev score 0.490825688073
epoch 1
avg loss 9.48838 example 6919 of 69200
dev score 0.5
epoch 2
avg loss 9.38896 example 6919 of 6920
dev score 0.510321100917
epoch 3
avg loss 9.3072t example 6919 of 6920
dev score 0.519495412844
epoch 4
avg loss 9.23145 example 6919 of 6920
dev score 0.534403669725
epoch 5
avg loss 9.15691 example 6919 of 6920
dev score 0.552752293578
epoch 6
avg loss 9.08182 example 6919 of 6920
dev score 0.584862385321
epoch 7
avg loss 9.00572 example 6919 of 6920
dev score 0.612385321101
epoch 8
avg loss 8.92822 example 6919 of 6920
dev score 0.619266055046
epoch 9
avg loss 8.84882 example 6919 of 6920
dev score 0.633027522936
epoch 10
avg loss 8.7674t example 6919 of 6920
dev score 0.645642201835
epoch 11
avg loss 8.68374 example 6919 of 6920
dev score 0.650229357798
epoch 12
avg loss 8.59839 example 6919 of 6920
dev score 0.645642201835
epoch 13
avg loss 8.5115t example 6919 of 6920
dev score 0.658256880734
epoch 14
avg loss 8.79 at eKilled 2749 of 6920

Yes, I noticed the same memory blow-up problem!

Interesting...

Does the memory blowup happen only on DEPENDENCY = True?

If it also happens on DEPENDENCY = False, did it also happen before my latest commit?

I checked the original code, and it didn't have the memory blow-up problem.
Btw, I actually wrote my own extension for DEPENDENCY = True and hit the same memory blow-up (if this helps the discussion). I'll keep tracking down the problem.

It seems to have to do with the computation graph: the more complicated it is, the more likely it is to blow up memory. But I haven't figured out why. Does anybody have any thoughts?

I don't know what the cause is, but I presume it is a memory leak in Theano. I was however able to confirm that the memory blowup occurs only with DEPENDENCY = True. With a big enough machine (m3.xlarge), I was able to run the demo to completion - output pasted below in case you're interested.

I have little insight into the underlying issue, so I won't put much effort into fixing it. Hopefully future versions of Theano will patch whatever the issue is.

train 6920
dev 872
test 1821
max degree 21
num emb 21701
num labels 3
modprobe: ERROR: could not insert 'nvidia': No such device
epoch 0
avg loss 9.69434 example 6919 of 69200
dev score 0.490825688073
epoch 1
avg loss 9.48838 example 6919 of 69200
dev score 0.5
epoch 2
avg loss 9.38896 example 6919 of 6920
dev score 0.510321100917
epoch 3
avg loss 9.3072t example 6919 of 6920
dev score 0.519495412844
epoch 4
avg loss 9.23145 example 6919 of 6920
dev score 0.534403669725
epoch 5
avg loss 9.15691 example 6919 of 6920
dev score 0.552752293578
epoch 6
avg loss 9.08182 example 6919 of 6920
dev score 0.584862385321
epoch 7
avg loss 9.00572 example 6919 of 6920
dev score 0.612385321101
epoch 8
avg loss 8.92822 example 6919 of 6920
dev score 0.619266055046
epoch 9
avg loss 8.84882 example 6919 of 6920
dev score 0.633027522936
epoch 10                                                                                                                                                                      
avg loss 8.76739 example 6919 of 6920
dev score 0.645642201835
epoch 11
avg loss 8.68374 example 6919 of 6920
dev score 0.650229357798
epoch 12
avg loss 8.59839 example 6919 of 6920
dev score 0.645642201835
epoch 13
avg loss 8.5115t example 6919 of 6920
dev score 0.658256880734
epoch 14
avg loss 8.42351 example 6919 of 6920
dev score 0.662844036697
epoch 15
avg loss 8.33502 example 6919 of 6920
dev score 0.667431192661
epoch 16                                                                                                                                                                      
avg loss 8.2463t example 6919 of 6920
dev score 0.667431192661
epoch 17
avg loss 8.15816 example 6919 of 6920
dev score 0.670871559633
epoch 18
avg loss 8.07026 example 6919 of 6920
dev score 0.673165137615
epoch 19
avg loss 7.98328 example 6919 of 6920
dev score 0.669724770642
epoch 20
avg loss 7.89705 example 6919 of 6920
dev score 0.670871559633
epoch 21
avg loss 7.81348 example 6919 of 6920
dev score 0.66628440367
epoch 22
avg loss 7.72978 example 6919 of 6920
dev score 0.667431192661
epoch 23
avg loss 7.64908 example 6919 of 6920
dev score 0.669724770642
epoch 24
avg loss 7.56896 example 6919 of 6920
dev score 0.668577981651
epoch 25
avg loss 7.49253 example 6919 of 6920
dev score 0.674311926606
epoch 26
avg loss 7.41763 example 6919 of 6920
dev score 0.669724770642
epoch 27
avg loss 7.34301 example 6919 of 6920
dev score 0.668577981651
epoch 28
avg loss 7.27389 example 6919 of 6920
dev score 0.668577981651
epoch 29
avg loss 7.20361 example 6919 of 6920
dev score 0.670871559633
finished training
test score 0.67215815486
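
In case it helps anyone track the leak down, a simple way to confirm the per-epoch memory growth is to log the process's peak RSS after each epoch with only the standard library. train_dataset(model, train_set) is the repo's existing call; log_peak_rss is just illustrative instrumentation:

```
import resource

def log_peak_rss(epoch):
    """Print peak resident set size so far (kilobytes on Linux, bytes on macOS)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print('epoch %d peak rss %d' % (epoch, peak))

# in the training loop, right after each epoch:
#     avg_loss = train_dataset(model, train_set)
#     log_peak_rss(epoch)
```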

Hello. Thanks for sharing such a nice implementation.
I'm a novice engineer and am now trying to run your code on the sentiment treebank dataset.
However, it produces an error when "labels_on_nonroot_nodes" is set to False.
(I intended to test the performance when the trees are trained only with root labels.)
I tried this with the TreeRNN model because the default ChildSumTreeLSTM model doesn't seem to override anything for the "labels_on_nonroot_nodes" flag.
I guess the error comes from the input data (I mean, the internal nodes' labels), but I can't clearly figure out what the real cause is.
Can you give me some guidelines (or just a piece of simple advice) on using that flag?
I look forward to hearing from you.

The sentiment dataset is not set up to allow for labels_on_nonroot_nodes=False.
If you want to set it differently, you'll have to pass the data to the model in a significantly different way (see https://github.com/ofirnachum/tree_rnn/blob/master/tree_rnn.py#L207).

Thank you for the reply.
I considered the part you referenced; however, I'm still confused.
When the "labels_on_nonroot_nodes" flag is set to False, the loss is computed only on the root node with its direct children and is used to update the parameters (e.g. weight matrix and bias). In other words, the upper node's derivative is not propagated to descendant nodes beyond its children, right?
(If so, I think tree_rnn is different from an ordinary RecNN, where the root node's error is propagated to the children recursively.)
Sorry for bothering you again, but I wonder whether my understanding of the parameter update mechanism is correct.

The root node's "value" is determined by some function of its children, which are in turn determined by some function of their children, and so on. Thus, the gradient will propagate to all parameters.
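
To make that concrete, here is a toy numerical check, entirely separate from the repo's Theano graph, showing that a loss defined only at the root still yields a gradient for a weight shared with the leaf computations:

```
import numpy as np

# Toy illustration (not the repo's model): one weight w shared by every node.
# Leaves: h = tanh(w * x); root: h_root = tanh(w * (h_left + h_right)).
# The loss lives only at the root, yet d(loss)/dw picks up the leaf
# computations through the chain rule.
def root_loss(w, x_left=0.5, x_right=-0.3, target=1.0):
    h_left, h_right = np.tanh(w * x_left), np.tanh(w * x_right)
    h_root = np.tanh(w * (h_left + h_right))
    return 0.5 * (h_root - target) ** 2

w, eps = 0.8, 1e-6
grad = (root_loss(w + eps) - root_loss(w - eps)) / (2 * eps)
print(grad)  # non-zero: a root-only loss still trains the shared weight
```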

You're right. Actually, I enabled "labels_on_nonroot_nodes=False" in a tricky way: I used all the settings/methods for "labels_on_nonroot_nodes=True" except for the "y_exists" list, where I set every element to 0 and only the last one, which corresponds to the root node, to 1.
Anyway, your thoughtful advice helped me understand the tree-structured RNN/LSTM. Thanks for your great help. :)
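
For anyone else trying the same trick, that workaround amounts to building a mask like the following (illustrative only; it assumes, as in the comment above, that the nodes are ordered so the root comes last):

```
import numpy as np

def root_only_mask(num_nodes):
    """y_exists-style mask that supervises only the last (root) node."""
    y_exists = np.zeros(num_nodes, dtype='int32')
    y_exists[-1] = 1
    return y_exists

root_only_mask(5)  # array([0, 0, 0, 0, 1], dtype=int32)
```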

Hi, thanks for the awesome code! However, I am a little confused about how the dependency tree is built. In line 174 of data_utils.py (i.e., the function _remap_tokens_and_labels), I find that not all of the words in the sentence correspond to leaf nodes, because node.val is set to None for nodes that have children.
As a result, the resulting dependency tree looks different from the one built by http://nlp.stanford.edu:8080/sentiment/rntnDemo.html for the same sentence.
I wish someone could give me an explanation. Thanks a lot!

@ysjakking
The Recursive Neural Tensor Network demo you linked uses a sentence's constituency tree, NOT its dependency tree, so the tree structures look different.
Check this Wikipedia article (https://en.wikipedia.org/wiki/Parse_tree) to see the difference between a constituency parse tree and a dependency parse tree, and consider setting the dependency flag to "False" if you want the constituency tree-based scheme.
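
As a rough illustration of the difference, here is the same sentence as two structures, with nested tuples standing in for trees (this is not the repo's data format):

```
# Constituency parse: internal nodes are phrase labels, words sit at the leaves.
constituency = ('S', ('NP', 'the', 'cat'), ('VP', 'sat'))

# Dependency parse: every node IS a word; its children are its dependents.
dependency = ('sat', [('cat', [('the', [])])])
```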

Hello. Thanks for sharing such nice code.
I ran into the same memory blow-up problem. With the help of your code, I implemented a simple reranker to score dependency trees, and this problem has troubled me for a long time. I've checked the code many times but can't find any error in it, so it seems the problem is somewhere in Theano; after hitting this issue I'm fairly sure the error is in Theano.