glorotxa/SME

Model fails to learn


When I trained TransE on WordNet, the weights were updated and the results improved only during the first few epochs; after that the cost was nan and the ranks never changed.

$ python WN_TransE.py
Using gpu device 0: GeForce GTX 480
DD{'ndim': 20, 'test_all': 10, 'loadmodel': False, 'nhid': 20, 'lremb': 0.01, 'savepath': 'WN_TransE', 'seed': 123, 'marge': 2.0, 'simfn': 'L1', 'neval': 1000, 'dataset': 'WN', 'nbatches': 100, 'lrparam': 1.0, 'loademb': False, 'datapath': '../data/', 'Nrel': 18, 'totepochs': 1000, 'Nent': 40961, 'Nsyn': 40943, 'op': 'TransE'}
/home/minhle/.local/lib/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
/home/minhle/SME/model.py:949: UserWarning: The parameter 'updates' of theano.function() expects an OrderedDict, got <class 'collections.OrderedDict'>. Using a standard dictionary here results in non-deterministic behavior. You should use an OrderedDict if you are using Python 2.7 (theano.compat.python2x.OrderedDict for older python), or use a list of (shared, update) pairs. Do not just convert your dictionary to this type before the call as the conversion will still be non-deterministic.
  updates=updates, on_unused_input='ignore')
BEGIN TRAINING
-- EPOCH 10 (4.1381 seconds per epoch):
COST >> nan +/- nan, % updates: 55.797%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
        ##### NEW BEST VALID >> test: 20779.578
    (the evaluation took 31.919 seconds)
-- EPOCH 20 (4.1195 seconds per epoch):
COST >> nan +/- nan, % updates: 0.0%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
    (the evaluation took 21.398 seconds)
-- EPOCH 30 (4.1711 seconds per epoch):
COST >> nan +/- nan, % updates: 0.0%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
    (the evaluation took 21.084 seconds)
-- EPOCH 40 (4.1701 seconds per epoch):
COST >> nan +/- nan, % updates: 0.0%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
    (the evaluation took 21.182 seconds)
-- EPOCH 50 (4.4218 seconds per epoch):
COST >> nan +/- nan, % updates: 0.0%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
    (the evaluation took 21.132 seconds)
-- EPOCH 60 (4.4139 seconds per epoch):
COST >> nan +/- nan, % updates: 0.0%
    MEAN RANK >> valid: 21031.55, train: 20901.2955
    (the evaluation took 21.03 seconds)

I figured out that at some point the derivative contains NaN.
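
For reference, a trace like the one below can be produced with Theano's standard NaN-detection recipe: compile the function with a MonitorMode whose post-hook scans each node's outputs for NaN. A minimal, self-contained sketch (not necessarily the exact code used here; also note the hook signature shown is the one older Theano releases use, while newer ones pass an extra leading fgraph argument):

import numpy
import theano
import theano.tensor as T

def detect_nan(i, node, fn):
    # Post-execution hook: scan every output of the node that just ran
    # and dump the graph node plus its inputs/outputs on the first NaN.
    for output in fn.outputs:
        if (not isinstance(output[0], numpy.random.RandomState) and
                numpy.isnan(output[0]).any()):
            print('*** NaN detected ***')
            theano.printing.debugprint(node)
            print('Inputs : %s' % [input[0] for input in fn.inputs])
            print('Outputs: %s' % [output[0] for output in fn.outputs])
            break

# Toy example: log(x) * x is NaN for negative x.
x = T.dscalar('x')
f = theano.function([x], [T.log(x) * x],
                    mode=theano.compile.MonitorMode(post_func=detect_nan))
f(-1.0)  # triggers the report

Attaching a hook like this to the training function prints the first offending node: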

*** NaN detected ***
Elemwise{Composite{(((i0 * i1) / i2) + ((i3 * i4) / i5))}} [@A] ''
 |Elemwise{Identity} [@B] ''
 | |InplaceDimShuffle{x,0} [@C] ''
 |   |Elemwise{gt,no_inplace} [@D] ''
 |     |Elemwise{Sub}[(0, 1)] [@E] ''
 |     | |Elemwise{Add}[(0, 1)] [@F] ''
 |     | | |TensorConstant{(1,) of 2.0} [@G]
 |     | | |Sum{axis=[1], acc_dtype=float64} [@H] ''
 |     | |   |Elemwise{abs_,no_inplace} [@I] ''
 |     | |     |Elemwise{Sub}[(0, 0)] [@J] ''
 |     | |       |Elemwise{Add}[(0, 0)] [@K] ''
 |     | |       | |InplaceDimShuffle{1,0} [@L] ''
 |     | |       | | |SparseDot [@M] ''
 |     | |       | |   |HostFromGpu [@N] ''
 |     | |       | |   | |Eemb [@O]
 |     | |       | |   |SparseVariable{csr,float32} [@P]
 |     | |       | |InplaceDimShuffle{1,0} [@Q] ''
 |     | |       |   |SparseDot [@R] ''
 |     | |       |     |HostFromGpu [@S] ''
 |     | |       |     | |Erelvec [@T]
 |     | |       |     |SparseVariable{csr,float32} [@U]
 |     | |       |InplaceDimShuffle{1,0} [@V] ''
 |     | |         |SparseDot [@W] ''
 |     | |           |HostFromGpu [@N] ''
 |     | |           |SparseVariable{csr,float32} [@X]
 |     | |Sum{axis=[1], acc_dtype=float64} [@Y] ''
 |     |   |Elemwise{abs_,no_inplace} [@Z] ''
 |     |     |Elemwise{Composite{((i0 + i1) - i2)}}[(0, 0)] [@BA] ''
 |     |       |InplaceDimShuffle{1,0} [@BB] ''
 |     |       | |SparseDot [@BC] ''
 |     |       |   |HostFromGpu [@N] ''
 |     |       |   |SparseVariable{csr,float32} [@BD]
 |     |       |InplaceDimShuffle{1,0} [@Q] ''
 |     |       |InplaceDimShuffle{1,0} [@V] ''
 |     |TensorConstant{(1,) of 0} [@BE]
 |InplaceDimShuffle{1,0} [@BF] ''
 | |Elemwise{Composite{((i0 + i1) - i2)}}[(0, 0)] [@BA] ''
 |InplaceDimShuffle{1,0} [@BG] ''
 | |Elemwise{abs_,no_inplace} [@Z] ''
 |Elemwise{Composite{((-i0) + (-i1))}}[(0, 1)] [@BH] ''
 | |Elemwise{Identity} [@B] ''
 | |Elemwise{Identity} [@BI] ''
 |   |InplaceDimShuffle{x,0} [@BJ] ''
 |     |Elemwise{gt,no_inplace} [@BK] ''
 |       |Elemwise{Sub}[(0, 0)] [@BL] ''
 |       | |Elemwise{Add}[(0, 1)] [@F] ''
 |       | |Sum{axis=[1], acc_dtype=float64} [@BM] ''
 |       |   |Elemwise{Abs}[(0, 0)] [@BN] ''
 |       |     |Elemwise{Sub}[(0, 1)] [@BO] ''
 |       |       |Elemwise{Add}[(0, 0)] [@K] ''
 |       |       |InplaceDimShuffle{1,0} [@BP] ''
 |       |         |SparseDot [@BQ] ''
 |       |           |HostFromGpu [@N] ''
 |       |           |SparseVariable{csr,float32} [@BR]
 |       |TensorConstant{(1,) of 0} [@BE]
 |InplaceDimShuffle{1,0} [@BS] ''
 | |Elemwise{Sub}[(0, 0)] [@J] ''
 |InplaceDimShuffle{1,0} [@BT] ''
   |Elemwise{abs_,no_inplace} [@I] ''
Inputs : [array([[ 1.,  1.,  0., ...,  0.,  0.,  1.]], dtype=float32), array([[-0.31204802,  0.16644922, -0.11884879, ..., -0.26266283,
        -0.45505029, -0.2066461 ],
       [-0.13207039, -0.05348344,  0.02279407, ..., -0.22308037,
         0.42979217, -0.55216193],
       [-0.1150318 , -0.38344097, -0.32395732, ...,  0.53119653,
        -0.31519809, -0.07065554],
       ...,
       [-0.16853562, -0.01295686, -0.1453148 , ..., -0.07800217,
         0.24274698, -0.29501894],
       [ 0.51208645,  0.50760674, -0.10291943, ...,  0.30489475,
        -0.20986378,  0.07143602],
       [ 0.29375333, -0.0947492 ,  0.23110297, ..., -0.15360433,
        -0.40362051, -0.60213274]], dtype=float32), array([[ 0.31204802,  0.16644922,  0.11884879, ...,  0.26266283,
         0.45505029,  0.2066461 ],
       [ 0.13207039,  0.05348344,  0.02279407, ...,  0.22308037,
         0.42979217,  0.55216193],
       [ 0.1150318 ,  0.38344097,  0.32395732, ...,  0.53119653,
         0.31519809,  0.07065554],
       ...,
       [ 0.16853562,  0.01295686,  0.1453148 , ...,  0.07800217,
         0.24274698,  0.29501894],
       [ 0.51208645,  0.50760674,  0.10291943, ...,  0.30489475,
         0.20986378,  0.07143602],
       [ 0.29375333,  0.0947492 ,  0.23110297, ...,  0.15360433,
         0.40362051,  0.60213274]], dtype=float32), array([[-2., -1., -1., ..., -1., -0., -1.]], dtype=float32), array([[-0.02479793, -0.09800016, -0.47978631, ...,  0.15308018,
        -0.13509962,  0.0328592 ],
       [ 0.06248942,  0.07622787,  0.07242005, ..., -0.08090827,
         0.08486907, -0.21517622],
       [-0.4214046 ,  0.06561833,  0.11469959, ...,  0.06334117,
         0.16623311, -0.08900161],
       ...,
       [-0.00456831,  0.02458404,  0.01200159, ...,  0.08199899,
        -0.03191963, -0.07951355],
       [-0.04218072, -0.15660736,  0.01521698, ..., -0.18092547,
        -0.00480059, -0.27486905],
       [ 0.21971977, -0.40684754,  0.19962206, ...,  0.1775555 ,
        -0.46583533, -0.53337425]], dtype=float32), array([[ 0.02479793,  0.09800016,  0.47978631, ...,  0.15308018,
         0.13509962,  0.0328592 ],
       [ 0.06248942,  0.07622787,  0.07242005, ...,  0.08090827,
         0.08486907,  0.21517622],
       [ 0.4214046 ,  0.06561833,  0.11469959, ...,  0.06334117,
         0.16623311,  0.08900161],
       ...,
       [ 0.00456831,  0.02458404,  0.01200159, ...,  0.08199899,
         0.03191963,  0.07951355],
       [ 0.04218072,  0.15660736,  0.01521698, ...,  0.18092547,
         0.00480059,  0.27486905],
       [ 0.21971977,  0.40684754,  0.19962206, ...,  0.1775555 ,
         0.46583533,  0.53337425]], dtype=float32)]
Outputs: [array([[ 1.,  2.,  1., ..., -1.,  0., -2.],
       [-3., -2., -1., ...,  1.,  0.,  0.],
       [ 1., -2., -1., ..., -1., -0.,  0.],
       ...,
       [ 1., -2., -1., ..., -1.,  0.,  0.],
       [ 3.,  2., -1., ...,  1.,  0.,  2.],
       [-1.,  0., -1., ..., -1.,  0.,  0.]], dtype=float32)]

I knew it was the derivative because I printed the embedding values before and after the update and normalization steps.
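
If I read the graph correctly, the NaN originates in the gradient of the L1 distance itself: the Composite node at the top of the trace divides the upstream gradient by the per-coordinate absolute differences (the i2 and i5 inputs, which in the dumped Inputs are exactly the elementwise absolute values of the preceding matrices), i.e. d|x|/dx is evaluated as x / |x|, which is 0/0 = NaN whenever a coordinate of lhs + rel - rhs is exactly zero. A minimal numpy illustration of that failure mode (my own sketch, not code from the repo):

import numpy as np

# The gradient of |x| evaluated as x / |x| is 0/0 = NaN at exactly zero.
diff = np.array([0.5, -0.25, 0.0], dtype=np.float32)
print(diff / np.abs(diff))  # -> [ 1. -1. nan]

Exact zeros in the coordinate differences seem more likely to show up in float32 than in float64, which would be consistent with the workaround below.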

After I disabled the Theano flag "floatX=float32" (so floatX fell back to its default, float64), it ran OK.
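
For anyone hitting the same problem, the command-line equivalent (assuming the flag is passed through the THEANO_FLAGS environment variable rather than .theanorc) is:

$ THEANO_FLAGS=floatX=float64 python WN_TransE.py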