
text preprocessing, word2vec, sentence embedding in text similarity, text classification, Chinese word segmentation, Hidden Markov Model, CRFs, named entity recognition, knowledge graph, dialog system



Natural Language Processing related projects, which include concepts and scripts about:

DL best practices in NLP

1. Word embeddings

  • Use pre-trained embeddings if available.
  • Embedding dimension is task-dependent
    • Smaller dimensionality (e.g., 100) works well for syntactic tasks (e.g., NER, POS tagging)
    • Larger dimensionality (e.g., 300) is useful for semantic tasks (e.g., sentiment analysis)
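A minimal sketch of "use pre-trained embeddings if available", in plain Python. The function name `build_embedding_matrix`, the toy 3-dimensional vectors and the uniform initialisation range for out-of-vocabulary words are illustrative assumptions, not part of this repo. The matrix dimension has to match the pre-trained vectors, which is one practical reason the embedding size is often fixed by whatever pre-trained set is available.

```python
import random

def build_embedding_matrix(vocab, pretrained, dim, seed=0, scale=0.1):
    """Return (matrix, hits): one row per vocab word, copied from `pretrained`
    when a vector of the right size exists, otherwise randomly initialised."""
    rng = random.Random(seed)
    matrix, hits = [], 0
    for word in vocab:
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            matrix.append(list(vec))  # reuse the pre-trained vector
            hits += 1
        else:
            # Out-of-vocabulary word: small uniform values in [-scale, scale]
            # (a common heuristic, not the only choice)
            matrix.append([rng.uniform(-scale, scale) for _ in range(dim)])
    return matrix, hits

# Toy "pre-trained" vectors standing in for e.g. word2vec output (hypothetical values)
pretrained = {"cat": [0.1, 0.2, 0.3], "dog": [0.2, 0.1, 0.0]}
vocab = ["the", "cat", "dog", "axolotl"]
matrix, hits = build_embedding_matrix(vocab, pretrained, dim=3)
print(hits, len(matrix), matrix[1])  # 2 4 [0.1, 0.2, 0.3]
```

The resulting rows would then initialise the model's embedding layer, which can be frozen or fine-tuned depending on the amount of task data.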

2. Depth

  • 3- or 4-layer Bi-LSTMs (e.g., POS tagging, semantic role labelling).
  • 8 encoder and 8 decoder layers (e.g., Google's NMT).
  • In most cases, a shallower model (e.g., 2 layers) is good enough.

3. Layer connections (for avoiding vanishing gradients)

  • Highway layer
    • h = t * a(WX + b) + (1 - t) * X, where t = sigmoid(W_T X + b_T) is called the transform gate.
    • Application: language modelling and speech recognition.
    • Implementation: tf.contrib.rnn.HighwayWrapper
  • Residual connection
    • h = a(WX+b) + X
    • Implementation: tf.contrib.rnn.ResidualWrapper
  • Dense connection
    • h_l = a(W[X_1, ..., X_l] + b), where [X_1, ..., X_l] is the concatenation of the outputs of all previous layers
    • Application: multi-task learning
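The three connection types above can be checked on plain vectors. This is a minimal pure-Python sketch, assuming tanh as the activation a and a single input vector; the helper names (`matvec`, `affine`, `highway`, `residual`, `dense`) are illustrative. The TensorFlow wrappers listed above apply the same ideas around RNN cells rather than plain feed-forward layers.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def affine(W, b, x):
    """Compute Wx + b."""
    return [v + bi for v, bi in zip(matvec(W, x), b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway(x, W, b, W_T, b_T, act=math.tanh):
    # h = t * a(Wx + b) + (1 - t) * x, with transform gate t = sigmoid(W_T x + b_T)
    t = [sigmoid(z) for z in affine(W_T, b_T, x)]
    a = [act(z) for z in affine(W, b, x)]
    return [ti * ai + (1 - ti) * xi for ti, ai, xi in zip(t, a, x)]

def residual(x, W, b, act=math.tanh):
    # h = a(Wx + b) + x; input and output must have the same size
    return [act(z) + xi for z, xi in zip(affine(W, b, x), x)]

def dense(previous_outputs, W, b, act=math.tanh):
    # h_l = a(W [x_1; ...; x_l] + b): concatenate all earlier outputs, then one affine map
    concat = [v for out in previous_outputs for v in out]
    return [act(z) for z in affine(W, b, concat)]

x = [1.0, -2.0]
zero_W, zero_b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
# With W = 0 and b = 0, a(Wx + b) = tanh(0) = 0, so the residual output is x itself
print(residual(x, zero_W, zero_b))  # [1.0, -2.0]
# A very negative gate bias drives t towards 0, so the highway layer (almost) copies x
print(highway(x, zero_W, zero_b, zero_W, [-50.0, -50.0]))
```

In both the highway and residual cases the identity path lets gradients flow through x unchanged, which is why these connections help against vanishing gradients in deep stacks.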

4. Dropout

5. LSTM tricks

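  • Treat the initial state as a trainable variable [2]
```python
import tensorflow as tf  # TensorFlow 1.x API
state_size, batch_size = 128, 32  # illustrative sizes
# Note: an LSTMCell state is an LSTMStateTuple (c, h), so passing a single tensor as its initial
# state fails with a TypeError ('Tensor' object is not iterable); a GRUCell state is one tensor:
# https://stackoverflow.com/questions/42947351/tensorflow-dynamic-rnn-typeerror-tensor-object-is-not-iterable
cell = tf.nn.rnn_cell.GRUCell(state_size)
# One learned initial state, shared by every sequence in the batch
init_state = tf.get_variable('init_state', [1, state_size], initializer=tf.constant_initializer(0.0))
# https://stackoverflow.com/questions/44486523/should-the-variables-of-the-initial-state-of-a-dynamic-rnn-among-the-inputs-of
init_state = tf.tile(init_state, [batch_size, 1])  # shape: [batch_size, state_size]
```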
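  • Gradient clipping
```python
import tensorflow as tf  # TensorFlow 1.x API
# Assumes `cost` is the model's scalar loss tensor; clip_norm = 5.0 is an illustrative value
clip_norm = 5.0
global_step = tf.train.get_or_create_global_step()
variables = tf.trainable_variables()
gradients = tf.gradients(ys=cost, xs=variables)
# Rescale all gradients jointly so that their global L2 norm is at most clip_norm
clipped_gradients, _ = tf.clip_by_global_norm(gradients, clip_norm=clip_norm)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
optimize = optimizer.apply_gradients(grads_and_vars=list(zip(clipped_gradients, variables)), global_step=global_step)
```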
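`tf.clip_by_global_norm` rescales all gradients by one shared factor instead of clipping each gradient separately, so the direction of the update is preserved. The arithmetic can be sketched in plain Python as below; this is an illustrative re-implementation of the documented rule (multiply every gradient by clip_norm / max(global_norm, clip_norm)), not TensorFlow's code, and it assumes each gradient is given as a flat list of floats.

```python
import math

def clip_by_global_norm(gradients, clip_norm):
    """Return (clipped_gradients, global_norm), mirroring the output order of
    tf.clip_by_global_norm. `gradients` is a list of flat lists of floats."""
    # Global norm: L2 norm over all entries of all gradients together
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    # Factor is 1.0 when global_norm <= clip_norm, so small gradients are untouched
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in gradients], global_norm

# Two "variables" whose gradients have a joint L2 norm of sqrt(3^2 + 4^2) = 5
clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
print(norm)  # 5.0
```

Here both gradients are scaled by 1/5, giving roughly [[0.6], [0.8]], whose global norm is 1.0 = clip_norm.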

6. Attention

  • To do...

[1] http://ruder.io/deep-learning-nlp-best-practices/
[2] https://r2rt.com/recurrent-neural-networks-in-tensorflow-iii-variable-length-sequences.html

Awesome packages

