
NLP-Projects

Natural Language Processing related projects, which include concepts and scripts about: text preprocessing, word2vec, sentence embeddings for text similarity, text classification, Chinese word segmentation, Hidden Markov Models, CRFs, named entity recognition, knowledge graphs, and dialog systems.

DL best practices in NLP [1]

1. Word embeddings

  • Use pre-trained embeddings if available (see the sketch after this list).
  • Embedding dimension is task-dependent
    • Smaller dimensionality (e.g., 100) works well for syntactic tasks (e.g., NER, POS tagging)
    • Larger dimensionality (e.g., 300) is useful for semantic tasks (e.g., sentiment analysis)
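A minimal sketch of loading pre-trained vectors into a TensorFlow 1.x embedding layer (the file name vectors.npy and the placeholder shapes are hypothetical, not from the original notes):
import numpy as np
import tensorflow as tf
pretrained_vectors = np.load('vectors.npy')  # hypothetical file holding a [vocab_size, embed_dim] word2vec/GloVe matrix
vocab_size, embed_dim = pretrained_vectors.shape
embedding_matrix = tf.get_variable('embedding_matrix', [vocab_size, embed_dim],
                                   initializer=tf.constant_initializer(pretrained_vectors),
                                   trainable=True)  # trainable=False freezes the pre-trained vectors
word_ids = tf.placeholder(tf.int32, [None, None])  # [batch_size, max_seq_len]
word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)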

2. Depth

  • 3 or 4 layer Bi-LSTMs for sequence tagging tasks (e.g., POS tagging, semantic role labelling).
  • 8 encoder and 8 decoder layers (e.g., Google's NMT).
  • In most cases, a shallower model (i.e., 2 layers) is good enough (see the sketch after this list).
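As a rough illustration, a 2-layer bidirectional LSTM encoder in TensorFlow 1.x (word_embeddings and seq_lengths are assumed input tensors, not defined in the original notes):
import tensorflow as tf
num_layers, state_size = 2, 128
fw_cells = [tf.nn.rnn_cell.LSTMCell(state_size) for _ in range(num_layers)]
bw_cells = [tf.nn.rnn_cell.LSTMCell(state_size) for _ in range(num_layers)]
# word_embeddings: [batch_size, max_seq_len, embed_dim]; seq_lengths: [batch_size]
outputs, _, _ = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
    fw_cells, bw_cells, inputs=word_embeddings, sequence_length=seq_lengths, dtype=tf.float32)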

3. Layer connections (for avoiding vanishing gradients)

  • Highway layer
    • h = t * a(WX + b) + (1 - t) * X, where t = sigmoid(W_T X + b_T) is called the transform gate (see the sketch after this list).
    • Application: language modelling and speech recognition.
    • Implementation: tf.contrib.rnn.HighwayWrapper
  • Residual connection
    • h = a(WX+b) + X
    • Implementation: tf.contrib.rnn.ResidualWrapper
  • Dense connection
    • h_l = a(W[X_1, ..., X_l] + b)
    • Application: multi-task learning
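A minimal TensorFlow 1.x sketch of the highway layer formula above (the helper name highway_layer and the ReLU activation are illustrative choices, not from the original notes):
import tensorflow as tf
def highway_layer(x, activation=tf.nn.relu):
    size = x.get_shape().as_list()[-1]
    # transform gate t = sigmoid(W_T x + b_T); a negative initial bias starts the layer close to carrying x through
    t = tf.layers.dense(x, size, activation=tf.nn.sigmoid, bias_initializer=tf.constant_initializer(-1.0))
    h = tf.layers.dense(x, size, activation=activation)  # a(WX + b)
    return t * h + (1.0 - t) * x  # h = t * a(WX+b) + (1-t) * X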

4. Dropout
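  • A common TensorFlow 1.x pattern (a sketch, not from the original notes) is wrapping the RNN cell with DropoutWrapper:
import tensorflow as tf
state_size = 128  # illustrative size
keep_prob = tf.placeholder_with_default(1.0, shape=[])  # feed e.g. 0.5 during training, 1.0 at inference
cell = tf.nn.rnn_cell.LSTMCell(state_size)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=keep_prob, output_keep_prob=keep_prob)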

5. LSTM tricks

  • Treat initial state as variable [2]
import tensorflow as tf
# Note: with LSTMCell the state is a tuple, so this approach raises a TypeError, see
# https://stackoverflow.com/questions/42947351/tensorflow-dynamic-rnn-typeerror-tensor-object-is-not-iterable
cell = tf.nn.rnn_cell.GRUCell(state_size)
# Learn one initial-state vector and tile it across the batch before passing it to tf.nn.dynamic_rnn, see
# https://stackoverflow.com/questions/44486523/should-the-variables-of-the-initial-state-of-a-dynamic-rnn-among-the-inputs-of
init_state = tf.get_variable('init_state', [1, state_size], initializer=tf.constant_initializer(0.0))
init_state = tf.tile(init_state, [batch_size, 1])
  • Gradient clipping
variables = tf.trainable_variables()
gradients = tf.gradients(ys=cost, xs=variables)  # cost: the scalar loss defined elsewhere in the model
# clip by global norm (self.clip_norm, e.g. 5.0) to avoid exploding gradients
clipped_gradients, _ = tf.clip_by_global_norm(gradients, clip_norm=self.clip_norm)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
optimize = optimizer.apply_gradients(grads_and_vars=list(zip(clipped_gradients, variables)),
                                     global_step=self.global_step)

6. Attention

  • To do...

References:
[1] http://ruder.io/deep-learning-nlp-best-practices/
[2] https://r2rt.com/recurrent-neural-networks-in-tensorflow-iii-variable-length-sequences.html

Awesome packages

Chinese

English