Natural Language Processing related projects, which include concepts and scripts about:

- Word2vec: `gensim`, `fastText` and `tensorflow` implementations. See Chinese notes, 中文解读.
- Text similarity: `gensim doc2vec` and `gensim word2vec averaging` implementations.
- Text classification: `tensorflow LSTM` (see Chinese notes 1, 中文解读 1 and Chinese notes 2, 中文解读 2) and `fastText` implementations.
- Chinese word segmentation: `HMM Viterbi` implementations. See Chinese notes, 中文解读.
- Sequence labeling - NER: brands NER via bi-directional LSTM + CRF, `tensorflow` implementation. See Chinese notes, 中文解读.
- ..
- Use pre-trained embeddings if available (a loading sketch follows this item).
- The embedding dimension is task-dependent:
  - A smaller dimensionality (e.g., 100) works well for syntactic tasks (e.g., NER, POS tagging).
  - A larger dimensionality (e.g., 300) is useful for semantic tasks (e.g., sentiment analysis).
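A minimal sketch of wiring pre-trained vectors into a TensorFlow embedding layer; the file name `wv.bin`, the toy `vocab` and the 300-d size are assumptions, not part of this repo:

```python
import numpy as np
import tensorflow as tf
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format('wv.bin', binary=True)   # pre-trained vectors (assumed file)
vocab = ['the', 'cat', 'sat']                                   # task vocabulary (toy example)
embedding_matrix = np.zeros((len(vocab), 300), dtype=np.float32)
for i, word in enumerate(vocab):
    if word in wv:                                              # copy the pre-trained vector if present
        embedding_matrix[i] = wv[word]

embeddings = tf.get_variable('embeddings', initializer=embedding_matrix)  # fine-tuned during training
word_ids = tf.placeholder(tf.int32, [None, None])               # [batch, time]
embedded = tf.nn.embedding_lookup(embeddings, word_ids)         # [batch, time, 300]
```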
- 3- or 4-layer Bi-LSTMs (e.g., POS tagging, semantic role labelling).
- 8 encoder and 8 decoder layers (e.g., Google's NMT).
- In most cases, a shallower model (e.g., 2 layers) is good enough (a stacked 2-layer sketch follows this item).
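A minimal sketch of a stacked 2-layer LSTM encoder in TF 1.x; `embedded` and `seq_len` are assumed inputs (e.g., the embedding lookup above and a sequence-length tensor):

```python
import tensorflow as tf

def stacked_lstm_encoder(embedded, seq_len, hidden_size=128, num_layers=2):
    """Stack `num_layers` LSTM cells and run them over the embedded inputs."""
    cells = [tf.nn.rnn_cell.LSTMCell(hidden_size) for _ in range(num_layers)]
    stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
    outputs, final_state = tf.nn.dynamic_rnn(
        stacked_cell, embedded, sequence_length=seq_len, dtype=tf.float32)
    return outputs, final_state
```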
- Highway layer: `h = t * a(WX + b) + (1 - t) * X`, where `t = sigmoid(W_T X + b_T)` is called the transform gate (see the sketch below).
  - Application: language modelling and speech recognition.
  - Implementation: `tf.contrib.rnn.HighwayWrapper`
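A minimal standalone sketch of the highway formula above, built with `tf.layers.dense` rather than the `HighwayWrapper` (names are illustrative):

```python
import tensorflow as tf

def highway_layer(x, activation=tf.nn.relu, scope="highway"):
    """h = t * a(Wx + b) + (1 - t) * x, with t = sigmoid(W_T x + b_T)."""
    size = x.get_shape().as_list()[-1]
    with tf.variable_scope(scope):
        t = tf.layers.dense(x, size, activation=tf.nn.sigmoid, name="transform_gate")
        h = tf.layers.dense(x, size, activation=activation, name="candidate")
        return t * h + (1.0 - t) * x   # mix transformed output and carried input
```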
- Residual connection: `h = a(WX + b) + X` (see the sketch below).
  - Implementation: `tf.contrib.rnn.ResidualWrapper`
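The same idea outside an RNN cell, as a minimal sketch (input and output sizes must match for the addition):

```python
import tensorflow as tf

def residual_layer(x, activation=tf.nn.relu, scope="residual"):
    """h = a(Wx + b) + x."""
    size = x.get_shape().as_list()[-1]
    with tf.variable_scope(scope):
        return activation(tf.layers.dense(x, size, name="dense")) + x
```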
- Dense connection: `h_l = a(W[X_1, ..., X_l] + b)`, i.e., layer l takes the concatenation of all previous layers' outputs as its input (see the sketch below).
  - Application: multi-task learning
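A minimal sketch of a densely connected block, where each layer sees the concatenation of all previous outputs (layer count and names are illustrative):

```python
import tensorflow as tf

def dense_block(x, num_layers=3, activation=tf.nn.relu, scope="dense_block"):
    """h_l = a(W [X_1, ..., X_l] + b) for each layer l."""
    outputs = [x]
    size = x.get_shape().as_list()[-1]
    with tf.variable_scope(scope):
        for l in range(num_layers):
            concat = tf.concat(outputs, axis=-1)   # [X_1, ..., X_l]
            h = tf.layers.dense(concat, size, activation=activation, name="layer_%d" % l)
            outputs.append(h)
    return outputs[-1]
```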
- Batch normalization is to CV what dropout is to NLP.
- A dropout rate of 0.5 is preferred.
- Recurrent (variational) dropout applies the same dropout mask across all timesteps at layer l, whereas standard dropout samples a new mask at every timestep, which hurts the recurrent connections. Implementation: `tf.contrib.rnn.DropoutWrapper(variational_recurrent=True)` (see the sketch below).
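A sketch of wrapping a cell with variational dropout; `state_size`, `emb_dim` and the 0.5 keep probabilities are assumed values (`input_size` and `dtype` are required when `variational_recurrent=True` and inputs are dropped):

```python
import tensorflow as tf

cell = tf.nn.rnn_cell.GRUCell(state_size)
cell = tf.contrib.rnn.DropoutWrapper(
    cell,
    input_keep_prob=0.5,
    output_keep_prob=0.5,
    state_keep_prob=0.5,
    variational_recurrent=True,   # reuse the same dropout mask at every timestep
    input_size=emb_dim,           # dimensionality of the inputs fed to the cell
    dtype=tf.float32)
```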
- Treat the initial state as a trainable variable [2]:

```python
# Note: with an LSTMCell the state is a tuple, so a single tensor cannot be passed directly, see
# https://stackoverflow.com/questions/42947351/tensorflow-dynamic-rnn-typeerror-tensor-object-is-not-iterable
cell = tf.nn.rnn_cell.GRUCell(state_size)
# Learn one initial state vector and tile it across the batch, see
# https://stackoverflow.com/questions/44486523/should-the-variables-of-the-initial-state-of-a-dynamic-rnn-among-the-inputs-of
init_state = tf.get_variable('init_state', [1, state_size], initializer=tf.constant_initializer(0.0))
init_state = tf.tile(init_state, [batch_size, 1])
```
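A hypothetical usage of the learned initial state; `inputs` and `seq_len` are assumed placeholders, not part of the original snippet:

```python
outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs, sequence_length=seq_len, initial_state=init_state)
```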
- Gradient clipping:

```python
# Clip the global norm of all gradients before applying the update.
variables = tf.trainable_variables()
gradients = tf.gradients(ys=cost, xs=variables)
clipped_gradients, _ = tf.clip_by_global_norm(gradients, clip_norm=self.clip_norm)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
optimize = optimizer.apply_gradients(grads_and_vars=zip(clipped_gradients, variables),
                                     global_step=self.global_step)
```
- To do...
References:
[1] http://ruder.io/deep-learning-nlp-best-practices/
[2] https://r2rt.com/recurrent-neural-networks-in-tensorflow-iii-variable-length-sequences.html