Modify an English-French translation transformer into a chemical reaction prediction model.
- Replace the greedy search with beam search (a decoding sketch follows this list).
- Test the logic of training:
  2.1. pretraining tasks
  2.2. replace the decoder input with a different sequence (e.g., the model's own predictions instead of the target)
- (Completed) Wrap the .ipynb into a .py file and generate a config file for convenient use (a config-loading sketch follows this list).
- (Failed) Find a way to train the model on distributed machines to speed up training.
- Check the Teacher Forcing and apply a Teacher Forcing Ratio (a scheduled-sampling sketch follows this list).
  5.1 Found an Exposure Bias problem: during generation, once the decoder input is not the target sequence, the model emits repeated tokens out of control; when the target sequence is fed as the decoder input, the output looks roughly normal. This is the main problem caused by the Teacher Forcing strategy: it accelerates training, but it makes free-running generation, where no target sequence is available, much worse.
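
A minimal sketch of the beam search from item 1, assuming a PyTorch encoder-decoder where `model.encode(src)` returns the encoder memory and `model.decode(tgt, memory)` returns logits laid out as (tgt_len, batch, vocab); the method names, tensor shapes, and the `bos_id`/`eos_id` token ids are assumptions, not the repo's actual interface.

```python
# Hypothetical beam search decoder; encode/decode, the (len, batch, vocab)
# layout, and the bos/eos token ids are placeholders for the repo's own API.
import torch
import torch.nn.functional as F

def beam_search(model, src, bos_id, eos_id, beam_width=5, max_len=256):
    model.eval()
    device = src.device
    with torch.no_grad():
        memory = model.encode(src)                        # encoder output ("memory")
        beams = [(torch.tensor([bos_id], device=device), 0.0)]  # (tokens, log-prob)
        finished = []
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq[-1].item() == eos_id:              # hypothesis already ended
                    finished.append((seq, score))
                    continue
                logits = model.decode(seq.unsqueeze(1), memory)  # (len, 1, vocab)
                log_probs = F.log_softmax(logits[-1, 0], dim=-1)
                top_lp, top_ids = log_probs.topk(beam_width)
                for lp, tok in zip(top_lp, top_ids):
                    candidates.append((torch.cat([seq, tok.view(1)]),
                                       score + lp.item()))
            if not candidates:                            # every beam emitted eos
                break
            # keep the best `beam_width` hypotheses by length-normalized score
            candidates.sort(key=lambda c: c[1] / len(c[0]), reverse=True)
            beams = candidates[:beam_width]
        finished.extend(b for b in beams if b[0][-1].item() != eos_id)
        best_seq, _ = max(finished, key=lambda c: c[1] / len(c[0]))
        return best_seq
```

Length normalization is one common way to keep the search from favoring short outputs; setting `beam_width=1` recovers the original greedy search.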
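For the completed item 3, a sketch of what the generated config file and its loader could look like; the `config.yaml` name and every field in it are illustrative placeholders, not the repo's actual settings.

```python
# Hypothetical config loading for the wrapped .py script; the file name and
# all field names below are made-up examples, not the project's real config.
import argparse
import yaml  # pip install pyyaml

def load_config():
    parser = argparse.ArgumentParser(description="reaction prediction transformer")
    parser.add_argument("--config", default="config.yaml", help="path to YAML config")
    args = parser.parse_args()
    with open(args.config) as f:
        return yaml.safe_load(f)

# Example config.yaml:
#   d_model: 512
#   n_heads: 8
#   n_layers: 6
#   batch_size: 64
#   lr: 1.0e-4
#   beam_width: 5
#   teacher_forcing_ratio: 0.5
```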
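For item 5 and the Exposure Bias in 5.1, a minimal sketch of a Teacher Forcing Ratio via scheduled sampling: with probability `tf_ratio` the decoder sees the gold target, otherwise its own greedy predictions from a first, gradient-free pass (a common approximation for transformers, which decode all positions in parallel). The `model(src, decoder_input)` signature, `criterion`, and the (tgt_len, batch) layout are assumptions.

```python
# Hypothetical training step with a Teacher Forcing Ratio; the model call
# signature and tensor layout are assumed, not taken from the repo.
import random
import torch

def train_step(model, criterion, optimizer, src, tgt, tf_ratio=0.5):
    # tgt: (tgt_len, batch); decoder input drops the last token, labels drop BOS
    decoder_input, labels = tgt[:-1], tgt[1:]
    if random.random() >= tf_ratio:                   # free-running branch
        with torch.no_grad():                         # first pass, no gradients
            preds = model(src, decoder_input).argmax(-1)
        # keep BOS at position 0, feed the model's shifted predictions after it
        decoder_input = torch.cat([decoder_input[:1], preds[:-1]], dim=0)
    logits = model(src, decoder_input)                # second pass, with gradients
    loss = criterion(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Annealing `tf_ratio` from 1.0 toward 0 over training is the usual way to keep the early speed-up of teacher forcing while reducing the repeated-token collapse described in 5.1.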