Incorporating Duration and Style Prediction to Deep Neural Network-based Sentence-level Controllable Speech Synthesis