This is my demo of Chen et al., "GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks" (ICML 2018).
Input: Two synthetic regression tasks, generated following Ma et al., "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)
Network: One shared layer and two task-specific towers
Framework: PyTorch
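
The setup above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the names `TwoTowerNet` and `gradnorm_loss`, the layer sizes, the random stand-in data, and `alpha=1.5` are all assumptions made for the example. The GradNorm term follows the paper's description: for each task, take the L2 norm of the gradient of the weighted task loss with respect to the shared layer's weights, and penalize its distance from the mean norm scaled by the task's relative inverse training rate raised to `alpha`.

```python
import torch
import torch.nn as nn


class TwoTowerNet(nn.Module):
    """One shared layer feeding two task-specific towers (sizes are assumed)."""

    def __init__(self, in_dim=16, hidden=32):
        super().__init__()
        self.shared = nn.Linear(in_dim, hidden)
        self.towers = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(2)
        ])

    def forward(self, x):
        h = torch.relu(self.shared(x))
        # One scalar regression output per sample for each task.
        return [tower(h).squeeze(-1) for tower in self.towers]


def gradnorm_loss(task_losses, initial_losses, weights, shared_params, alpha=1.5):
    """GradNorm objective for the loss weights (hypothetical helper)."""
    norms = []
    for w, loss in zip(weights, task_losses):
        # Gradient of the weighted task loss w.r.t. the shared weights;
        # create_graph=True keeps it differentiable w.r.t. the loss weight w.
        grads = torch.autograd.grad(
            w * loss, shared_params, retain_graph=True, create_graph=True
        )
        norms.append(torch.cat([g.reshape(-1) for g in grads]).norm(2))
    norms = torch.stack(norms)

    with torch.no_grad():
        # Loss ratio L_i(t) / L_i(0), normalized by its mean across tasks.
        ratios = torch.stack(
            [l.detach() / l0 for l, l0 in zip(task_losses, initial_losses)]
        )
        rel_rate = ratios / ratios.mean()
        # The target gradient norm is treated as a constant.
        target = norms.mean() * rel_rate ** alpha

    return (norms - target).abs().sum()


# Usage on random stand-in data (not the synthetic tasks from Ma et al.).
torch.manual_seed(0)
model = TwoTowerNet()
weights = nn.Parameter(torch.ones(2))  # one learnable weight per task
x = torch.randn(64, 16)
targets = [torch.randn(64), torch.randn(64)]

preds = model(x)
losses = [nn.functional.mse_loss(p, t) for p, t in zip(preds, targets)]
initial = [l.detach() for l in losses]  # L_i(0), recorded at the first step

lg = gradnorm_loss(losses, initial, weights, [model.shared.weight])
(w_grad,) = torch.autograd.grad(lg, weights)  # gradient used to update the task weights
```

In a full training loop, the network parameters would be updated from the weighted sum of the task losses, the task weights from `w_grad`, and the weights would then be renormalized so that they sum to the number of tasks, as the paper describes.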