Dataset for Spark training.