This is an implementation of a CVAE in Keras trained on the MNIST data set, based on the paper Learning Structured Output Representation using Deep Conditional Generative Models and the code fragments from Agustinus Kristiadi's blog here.
Since a one-hot vector of digit class labels is concatenated with the input prior to encoding and again to the latent space prior to decoding, we can sample individual digits and combinations. This gif shows a transition along the number line from zero to nine.