Keras, PyTorch, and custom (from-scratch) implementations.
The from-scratch implementation is a 3-layer network with layer sizes (784, 128, 10). It currently reaches 97% accuracy.
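As a rough illustration of the (784, 128, 10) layout, here is a minimal pure-Python sketch of a forward pass. It is not the actual implementation: the names `init_layer`, `dense_sigmoid`, and `forward` are hypothetical, and the sigmoid hidden activation and the uniform weight initialization are assumptions.

```python
import math
import random

LAYER_SIZES = (784, 128, 10)  # input, hidden, and output sizes from the text


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    # Small uniform weights and zero biases (the init scheme is an assumption).
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense_sigmoid(x, weights, biases):
    # One fully connected layer followed by an element-wise sigmoid.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(x, params):
    # Pass the input through each (weights, biases) pair in order.
    for weights, biases in params:
        x = dense_sigmoid(x, weights, biases)
    return x


rng = random.Random(0)
params = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]
x = [rng.random() for _ in range(LAYER_SIZES[0])]  # stand-in for a flattened input
output = forward(x, params)
```

Here `output` holds 10 values, one per class, each squashed independently into the range (0, 1) by the sigmoid.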
Two changes that could improve it: swapping the loss function from mean squared error to cross-entropy, and swapping the output-layer activation from sigmoid to softmax.
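To show what the proposed change might look like, here is a small pure-Python sketch of a softmax output with a cross-entropy loss. The function names and the example logits are hypothetical, and the clamp inside `cross_entropy` is an added safeguard against `log(0)`, not something taken from the existing code.

```python
import math


def softmax(logits):
    # Subtract the largest logit first for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    # Negative log-probability of the correct class (clamped to avoid log(0)).
    return -math.log(max(probs[target_index], 1e-12))


def softmax_cross_entropy_grad(probs, target_index):
    # Gradient of the loss with respect to the logits: probs - one_hot(target).
    return [p - (1.0 if i == target_index else 0.0)
            for i, p in enumerate(probs)]


logits = [2.0, 1.0, 0.1]  # made-up output-layer pre-activations
probs = softmax(logits)
loss = cross_entropy(probs, 0)
grad = softmax_cross_entropy_grad(probs, 0)
```

One appeal of this pairing is that the gradient with respect to the logits reduces to `probs - one_hot(target)`, with no extra sigmoid-derivative factor. That factor, which appears with sigmoid plus mean squared error, can shrink the gradient when an output unit saturates.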