This is an implementation of a two-layer neural network, built during the live demo by @Sirajology on YouTube. The training method is stochastic (online) gradient descent with momentum. It learns to compute XOR for the given inputs. It uses two activation functions, one for each layer: a tanh and a sigmoid. It uses cross-entropy as its loss function. All of this is done in under 100 lines of code. We're building this thing from scratch!
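The description above can be sketched roughly as follows. This is not the demo's actual code, just a minimal illustration of the same ideas: a tanh hidden layer, a sigmoid output, cross-entropy loss, and per-sample (online) gradient descent with momentum on XOR. The layer sizes and hyperparameters are assumptions, and it uses NumPy for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

# Layer sizes (2 -> 4 -> 1) and hyperparameters are illustrative choices
W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=4);      b2 = 0.0
lr, mu = 0.1, 0.9                        # learning rate and momentum
vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
vW2 = np.zeros_like(W2); vb2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = np.tanh(x @ W1 + b1)        # tanh hidden layer
    return h, sigmoid(h @ W2 + b2)  # sigmoid output

def loss():
    # Mean binary cross-entropy over the four XOR cases
    p = np.array([forward(x)[1] for x in X])
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

loss_before = loss()
for epoch in range(3000):
    for i in rng.permutation(len(X)):      # stochastic: one sample at a time
        x, t = X[i], y[i]
        h, p = forward(x)
        d_out = p - t                      # dL/dz for sigmoid + cross-entropy
        gW2, gb2 = d_out * h, d_out
        d_hid = d_out * W2 * (1.0 - h**2)  # tanh'(z) = 1 - tanh(z)^2
        gW1, gb1 = np.outer(x, d_hid), d_hid
        # Momentum update: velocity = mu * velocity - lr * gradient
        vW2 = mu * vW2 - lr * gW2; W2 = W2 + vW2
        vb2 = mu * vb2 - lr * gb2; b2 = b2 + vb2
        vW1 = mu * vW1 - lr * gW1; W1 = W1 + vW1
        vb1 = mu * vb1 - lr * gb1; b1 = b1 + vb1
loss_after = loss()

preds = [int(forward(x)[1] > 0.5) for x in X]
print("predictions:", preds)
```

The sigmoid output pairs naturally with cross-entropy because the gradient at the output simplifies to `p - t`, which keeps the backward pass short.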
Dependencies: None!
Just run the following in a terminal to see it in action.
python demo.py
Most of the credit for this code goes to lightcaster. I've merely created a wrapper to help people get started.