This is a simple one, done in Rust (because how dare you ask for any other language).
- forward propagation
- back propagation
- actual learning
- Where is the bias applied? Is it per node? Is it on the input nodes?
- If the bias weight is 1, would a node then have a value of 6 + 1 = 7, where 6 is the weighted sum coming from the previous layer's nodes and the bias is added on top? That seems to stack up quickly.
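
For the bias questions, here is a minimal sketch of how it usually works in a plain feedforward network. This is not necessarily how this project does it. In the common setup every non-input node has its own bias, which is added to the weighted sum before the activation function runs, and input nodes have no bias. The names `sigmoid`, `weighted_sum`, and `layer_forward` are made up for illustration, and sigmoid is only an assumed choice of activation.

```rust
/// Logistic sigmoid: squashes any real number into the range (0, 1).
/// Assumed activation function, chosen only for illustration.
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Pre-activation value of a single node:
/// the sum of (input * weight) over the previous layer, plus this node's own bias.
fn weighted_sum(inputs: &[f64], weights: &[f64], bias: f64) -> f64 {
    assert_eq!(inputs.len(), weights.len());
    let sum: f64 = inputs.iter().zip(weights).map(|(x, w)| x * w).sum();
    sum + bias
}

/// Forward pass for one layer (hypothetical layout):
/// `weights[j]` and `biases[j]` belong to node `j` of this layer.
fn layer_forward(inputs: &[f64], weights: &[Vec<f64>], biases: &[f64]) -> Vec<f64> {
    weights
        .iter()
        .zip(biases)
        .map(|(w, b)| sigmoid(weighted_sum(inputs, w, *b)))
        .collect()
}
```

With inputs `[1.0, 2.0]`, weights `[2.0, 2.0]`, and a bias of `1.0`, `weighted_sum` gives 6 + 1 = 7, matching the arithmetic in the question. The value does not keep stacking up from layer to layer, though, because the activation is applied before the result is passed on: the sigmoid of 7 is a little under 1. The bias is also a learned parameter like any other weight, so it is not fixed at 1 and can go negative.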