Notice: This is all for my own fun and education. I don't know much about hardware design or HDLs, so this is probably not very good ;-)
Credits for the design go to @Domipheus. This is basically a reimplementation of his TPU design in MyHDL. See: https://github.com/Domipheus/TPU