This experimental work uses Google's low-precision GEMM library (gemmlowp) and supports only a few modules.
git clone https://github.com/jhjin/nn8 --recursive
cd nn8
luarocks make rocks/nn8-scm-1.rockspec
th test-precision.lua # small model
th test-speed.lua # large model
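
For readers unfamiliar with low-precision GEMM, the idea can be roughly sketched as follows. This is only an illustration of affine 8-bit quantization followed by an integer matrix multiply. It is not the code used by nn8 or by Google's library, and all function names here (choose_params, quantize, quantized_matmul, float_matmul) are hypothetical. It assumes a single scale and zero point per matrix, picked from the matrix's min/max range.

```python
# Illustrative sketch only: affine 8-bit quantization plus an integer matmul.
# Not nn8's or Google's implementation; all names below are hypothetical.

def choose_params(matrix):
    """Pick (scale, zero_point) so the value range, widened to include 0, maps onto [0, 255]."""
    lo = min(0.0, min(min(row) for row in matrix))
    hi = max(0.0, max(max(row) for row in matrix))
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero matrix: any positive scale works
    zero_point = round(-lo / scale)
    return scale, zero_point

def quantize(matrix, scale, zero_point):
    """Map each real value x to an 8-bit code: clamp(round(x / scale) + zero_point, 0, 255)."""
    return [[max(0, min(255, round(x / scale) + zero_point)) for x in row]
            for row in matrix]

def quantized_matmul(a, b):
    """Multiply a (m x k) by b (k x n) using 8-bit codes and integer accumulation."""
    scale_a, zero_a = choose_params(a)
    scale_b, zero_b = choose_params(b)
    qa = quantize(a, scale_a, zero_a)
    qb = quantize(b, scale_b, zero_b)
    result = []
    for i in range(len(qa)):
        row = []
        for j in range(len(qb[0])):
            acc = 0  # integer accumulator (32-bit in a real kernel)
            for k in range(len(qb)):
                acc += (qa[i][k] - zero_a) * (qb[k][j] - zero_b)
            # Convert the integer sum back to a real value.
            row.append(acc * scale_a * scale_b)
        result.append(row)
    return result

def float_matmul(a, b):
    """Plain floating-point reference for comparison."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

if __name__ == "__main__":
    a = [[0.5, -1.0], [2.0, 0.25]]
    b = [[1.0, 0.0], [-0.5, 1.5]]
    print(float_matmul(a, b))
    print(quantized_matmul(a, b))
```

The quantized result should be close to the floating-point one but not identical. That gap is the kind of error test-precision.lua is presumably meant to measure on a real model.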