# fastmlp

[WIP] PyTorch bindings for cublasLt, with an example of a quantized i8f16 MLP.
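
Since the bindings are still a work in progress, the sketch below only illustrates the numerics such a layer would compute, assuming "i8f16" means int8 activations/weights with an fp16 output. It uses plain PyTorch ops rather than this repo's API or cublasLt, and the helper names (`quantize_sym_int8`, `i8f16_linear`) are hypothetical; a real binding would run the int8 matmul with int32 accumulation on tensor cores via cublasLt instead of the fp32 emulation shown here.

```python
# Minimal numerics sketch of an i8f16 linear layer (not this repo's API):
# symmetric int8 quantization of activations and weights, an integer-domain
# matmul (emulated in fp32 for portability), and a dequantized fp16 output.
import torch

def quantize_sym_int8(x: torch.Tensor):
    """Symmetric per-tensor int8 quantization: returns (int8 tensor, scale)."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def i8f16_linear(x_fp16: torch.Tensor, w_fp16: torch.Tensor) -> torch.Tensor:
    """y = x @ w.T with int8 inputs and fp16 output (accumulation emulated in fp32)."""
    xq, xs = quantize_sym_int8(x_fp16.float())
    wq, ws = quantize_sym_int8(w_fp16.float())
    # cublasLt would accumulate the int8 product in int32; emulate in fp32 here.
    acc = xq.float() @ wq.float().t()
    return (acc * (xs * ws)).to(torch.float16)

# Example: a two-layer MLP block built from the quantized linear above.
x = torch.randn(8, 256, dtype=torch.float16)
w1 = torch.randn(512, 256, dtype=torch.float16)
w2 = torch.randn(256, 512, dtype=torch.float16)
h = torch.relu(i8f16_linear(x, w1))
y = i8f16_linear(h, w2)
print(y.shape, y.dtype)  # torch.Size([8, 256]) torch.float16
```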