fastmlp

[WIP] PyTorch bindings for cublasLt, with an example of a quantized i8f16 MLP.

References

https://github.com/OpenBMB/cpm_kernels/blob/master/cpm_kernels/library/cublaslt.py
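
To make the "quantized i8f16 MLP" idea concrete, here is a minimal pure-Python sketch of the arithmetic behind an int8 matrix multiply with a floating-point output. It is not the fastmlp API and does not call cublasLt. The README does not define `i8f16`, so the sketch assumes int8 operands, exact integer accumulation, and a dequantized floating-point result (which would be fp16 on the GPU, while Python floats here are 64-bit). The names `quantize_rows`, `int8_linear`, `relu`, and `mlp` are hypothetical, and the per-row symmetric scaling is one common choice rather than something the source confirms.

```python
# Illustrative sketch only: NOT the fastmlp API and no cublasLt calls.
# Assumption: "i8f16" = int8 operands, integer accumulate, float output.

def quantize_rows(matrix):
    """Symmetric per-row int8 quantization.

    Returns (rows of ints in [-127, 127], one float scale per row).
    """
    q_rows, scales = [], []
    for row in matrix:
        max_abs = max(abs(v) for v in row)
        # An all-zero row gets scale 1.0 to avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def int8_linear(x, weight):
    """Approximate x @ weight.T using quantized operands.

    x:      list of input rows (batch x in_features)
    weight: list of weight rows (out_features x in_features)
    """
    qx, sx = quantize_rows(x)       # one scale per input row
    qw, sw = quantize_rows(weight)  # one scale per output feature
    out = []
    for i, x_row in enumerate(qx):
        out_row = []
        for j, w_row in enumerate(qw):
            acc = sum(a * b for a, b in zip(x_row, w_row))  # integer accumulate
            out_row.append(acc * sx[i] * sw[j])             # dequantize to float
        out.append(out_row)
    return out


def relu(rows):
    """Elementwise ReLU on a list of rows."""
    return [[max(0.0, v) for v in row] for row in rows]


def mlp(x, w1, w2):
    """Two-layer MLP: linear -> ReLU -> linear, both layers quantized."""
    return int8_linear(relu(int8_linear(x, w1)), w2)
```

A quick usage check against the unquantized result: with `x = [[1.0, 2.0, 0.5]]`, `w1 = [[0.5, 0.25, -1.0], [1.0, 1.0, 1.0]]`, and `w2 = [[1.0, -0.5]]`, the exact float answer is `-1.25`, and `mlp(x, w1, w2)` should land close to it, with the gap coming from rounding to int8.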