Pinned Repositories
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
flake8-coding
gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
llm_experiments
malfet.github.io
My Web experiments
Mandelbrot
oneDNN
oneAPI Deep Neural Network Library (oneDNN)
PeachPy
x86-64 assembler embedded in Python
pocketfft
Clone of https://gitlab.mpcdf.mpg.de/mtr/pocketfft
malfet's Repositories
malfet/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
malfet/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
malfet/llm_experiments
malfet/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
malfet/malfet.github.io
My Web experiments
malfet/PeachPy
x86-64 assembler embedded in Python
malfet/deleteme
Dummy repo
malfet/add-github-ssh-key
Add your SSH keys to GitHub Actions runners. Works best for self-hosted runners on EC2; made for https://github.com/pytorch/pytorch
malfet/builder
Continuous builder and binary build scripts for PyTorch
malfet/checkout
Action for checking out a repo
malfet/ColossalAI
Making big AI models cheaper, easier, and more scalable
malfet/csprng
Cryptographically secure pseudorandom number generators for PyTorch
malfet/cudasnippets
Migrating public CUDA-related gists into a repo
malfet/cudnn-frontend
cudnn_frontend provides a C++ wrapper for the cuDNN backend API, along with samples showing how to use it
malfet/cutlass
CUDA Templates for Linear Algebra Subroutines
malfet/doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
malfet/fairscale
PyTorch extensions for high performance and large scale training.
malfet/hipify_torch
malfet/isort
A Python utility / library to sort imports.
malfet/llama2.so
Inference Llama 2 with a model compiled to native code by TorchInductor
malfet/NNPACK
Acceleration package for neural networks on multi-core CPUs
malfet/once-for-all
[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment
malfet/pytorch
Tensors and dynamic neural networks in Python with strong GPU acceleration
malfet/pytorch.github.io
The website for PyTorch
malfet/s2024
malfet/serve
Model Serving on PyTorch
malfet/triton
Development repository for the Triton language and compiler
malfet/upload-artifact-s3
malfet/xbyak_aarch64
malfet/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web