malfet

California

Pinned Repositories

DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language:Python
flake8-coding
Language:Python
gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Language:Python
llm_experiments
Language:Metal
malfet.github.io
My Web experiments
Language:HTML
Mandelbrot
Language:C++
oneDNN
oneAPI Deep Neural Network Library (oneDNN)
Language:C++
PeachPy
x86-64 assembler embedded in Python
Language:Python
pocketfft
Clone of https://gitlab.mpcdf.mpg.de/mtr/pocketfft
Language:C

malfet's Repositories

malfet/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language:Python
malfet/gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Language:Python
malfet/llm_experiments
Language:Metal
malfet/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
Language:C++
malfet/malfet.github.io
My Web experiments
Language:HTML
malfet/PeachPy
x86-64 assembler embedded in Python
Language:Python
malfet/deleteme
Dummy repo
Language:Objective-C++
malfet/add-github-ssh-key
Add your SSH keys to GitHub Actions runners. Works best for self-hosted runners on EC2; made for https://github.com/pytorch/pytorch
Language:TypeScript
malfet/builder
Continuous builder and binary build scripts for pytorch
Language:Shell
malfet/checkout
Action for checking out a repo
Language:TypeScript
malfet/ColossalAI
Making big AI models cheaper, easier, and more scalable
Language:Python
malfet/csprng
Cryptographically secure pseudorandom number generators for PyTorch
Language:Batchfile
malfet/cudasnippets
Migrating public CUDA related gists into a repo
Language:Cuda
malfet/cudnn-frontend
cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it
Language:C++
malfet/cutlass
CUDA Templates for Linear Algebra Subroutines
Language:C++
malfet/doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Language:Python
malfet/fairscale
PyTorch extensions for high performance and large scale training.
Language:Python
malfet/hipify_torch
Language:Python
malfet/isort
A Python utility / library to sort imports.
Language:Python
malfet/llama2.so
Inference Llama 2 with a model compiled to native code by TorchInductor
Language:C++
malfet/NNPACK
Acceleration package for neural networks on multi-core CPUs
Language:C
malfet/once-for-all
[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment
Language:Python
malfet/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language:C++
malfet/pytorch.github.io
The website for PyTorch
Language:Jupyter Notebook
malfet/s2024
Language:HTML
malfet/serve
Model Serving on PyTorch
Language:Java
malfet/triton
Development repository for the Triton language and compiler
Language:C++
malfet/upload-artifact-s3
Language:TypeScript
malfet/xbyak_aarch64
Language:C++
malfet/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Language:C