Pinned Repositories
AM207
cme213_material_2013
CME 213 Class Material
cryptocurrency-derivatives-pricing-and-delta-neutral-volatility-trading
This project downloads and analyzes cryptocurrency option data available on Deribit via its public API. Data are collected on a remote Ubuntu server using Python 3, shell scripts, and SQLite, and are then analyzed locally with Python 3.
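The collection step described above can be sketched with Deribit's public v2 REST API and the standard library. This is an illustrative outline, not code from the repository: the endpoint and response shape follow Deribit's documented `public/get_instruments` method, while `instruments_url`, `store_instruments`, and the table schema are assumptions for the example.

```python
import json
import sqlite3
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://www.deribit.com/api/v2/public/get_instruments"

def instruments_url(currency="BTC", kind="option"):
    """Build a public get_instruments request URL (no authentication needed)."""
    return f"{BASE}?{urlencode({'currency': currency, 'kind': kind, 'expired': 'false'})}"

def store_instruments(rows, db_path="deribit.sqlite"):
    """Persist instrument names and strikes into a local SQLite table."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS instruments (name TEXT PRIMARY KEY, strike REAL)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO instruments VALUES (?, ?)",
        [(r["instrument_name"], r["strike"]) for r in rows],
    )
    con.commit()
    con.close()

if __name__ == "__main__":
    # Deribit wraps results in a JSON-RPC envelope under the "result" key.
    with urlopen(instruments_url()) as resp:
        rows = json.load(resp)["result"]
    store_instruments(rows)
```

On a server, a script like this would typically run under cron or a shell loop, appending snapshots to the SQLite database for later local analysis.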
DL_packt
intro_to_simpy
nbdev-tutorial
nbdev tutorial
Python-Financial-Tools
Providing financial analysis tools to the Python open-source community.
ssg-dataset
Open reproducible dataset on static site generators (SSG) popularity.
triton-rs
unpack_int4
jeromeku's Repositories
jeromeku/triton-rs
jeromeku/unpack_int4
jeromeku/accelerated-scan
Accelerated First Order Parallel Associative Scan
jeromeku/ao
torchao: PyTorch Architecture Optimization (AO). A repository to host AO techniques and performant kernels that work with PyTorch.
jeromeku/api-design
LivingSocial API Design Guide
jeromeku/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
jeromeku/candle
Minimalist ML framework for Rust
jeromeku/colab-connect
Connect to Google Colab VM from your local VSCode
jeromeku/colab-test
jeromeku/cookbook-dev
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
jeromeku/cutlass
CUDA Templates for Linear Algebra Subroutines
jeromeku/CutlassProgramming
jeromeku/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
jeromeku/EVT_AE
Artifacts of EVT ASPLOS'24
jeromeku/extension_builder
jeromeku/FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
jeromeku/fsdp_qlora
Training LLMs with QLoRA + FSDP
jeromeku/GaLore
jeromeku/GEMM_MMA
Optimizing GEMM with Tensor Cores, step by step
jeromeku/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
jeromeku/long-context-attention
USP: Hybrid Sequence-Parallel Attention for Long-Context Transformer Model Training and Inference
jeromeku/punica
Serving multiple LoRA-finetuned LLMs as one
jeromeku/pybind_example
jeromeku/sc23-dl-tutorial
SC23 Deep Learning at Scale Tutorial Material
jeromeku/stable-fast
An inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
jeromeku/torchtune
A Native-PyTorch Library for LLM Fine-tuning
jeromeku/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
jeromeku/triton
Development repository for the Triton language and compiler
jeromeku/triton-aot
jeromeku/unsloth
QLoRA finetuning that is 5x faster and uses 60% less memory