Pinned Repositories
AArch64-Explore
applegpu
Apple G13 GPU architecture docs and tools
ArchProbe
A profiler to disclose and quantify hardware features on GPUs.
asitop
Perf monitoring CLI tool for Apple Silicon
AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
BasicCUDA
cformers
SoTA Transformers with C-backend for fast inference on your CPU.
DL-Framework-Tutorial
Create a Deep Learning Framework from scratch
gpt4-pdf-chatbot-langchain
GPT4 & LangChain Chatbot for large PDF docs
nnlib
cloned from https://source.codeaurora.org/quic/hexagon_nn/nnlib/
KaneHui's Repositories
KaneHui/AArch64-Explore
KaneHui/applegpu
Apple G13 GPU architecture docs and tools
KaneHui/ArchProbe
A profiler to disclose and quantify hardware features on GPUs.
KaneHui/asitop
Perf monitoring CLI tool for Apple Silicon
KaneHui/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
KaneHui/BasicCUDA
KaneHui/cformers
SoTA Transformers with C-backend for fast inference on your CPU.
KaneHui/gpt4-pdf-chatbot-langchain
GPT4 & LangChain Chatbot for large PDF docs
KaneHui/HelloSilicon
An introduction to ARM64 assembly on Apple Silicon Macs
KaneHui/incubator-tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
KaneHui/insn_bench_aarch64
Instruction latency & throughput profiler for AArch64
KaneHui/llama.cpp
Port of Facebook's LLaMA model in C/C++
KaneHui/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github pull requests at this moment. Please submit your patches at http://reviews.llvm.org.
KaneHui/LLVM_for_cpu0
A tutorial for learning LLVM: it implements a backend that compiles to machine code for cpu0, a simple RISC CPU.
KaneHui/ml-compiler-opt
Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM.
KaneHui/MMdnn
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX, and CoreML.
KaneHui/GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
KaneHui/gpu-benches
Collection of benchmarks to measure basic GPU capabilities
KaneHui/langchain
⚡ Building applications with LLMs through composability ⚡
KaneHui/llm-viz
3D Visualization of a GPT-style LLM
KaneHui/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
KaneHui/MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
KaneHui/MOSS
An open-source tool-augmented conversational language model from Fudan University
KaneHui/netron
Visualizer for neural network, deep learning and machine learning models
KaneHui/NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
KaneHui/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
KaneHui/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
KaneHui/stf
Control and manage Android devices from your browser.
KaneHui/XiangShan
Open-source high-performance RISC-V processor
KaneHui/XiangShan-doc
Documentation for XiangShan