parrotsky

Efficient AI inference at the edge

Pinned Repositories

academic-kickstart
Language: Jupyter Notebook
AttentioNN
All about attention in neural networks. Soft attention, attention maps, local and global attention and multi-head attention.
Language: Jupyter Notebook
AutoDiCE
Distributed CNN inference at the edge; extends ncnn with CUDA and MPI+OpenMP support.
Language: C++
AutoDiCE_examples
Multi-node examples for AutoDiCE
Language: C++
cavia
Code for "Fast Context Adaptation via Meta-Learning"
Language: Python
ChatGLM2-6B
ChatGLM2-6B: An Open Bilingual Chat LLM | An open-source bilingual dialogue language model
Language: Python
convex_adversarial
A method for training neural networks that are provably robust to adversarial attacks.
Language: Python
DeepThings
A Portable C Library for Distributed CNN Inference on IoT Edge Clusters
Language: C
fastllm
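A pure C++ cross-platform LLM acceleration library with Python bindings; ChatGLM-6B-class models reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models; runs smoothly on mobile phones.
Language: C++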
how-to-optimize-gemm
Row-major matmul optimization
Language: C++

parrotsky's Repositories

parrotsky/AutoDiCE
Distributed CNN inference at the edge; extends ncnn with CUDA and MPI+OpenMP support.
Language: C++
parrotsky/academic-kickstart
Language: Jupyter Notebook
parrotsky/AutoDiCE_examples
Multi-node examples for AutoDiCE
Language: C++
parrotsky/ChatGLM2-6B
ChatGLM2-6B: An Open Bilingual Chat LLM | An open-source bilingual dialogue language model
Language: Python
parrotsky/convex_adversarial
A method for training neural networks that are provably robust to adversarial attacks.
Language: Python
parrotsky/DeepThings
A Portable C Library for Distributed CNN Inference on IoT Edge Clusters
Language: C
parrotsky/fastllm
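A pure C++ cross-platform LLM acceleration library with Python bindings; ChatGLM-6B-class models reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models; runs smoothly on mobile phones.
Language: C++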
parrotsky/how-to-optimize-gemm
Row-major matmul optimization
Language: C++
parrotsky/improved-diffusion
Release for Improved Denoising Diffusion Probabilistic Models
parrotsky/LearnCMake
parrotsky/LTP
[KDD'22] Learned Token Pruning for Transformers
parrotsky/mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
parrotsky/model_analyzer
Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Server models.
parrotsky/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Language: C++
parrotsky/NM-sparsity
parrotsky/openc910
OpenXuantie - OpenC910 Core
parrotsky/Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
parrotsky/parrotsky.github.io
Language: HTML
parrotsky/PyTorch-Pretrained-ViT
Vision Transformer (ViT) in PyTorch
parrotsky/QC-Drug
Language: Jupyter Notebook
parrotsky/RiskAwareLearning_VoltageOpt_DistGrid
parrotsky/starter-hugo-academic
🎓 Hugo Academic Theme: create an academic website. Easily create a beautiful academic résumé or educational website using Hugo, GitHub, and Netlify.
parrotsky/sycl
SYCL for Vitis: Experimental fusion of triSYCL with Intel SYCL oneAPI DPC++ up-streaming effort into Clang/LLVM
parrotsky/SYCL-DNN
SYCL-DNN is a library implementing neural network algorithms written using SYCL
parrotsky/the-story-of-heads
This is a repository with the code for the ACL 2019 paper "Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned" and the ACL 2021 paper "Analyzing Source and Target Contributions to NMT Predictions".
parrotsky/Torch-Pruning
Structural Pruning for Model Acceleration
parrotsky/trt-samples-for-hackathon-cn
Simple samples for TensorRT programming
parrotsky/trt2023
parrotsky/Vitis-AI
Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
parrotsky/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs