mugua1020's Stars
cornell-zhang/allo
Allo: A Programming Model for Composable Accelerator Design
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Aaronhuang-778/BiLLM
(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
jofrfu/tinyTPU
Implementation of a Tensor Processing Unit for embedded systems and the IoT.
PKU-YuanGroup/LanguageBind
[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
karpathy/llm.c
LLM training in simple, raw C/CUDA
AIoT-MLSys-Lab/Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
PolyArch/dsa-framework
Release of stream-specialization software/hardware stack.
Jerc007/Open-GPGPU-FlexGrip-
FlexGripPlus: an open-source GPU model for reliability evaluation and microarchitectural simulation
mbzuai-oryx/groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
htqin/BiBench
[ICML 2023] Official implementation of the ICML 2023 paper "BiBench: Benchmarking and Analyzing Network Binarization".
TadejMurovic/BNN_Deployment
Part of paper: Massively Parallel Combinational Binary Neural Networks for Edge Processing
facebookresearch/Ternary_Binary_Transformer
ACL 2023
Phuoc-Hoan-Le/BinaryViT
BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
liltom-eth/llama2-webui
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.
qwopqwop200/GPTQ-for-LLaMa
4 bits quantization of LLaMA using GPTQ
meta-llama/llama
Inference code for Llama models
JCruan519/EGE-UNet
(MICCAI23) This is the official code repository for "EGE-UNet: an Efficient Group Enhanced UNet for skin lesion segmentation".
huawei-noah/Efficient-Computing
Efficient computing methods developed by Huawei Noah's Ark Lab
awai54st/Logic-Shrinkage
cornell-zhang/bnn-fpga
Binarized Convolutional Neural Networks on Software-Programmable FPGAs
cornell-zhang/FracBNN
FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations
bywmm/Bi-GCN
Implementation of "Binary Graph Convolutional Network", CVPR 2021, and TPAMI 2024.
Xilinx/finn
Dataflow compiler for QNN inference on FPGAs
facebookresearch/bit
Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer".
IMRL/GSB-Vision-Transformer
huawei-noah/Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
htqin/BiBERT
Official implementation of the ICLR 2022 paper "BiBERT: Accurate Fully Binarized BERT".