mugua1020's Stars
cornell-zhang/allo
Allo: A Programming Model for Composable Accelerator Design
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Aaronhuang-778/BiLLM
(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
jofrfu/tinyTPU
Implementation of a Tensor Processing Unit for embedded systems and the IoT.
PKU-YuanGroup/LanguageBind
[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
karpathy/llm.c
LLM training in simple, raw C/CUDA
AIoT-MLSys-Lab/Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
PolyArch/dsa-framework
Release of stream-specialization software/hardware stack.
Jerc007/Open-GPGPU-FlexGrip-
FlexGripPlus: an open-source GPU model for reliability evaluation and microarchitectural simulation
mbzuai-oryx/groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
htqin/BiBench
[ICML 2023] Official implementation of the ICML 2023 paper "BiBench: Benchmarking and Analyzing Network Binarization".
TadejMurovic/BNN_Deployment
Part of paper: Massively Parallel Combinational Binary Neural Networks for Edge Processing
facebookresearch/Ternary_Binary_Transformer
ACL 2023
Phuoc-Hoan-Le/BinaryViT
BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
liltom-eth/llama2-webui
Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.
qwopqwop200/GPTQ-for-LLaMa
4 bits quantization of LLaMA using GPTQ
meta-llama/llama
Inference code for Llama models
JCruan519/EGE-UNet
(MICCAI23) This is the official code repository for "EGE-UNet: an Efficient Group Enhanced UNet for skin lesion segmentation".
huawei-noah/Efficient-Computing
Efficient computing methods developed by Huawei Noah's Ark Lab
awai54st/Logic-Shrinkage
cornell-zhang/bnn-fpga
Binarized Convolutional Neural Networks on Software-Programmable FPGAs
cornell-zhang/FracBNN
FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations
bywmm/Bi-GCN
Implementation of "Binary Graph Convolutional Network", CVPR 2021, and TPAMI 2024.
Xilinx/finn
Dataflow compiler for QNN inference on FPGAs
facebookresearch/bit
Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer".
IMRL/GSB-Vision-Transformer
huawei-noah/Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
htqin/BiBERT
Official implementation of the ICLR 2022 paper "BiBERT: Accurate Fully Binarized BERT".