hongsunjang's Stars
01-ai/Yi
A series of large language models trained from scratch by developers @01-ai
meta-llama/llama-models
Utilities intended for use with Llama models.
pytorch/torchtune
PyTorch native post-training library
DefTruth/Awesome-LLM-Inference
A curated list of awesome LLM/VLM inference papers with code, such as FlashAttention, PagedAttention, parallelism, etc.
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
Xnhyacinth/Awesome-LLM-Long-Context-Modeling
Must-read papers and blogs on LLM-based long context modeling.
dhm2013724/yolov2_xilinx_fpga
A demo for accelerating YOLOv2 on Xilinx FPGAs (PYNQ/ZedBoard).
NVIDIA/RULER
This repo contains the source code for RULER: What's the Real Context Size of Your Long-Context Language Models?
jzhang38/EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
feifeibear/long-context-attention
USP: Unified (a.k.a. hybrid, 2D) sequence-parallel attention for long-context transformer model training and inference
Cornell-RelaxML/QuIP
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
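For orientation, a plain round-to-nearest 2-bit quantizer for a weight matrix is sketched below; it only illustrates what a 4-level grid looks like and does not reproduce QuIP's actual contribution (incoherence processing plus adaptive rounding with guarantees).

    # Hedged sketch: symmetric per-row 2-bit quantization, NOT QuIP's method.
    import torch

    def quantize_2bit(w):
        # 4 levels per row: {-1.5, -0.5, 0.5, 1.5} * step
        step = w.abs().amax(dim=1, keepdim=True) / 1.5
        q = torch.clamp(torch.round(w / step - 0.5), -2, 1)   # integers in {-2, -1, 0, 1}
        return (q + 0.5) * step                                # dequantized approximation

    w = torch.randn(4, 8)
    print((w - quantize_2bit(w)).abs().mean())                 # mean quantization error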
Maratyszcza/FP16
Conversion to/from half-precision floating point formats
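A minimal round-trip sketch of the conversion this library implements, shown with NumPy's IEEE half-precision type rather than the library's C API:

    # FP32 <-> FP16 round trip; illustrates the format, not Maratyszcza/FP16's API.
    import numpy as np

    x32 = np.array([3.14159265], dtype=np.float32)
    x16 = x32.astype(np.float16)           # narrow to IEEE half precision
    bits = x16.view(np.uint16)             # raw 16-bit encoding (sign | exponent | mantissa)
    back = x16.astype(np.float32)          # widen back to single precision

    print(hex(bits[0]), x32[0], back[0])   # note the rounding error from the 10-bit mantissa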
spcl/gemm_hls
Scalable systolic array-based matrix-matrix multiplication implemented in Vivado HLS for Xilinx FPGAs.
UCLA-VAST/AutoSA
AutoSA: Polyhedral-Based Systolic Array Compiler
AminRezaei0x443/memory-efficient-attention
Memory-efficient attention (O(sqrt(n)) memory) for JAX and PyTorch
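A simplified query-chunked attention sketch in plain PyTorch is shown below: it only materializes a (chunk x n) slice of the score matrix at a time. The library additionally chunks keys/values with a logsumexp rescaling trick to reach O(sqrt(n)) memory; the function name here is illustrative, not its API.

    import torch

    def chunked_attention(q, k, v, chunk=256):
        # q, k, v: (seq_len, dim)
        scale = q.shape[-1] ** -0.5
        out = []
        for i in range(0, q.shape[0], chunk):
            scores = q[i:i + chunk] @ k.T * scale              # (chunk, seq_len) at a time
            out.append(torch.softmax(scores, dim=-1) @ v)
        return torch.cat(out, dim=0)

    q = k = v = torch.randn(1024, 64)
    full = torch.softmax(q @ k.T * 64 ** -0.5, dim=-1) @ v     # reference, full score matrix
    assert torch.allclose(chunked_attention(q, k, v), full, atol=1e-5)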
GFNOrg/gfn-lm-tuning
Xilinx/Vitis_Embedded_Platform_Source
microsoft/LongRoPE
LongRoPE is a method that extends the context window of pre-trained LLMs to 2048k tokens.
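A minimal sketch of the general idea behind RoPE-based context extension: rescale rotary position frequencies so longer sequences map into the trained range. This shows plain uniform position interpolation; LongRoPE itself searches non-uniform per-dimension rescale factors, which this sketch does not reproduce.

    import torch

    def rope_angles(seq_len, dim, base=10000.0, scale=1.0):
        # scale > 1 compresses positions so a longer sequence reuses the trained angle range
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        pos = torch.arange(seq_len).float() / scale
        return torch.outer(pos, inv_freq)                 # (seq_len, dim/2) rotation angles

    short = rope_angles(4096, 128)                        # original context window
    long = rope_angles(4096 * 16, 128, scale=16.0)        # 16x longer, same angle range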
HLSTransform/submission
linghaosong/Sextans
An FPGA accelerator for general-purpose Sparse-Matrix Dense-Matrix Multiplication (SpMM).
cathalmccabe/PYNQ_tutorials
MicrochipTech/fpga-hls-examples
Open-Source HLS Examples for Microchip FPGAs
Sibylau/HLS_designs
Systolic array implementations for Cholesky, LU, and QR decomposition
mesham/pynq_api
C API drivers for the PYNQ FPGA board
twaclaw/matmult
A floating-point matrix multiplication implemented in hardware
PSCLab-ASU/Systolic-CNN
Xilinx/libdfx
tzuj6/Object-detection-accelerator-in-Xilinx-PYNQ-z2
berkekisin/Pytorch_spmm_COO
PyTorch extension library for sparse-dense matrix multiplication in COO format
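A COO sparse-dense matmul example using PyTorch's built-in sparse support, to illustrate the operation such an extension accelerates; this uses torch's own API, not the extension's.

    import torch

    indices = torch.tensor([[0, 1, 2],      # row indices of the non-zeros
                            [2, 0, 1]])     # column indices of the non-zeros
    values = torch.tensor([1.0, 2.0, 3.0])
    A = torch.sparse_coo_tensor(indices, values, size=(3, 3))   # sparse (3, 3)
    B = torch.randn(3, 4)                                       # dense  (3, 4)

    C = torch.sparse.mm(A, B)                                   # dense  (3, 4)
    assert torch.allclose(C, A.to_dense() @ B)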
Edgecortix-Inc/fp_reduce_vitis
Full implementation of an optimized floating-point reduction targeting any board supported in the Vitis flow (Alveo, Zynq MPSoC, etc.)