xyangk's Stars
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, and a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 for WhatsApp & Messenger.
01-ai/Yi-1.5
Yi-1.5 is an upgraded version of Yi, delivering stronger performance in coding, math, reasoning, and instruction following.
yaoxieyoulei/mytv-android
A live-TV streaming app built with native Android development
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization, delivering a 2x speedup during inference.
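The core idea behind 4-bit weight quantization schemes like AWQ can be sketched as group-wise scaling: each small group of weights shares one float scale and is stored as signed 4-bit integers. The pure-Python sketch below is illustrative only; AutoAWQ's real kernels are far more involved and additionally rescale salient channels based on activation statistics.

```python
# Group-wise symmetric 4-bit quantization, minimal illustration.
# NOT AutoAWQ's implementation -- just the underlying arithmetic.

def quantize_group(weights, n_bits=4):
    """Quantize one group of float weights to signed n-bit ints plus a scale."""
    qmax = 2 ** (n_bits - 1) - 1            # 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from the ints and their scale."""
    return [x * scale for x in q]

w = [0.12, -0.5, 0.33, 0.07]
q, s = quantize_group(w)        # q = [2, -7, 5, 1]
w_hat = dequantize_group(q, s)  # close to w, error bounded by scale/2
```

Each weight is now representable in 4 bits, at the cost of one shared scale per group; larger groups save more memory but lose precision.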
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
NousResearch/Hermes-Function-Calling
cognitivecomputations/OpenChatML
MinorJerry/WebVoyager
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
ddupont808/GPT-4V-Act
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
web-arena-x/visualwebarena
VisualWebArena is a benchmark for multimodal agents.
nlpxucan/WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
karpathy/llm.c
LLM training in simple, raw C/CUDA
karpathy/minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
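One training round of BPE, the algorithm minbpe implements, amounts to counting adjacent token pairs and merging the most frequent pair into a new token id. The sketch below shows that single step in plain Python; it follows the same idea as minbpe but is not its code.

```python
# One round of Byte Pair Encoding training:
# find the most frequent adjacent pair, then replace it with a new token id.
from collections import Counter

def most_frequent_pair(ids):
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get)

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))
pair = most_frequent_pair(ids)   # (97, 97), i.e. "aa"
ids = merge(ids, pair, 256)      # token 256 now stands for "aa"
```

Repeating this loop, each time minting the next unused id, yields the merge table that defines the tokenizer's vocabulary.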
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
shuxueslpi/chatGLM-6B-QLoRA
Efficient 4-bit QLoRA fine-tuning of chatGLM-6B/chatGLM2-6B using the peft library, including merging the LoRA model into the base model and 4-bit quantization.
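The "merge" step that repos like this perform with peft boils down to folding the low-rank adapter into the frozen base weight: W_merged = W + (alpha / r) * B @ A. The pure-Python sketch below shows that arithmetic on toy matrices; peft does the same thing on torch tensors.

```python
# LoRA merge, minimal illustration: add the scaled low-rank
# update B @ A onto the base weight matrix W.

def matmul(a, b):
    """Naive matrix multiply for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def merge_lora(W, A, B, alpha, r):
    """Fold the low-rank adapter (B @ A) into the base weight."""
    BA = matmul(B, A)
    s = alpha / r
    return [[W[i][j] + s * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]    # 2x2 base weight
B = [[1.0], [0.0]]              # 2x1 down-projection output
A = [[0.0, 2.0]]                # 1x2, so the adapter rank r is 1
W_merged = merge_lora(W, A, B, alpha=1, r=1)   # [[1.0, 2.0], [0.0, 1.0]]
```

After merging, inference needs no adapter machinery at all, which is why the merged model can then be quantized to 4 bits like any plain checkpoint.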
ossu/computer-science
🎓 Path to a free self-taught education in Computer Science!
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
NascentCore/llm-numbers-cn
Chinese translation of llm-numbers
OSU-NLP-Group/SeeAct
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
triton-inference-server/tutorials
This repository contains tutorials and examples for Triton Inference Server
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Instruction-Tuning-with-GPT-4/GPT-4-LLM
Instruction Tuning with GPT-4
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
deepseek-ai/DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.