ybch14

PhD, Tsinghua University. Research Interest: NLP, Knowledge Graph

Tsinghua University, Beijing, China

ybch14's Stars

lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Language:Python37.5k 352 1.8k4.6k
chatchat-space/Langchain-Chatchat
Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based RAG and Agent app built with Langchain and LLMs such as ChatGLM, Qwen, and Llama.
Language:TypeScript32.9k 291 4k5.7k
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
Language:Python29.7k 344 2694.1k
VundleVim/Vundle.vim
Vundle, the plug-in manager for Vim
Language:Vim Script24k 692 7222.6k
tloen/alpaca-lora
Instruct-tune LLaMA on consumer hardware
Language:Jupyter Notebook18.8k 155 4702.2k
ymcui/Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment.
Language:Python18.6k 184 7321.9k
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Language:Python16.9k 111 1.1k1.7k
huggingface/trl
Train transformer language models with reinforcement learning.
Language:Python10.6k 77 1.3k1.4k
easymotion/vim-easymotion
Vim motions on speed!
Language:Vim script7.5k 67 403361
ymcui/Chinese-LLaMA-Alpaca-2
Chinese LLaMA-2 & Alpaca-2 LLMs (phase two of the project), with 64K long-context models.
Language:Python7.1k 78 389581
Azure-Samples/azure-search-openai-demo
A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
Language:Python6.5k 233 1.2k4.4k
togethercomputer/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Language:Python4.6k 78 92348
shibing624/text2vec
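text2vec, text to vector. A text-representation toolkit that converts text into vector matrices. It implements text-representation and text-similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.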
Language:Python4.6k 31 151405
esbatmop/MNBVC
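MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, targeting the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.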
3.6k 66 56252
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
3.6k 61 4220
PhoebusSi/Alpaca-CoT
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-tuning) for easy use, building a fine-tuning platform that researchers can pick up and use easily. We welcome open-source enthusiasts to initiate any meaningful PR on this repo and integrate as many LLM-related technologies as possible.
Language:Jupyter Notebook2.7k 35 100249
hyp1231/awesome-llm-powered-agent
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
1.7k 50 9144
taoyds/spider
scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge
Language:Python863 29 102195
allenai/papermage
library supporting NLP and CV research on scientific papers
Language:Python724 11 3657
Tebmer/Awesome-Knowledge-Distillation-of-LLMs
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
724 11 541
sambanova/bloomchat
This repo contains the data preparation, tokenization, training and inference code for BLOOMChat. BLOOMChat is a 176 billion parameter multilingual chat model based on BLOOM.
Language:Python588 12 1052
jianzhnie/awesome-instruction-datasets
A collection of instruction datasets of all kinds (awesome-prompt-datasets, awesome-instruction-dataset) for training ChatLLM models such as ChatGPT.
577 6 030
QwenLM/qwen.cpp
C++ implementation of Qwen-LM
Language:C++569 11 7449
jkkummerfeld/text2sql-data
A collection of datasets that pair questions with SQL queries.
Language:Python550 18 38108
longyuewangdcu/Chinese-Llama-2
improve Llama-2's proficiency in comprehension, generation, and translation of Chinese.
Language:Python534 16 1334
pffang/libiconv-for-Windows
iconv library for Windows (Microsoft Visual Studio Compiler)
Language:C100 7 145
DreamerGPT/DreamerGPT
🌱 DreamerGPT (梦想家, "Dreamer"): instruction fine-tuning for Chinese large language models.
Language:Python50 5 02
usnistgov/vulntology
Development of the NIST vulnerability data ontology (Vulntology).
Language:JavaScript37 21 9911
DominikLindorfer/SQL-LLaMA2
LLaMA-2 Finetuned for Text-2-SQL
Language:Python23 3 04
Denilah/Instruction_Code_Datasets
A repository of datasets in the domain of code for instruction fine-tuning.
4 2 00
