ybch14
PhD, Tsinghua University. Research Interest: NLP, Knowledge Graph
Tsinghua University, Beijing, China
ybch14's Stars
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
chatchat-space/Langchain-Chatchat
Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based RAG and Agent application built with Langchain and language models such as ChatGLM, Qwen, and Llama
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
VundleVim/Vundle.vim
Vundle, the plug-in manager for Vim
tloen/alpaca-lora
Instruct-tune LLaMA on consumer hardware
ymcui/Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
huggingface/trl
Train transformer language models with reinforcement learning.
easymotion/vim-easymotion
Vim motions on speed!
ymcui/Chinese-LLaMA-Alpaca-2
Phase two of the Chinese LLaMA-2 & Alpaca-2 LLM project, with 64K long-context models
Azure-Samples/azure-search-openai-demo
A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
togethercomputer/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
shibing624/text2vec
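text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.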
esbatmop/MNBVC
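MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing stories, chat logs, and more.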
opendilab/awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
PhoebusSi/Alpaca-CoT
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-tuning) for easy use. We have built a fine-tuning platform for large models that is easy for researchers to pick up and use. We welcome open-source enthusiasts to open any meaningful PR on this repo and integrate as many LLM-related technologies as possible.
hyp1231/awesome-llm-powered-agent
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
taoyds/spider
scripts and baselines for Spider: Yale complex and cross-domain semantic parsing and text-to-SQL challenge
allenai/papermage
library supporting NLP and CV research on scientific papers
Tebmer/Awesome-Knowledge-Distillation-of-LLMs
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
sambanova/bloomchat
This repo contains the data preparation, tokenization, training and inference code for BLOOMChat. BLOOMChat is a 176 billion parameter multilingual chat model based on BLOOM.
jianzhnie/awesome-instruction-datasets
A collection of awesome-prompt-datasets and awesome-instruction-datasets, gathering a wide variety of instruction datasets for training ChatLLMs such as ChatGPT.
QwenLM/qwen.cpp
C++ implementation of Qwen-LM
jkkummerfeld/text2sql-data
A collection of datasets that pair questions with SQL queries.
longyuewangdcu/Chinese-Llama-2
improve Llama-2's proficiency in comprehension, generation, and translation of Chinese.
pffang/libiconv-for-Windows
iconv library for Windows (Microsoft Visual Studio Compiler)
DreamerGPT/DreamerGPT
🌱 DreamerGPT (梦想家, "Dreamer"): instruction fine-tuning for Chinese large language models
usnistgov/vulntology
Development of the NIST vulnerability data ontology (Vulntology).
DominikLindorfer/SQL-LLaMA2
LLaMA-2 Finetuned for Text-2-SQL
Denilah/Instruction_Code_Datasets
A repository of datasets in the domain of code for instruction fine-tuning.