hunxuewangzi's Stars
nlpxucan/WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
Instruction-Tuning-with-GPT-4/GPT-4-LLM
Instruction Tuning with GPT-4
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
OSU-NLP-Group/HippoRAG
[NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents. RAG + Knowledge Graphs + Personalized PageRank.
ThuCCSLab/Awesome-LM-SSP
A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
WeOpenML/PandaLM
jianzhnie/awesome-instruction-datasets
A collection of awesome-prompt-datasets and awesome-instruction-datasets for training ChatLLMs such as ChatGPT. Gathers a wide variety of instruction datasets for training ChatLLM models.
hkust-nlp/deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
princeton-nlp/LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
zjunlp/AutoKG
LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities
microsoft/rho
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
tianyi-lab/Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models
ZigeW/data_management_LLM
Collection of training data management explorations for large language models
OFA-Sys/InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
alon-albalak/data-selection-survey
A Survey on Data Selection for Language Models
zjunlp/IEPile
[ACL 2024] IEPile: A Large-Scale Information Extraction Corpus
czbiohub-sf/tabula-muris-senis
Tabula Muris Senis
sail-sg/regmix
🧬 RegMix: Data Mixture as Regression for Language Model Pre-training
shizhediao/R-Tuning
[NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't Know'"
gpt4life/alpagasus
Unofficial implementation of AlpaGasus
pldlgb/nuggets
IronBeliever/CaR
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
lunyiliu/CoachLM
Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.
YangLing0818/SuperCorrect-llm
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
DAMO-NLP-SG/Auto-Arena-LLMs
zjunlp/WorfBench
Benchmarking Agentic Workflow Generation
Lichang-Chen/AlpaGasus
A better Alpaca Model Trained with Less Data (only 9k instructions of the original set)
Blue-Raincoat/SelectIT
2003pro/TAGCOS
This is the official implementation of TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data
xypan0/G-DIG