hunxuewangzi

hunxuewangzi's Stars

nlpxucan/WizardLM
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
Language: Python 9.3k 112 191722
Instruction-Tuning-with-GPT-4/GPT-4-LLM
Instruction Tuning with GPT-4
Language: HTML 4.2k 43 34300
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Language: Jupyter Notebook 1.6k 8 152247
OSU-NLP-Group/HippoRAG
[NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents. RAG + Knowledge Graphs + Personalized PageRank.
Language: Python 1.5k 14 37124
ThuCCSLab/Awesome-LM-SSP
A reading list on large-model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
1k 22 2369
WeOpenML/PandaLM
Language: Python 892 13 3267
jianzhnie/awesome-instruction-datasets
A collection of awesome-prompt-datasets, awesome-instruction-dataset, for training ChatLLMs such as ChatGPT. It gathers a wide variety of instruction datasets for training ChatLLM models.
570 6 031
hkust-nlp/deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
Language: Python 513 6 2728
princeton-nlp/LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
Language: Jupyter Notebook 394 5 3736
zjunlp/AutoKG
LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities
Language: Python 376 9 634
microsoft/rho
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
370 6 611
tianyi-lab/Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models
Language: Python 323 3 3021
ZigeW/data_management_LLM
Collection of training data management explorations for large language models
299 5 129
OFA-Sys/InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
229 4 107
alon-albalak/data-selection-survey
A Survey on Data Selection for Language Models
195 5 110
zjunlp/IEPile
[ACL 2024] IEPile: A Large-Scale Information Extraction Corpus
Language: Python 179 7 3117
czbiohub-sf/tabula-muris-senis
Tabula Muris Senis
Language: Jupyter Notebook 100 9 5326
sail-sg/regmix
🧬 RegMix: Data Mixture as Regression for Language Model Pre-training
Language: Jupyter Notebook 95 5 105
shizhediao/R-Tuning
[NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't Know'"
Language: Python 94 3 79
gpt4life/alpagasus
Unofficial implementation of AlpaGasus
Language: Python 86 3 76
pldlgb/nuggets
Language: Jupyter Notebook 76 1 102
IronBeliever/CaR
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
Language: Python 71 1 61
lunyiliu/CoachLM
Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.
Language: Python 59 0 14
YangLing0818/SuperCorrect-llm
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
Language: Python 41 3 31
DAMO-NLP-SG/Auto-Arena-LLMs
Language: Jupyter Notebook 35 2 31
zjunlp/WorfBench
Benchmarking Agentic Workflow Generation
Language: Python 321
Lichang-Chen/AlpaGasus
A better Alpaca Model Trained with Less Data (only 9k instructions of the original set)
Language: HTML 21 2 23
Blue-Raincoat/SelectIT
Language: Python 132
2003pro/TAGCOS
This is the official implementation of TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data
Language: Python 10 1 00
xypan0/G-DIG
Language: Python 10