Open-Source ChatGPT and Beyond
On the road to implementing open-source ChatGPT-like models and beyond.
Since the accidental leak of the LLaMA model weights and the impressive performance of Stanford Alpaca, which was trained on LLaMA with data generated via the GPT-3 API using the self-instruct technique, the open-source community has been excited about the prospect of reproducing ChatGPT in an open way.
This repo aims to record this process and provide an overview of how to get involved.
It covers base models, technologies, data, domain models, training pipelines, speed-up techniques, multi-language support, multi-modality, and more to come.
Thanks to @FunnySaltyFish for the website version; the code is here.
Any contribution to this project and the website is appreciated! (We are short of hands...)
Table of Contents
- Base Models
- Domain Models
- General Domain Instruction Models
- Alternatives To Transformer
- Multi-Modal
- Data
- Evaluation
- Framework/ToolKit/Platform
- Alignment
- Multi-Language
- Efficient Training/Fine-Tuning
- Low-Cost Inference
- Safety
- Truthfulness
- Extend Context Window
- Knowledge Editing
- External Knowledge
- External Tools
- Autonomous Problem Solving
- Similar Collections
Base Models
contributor | model/project | license | language | main feature |
---|---|---|---|---|
Meta | LLaMA/LLaMA2 | - | multi | LLaMA-13B outperforms GPT-3 (175B) and LLaMA-65B is competitive with PaLM-540B. The base model for most follow-up works. |
@turboderp | ExLlama | MIT license | multi | A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern GPUs. |
HuggingFace-BigScience | BLOOM | - | multi | an autoregressive Large Language Model (LLM) trained by HuggingFace BigScience. |
HuggingFace-BigScience | BLOOMZ | - | multi | instruction-finetuned version of the BLOOM & mT5 pretrained multilingual language models on a crosslingual task mixture. |
EleutherAI | GPT-J | - | en | transformer model trained using Ben Wang's Mesh Transformer JAX. |
Meta | OPT | - | en | Open Pre-trained Transformer Language Models; the aim of this suite of OPT models is to enable reproducible and responsible research at scale, and to bring more voices to the table in studying the impact of these LLMs. |
Cerebras Systems | Cerebras-GPT | - | en | Pretrained LLM, GPT-3-like, commercially available, efficiently trained on the Andromeda AI supercomputer in accordance with Chinchilla scaling laws (20 tokens per model parameter), which is compute-optimal. |
EleutherAI | pythia | - | en | combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. |
Stability-AI | StableLM | - | en | Stability AI Language Models. |
FDU | MOSS | - | en/zh | An open-source tool-augmented conversational language model from Fudan University. |
ssymmetry & FDU | BBT-2 | - | zh | 12B open-source LM. |
@mlfoundations | OpenFlamingo | - | en | An open-source framework for training large multimodal models. |
EleutherAI | GPT-NeoX-20B | - | en | Its architecture intentionally resembles that of GPT-3, and is almost identical to that of GPT-J-6B. |
UCB | OpenLLaMA | Apache-2.0 | en | An Open Reproduction of LLaMA. |
MosaicML | MPT | Apache-2.0 | en | MPT-7B is a GPT-style model, and the first in the MosaicML Foundation Series of models. Trained on 1T tokens of a MosaicML-curated dataset, MPT-7B is open-source, commercially usable, and equivalent to LLaMA-7B on evaluation metrics. |
TogetherComputer | RedPajama-INCITE-Base-3B-v1 | Apache-2.0 | en | A 2.8B-parameter pretrained language model, pretrained on RedPajama-Data-1T, together with an instruction-tuned version and a chat version. |
Lightning-AI | Lit-LLaMA | Apache-2.0 | - | Independent implementation of LLaMA that is fully open source under the Apache 2.0 license. |
@conceptofmind | PaLM | MIT License | en | An open-source implementation of Google PaLM models. |
TII | Falcon-7B | TII Falcon LLM License | en | a 7B-parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. |
TII | Falcon-40B | TII Falcon LLM License | multi | a 40B-parameter causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. |
TigerResearch | TigerBot | Apache-2.0 | en/zh | a multi-language and multitask LLM. |
BAAI | Aquila | BAAI_Aquila_Model_License | en/zh | The Aquila language model inherits the architectural design advantages of GPT-3 and LLaMA, replacing a batch of underlying operators with more efficient implementations and redesigning the tokenizer for Chinese-English bilingual support. |
OpenBMB | CPM-Bee | General Model License - Source Attribution - Publicity Restriction - Commercial Authorization | en/zh | CPM-Bee is a fully open-source, commercially usable Chinese-English bilingual base model with ten billion parameters, pre-trained on an extensive corpus of trillions of tokens. |
Baichuan | baichuan-7B | Apache-2.0 | en/zh | It has achieved the best performance among models of the same size on standard Chinese and English authoritative benchmarks (C-EVAL, MMLU, etc.). |
Tencent | lyraChatGLM | MIT License | en/zh | To the best of our knowledge, it is the first accelerated version of ChatGLM-6B. The inference speed of lyraChatGLM achieves a 300x speed-up over the early original version. We are still working hard to further improve the performance. |
SalesForce | XGen | Apache-2.0 | multi | Salesforce open-source LLMs with 8k sequence length |
Shanghai AI Lab | InternLM | Apache-2.0 | en/zh | InternLM has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics: It leverages trillions of high-quality tokens for training to establish a powerful knowledge base. It supports an 8k context window length, enabling longer input sequences and stronger reasoning capabilities. It provides a versatile toolset for users to flexibly build their own workflows. |
xverse-ai | XVERSE-13B | Apache-2.0 | multi | A multilingual large language model developed by XVERSE Technology Inc. |
Domain Models
contributor | model | domain | language | base model | main feature |
---|---|---|---|---|---|
UT Southwestern/UIUC/OSU/HDU | ChatDoctor | medical | en | LLaMA | Maybe the first domain-specific chat model tuned on LLaMA. |
Cambridge | Visual Med-Alpaca | biomedical | en | LLaMA-7B | a multi-modal foundation model designed specifically for the biomedical domain. |
HIT | BenTsao / ChatGLM-Med | medical | zh | LLaMA/ChatGLM | fine-tuned with a Chinese medical knowledge dataset generated using the GPT-3.5 API. |
ShanghaiTech, etc. | DoctorGLM | medical | en/zh | ChatGLM-6B | Chinese medical consultation model fine-tuned on ChatGLM-6B. |
THU AIR | BioMedGPT-1.6B | biomedical | en/zh | - | a pre-trained multi-modal molecular foundation model with 1.6B parameters that associates 2D molecular graphs with texts. |
@LiuHC0428 | LawGPT_zh | legal | zh | ChatGLM-6B | a general model in Chinese legal domain, trained on data generated via Reliable-Self-Instruction. |
SJTU | MedicalGPT-zh | medical | zh | ChatGLM-6B | a general model in the Chinese medical domain, trained on diverse data generated via self-instruct. |
SJTU | PMC-LLaMA | medical | zh | LLaMA | continues training LLaMA on medical papers. |
HuggingFace | StarCoder | code generation | en | - | a language model (LM) trained on source code and natural language text. Its training data incorporates more than 80 different programming languages as well as text extracted from GitHub issues and commits and from notebooks. |
@CogStack | NHS-LLM | medical | en | not clear | A conversational model for healthcare trained using OpenGPT. |
@pengxiao-song | LaWGPT | legal | zh | LLaMA/ChatGLM | expand the vocab with Chinese legal terminologies, instruction fine-tuned on data generated using self-instruct. |
Duxiaoman | XuanYuan | finance | zh | BLOOM-176B | A Large Chinese Financial Chat Model with Hundreds of Billions Parameters. |
CUHK | HuatuoGPT | medical | zh | not clear | HuatuoGPT, a large language model (LLM) trained on a vast Chinese medical corpus. Our objective with HuatuoGPT is to construct a more professional ‘ChatGPT’ for medical consultation scenarios. |
PKU | Lawyer LLaMA | legal | zh | LLaMA | continues pretraining on Chinese legal data, instruction-tuned on legal exams and legal consulting QA pairs. |
THU | LexiLaw | legal | zh | ChatGLM-6B | trained on a mixture of general data (BELLE 1.5M) and legal data. |
THU, etc. | taoli | education | zh | LLaMA | A large model for international Chinese education. It extends specific vocabulary on the base model, and uses the domain's proprietary data set for instruction fine-tuning. |
NUS | Goat | arithmetic | en | LLaMA | a fine-tuned LLaMA model that significantly outperforms GPT-4 on a range of arithmetic tasks. Fine-tuned on a synthetically generated dataset, Goat achieves state-of-the-art performance on the BIG-bench arithmetic sub-task. |
CU/NYU | FinGPT | finance | en | - | an end-to-end open-source framework for financial large language models (FinLLMs). |
microsoft | WizardCoder | code generation | en | StarCoder | trained with 78k evolved code instructions; surpasses Claude-Plus (+6.8), Bard (+15.3) and InstructCodeT5+ (+22.3) on the HumanEval benchmark. |
UCAS | Cornucopia | finance | zh | LLaMA | finetunes LLaMA on Chinese financial knowledge. |
PKU | ChatLaw | legal | zh | Ziya / Anima | Chinese legal domain model. |
@michael-wzhu | ChatMed | medical | zh | LLaMA | Chinese medical LLM based on LLaMA-7B. |
SCUT | SoulChat | mental health | zh | ChatGLM-6B | Chinese dialogue LLM in mental health domain, based on ChatGLM-6B. |
@shibing624 | MedicalGPT | medical | zh | ChatGLM-6B | Training Your Own Medical GPT Model with ChatGPT Training Pipeline. |
BJTU | TransGPT | transportation | zh | LLaMA-7B | Chinese transportation model. |
BAAI | AquilaCode | code generation | multi | Aquila | AquilaCode-multi is a multi-language model that supports high-accuracy code generation for various programming languages, including Python/C++/Java/Javascript/Go, etc. It has achieved impressive results in HumanEval (Python) evaluation, with Pass@1, Pass@10, and Pass@100 scores of 26/45.7/71.6, respectively. In the HumanEval-X multi-language code generation evaluation, it significantly outperforms other open-source models with similar parameters (as of July 19, 2023). AquilaCode-py, on the other hand, is a single-language Python version of the model that focuses on Python code generation. It has also demonstrated excellent performance in HumanEval evaluation, with Pass@1, Pass@10, and Pass@100 scores of 28.8/50.6/76.9 (as of July 19, 2023). |
General Domain Instruction Models
contributor | model/project | language | base model | main feature |
---|---|---|---|---|
Stanford | Alpaca | en | LLaMA/OPT | uses 52K instruction-following examples generated with the Self-Instruct technique to fine-tune 7B LLaMA; the resulting model, Alpaca, behaves similarly to text-davinci-003 on the Self-Instruct instruction-following evaluation suite. Alpaca has inspired many follow-up models. |
LianJiaTech | BELLE | en/zh | BLOOMZ-7B1-mt | maybe the first Chinese model to follow Alpaca. |
THU | ChatGLM-6B | en/zh | - | well-known Chinese model. |
Databricks | Dolly | en | GPT-J 6B | uses Alpaca data to fine-tune a 2-year-old model, GPT-J, which exhibits surprisingly high-quality instruction-following behavior not characteristic of the foundation model on which it is based. |
@tloen | Alpaca-LoRA | en | LLaMA-7B | trained within hours on a single RTX 4090, reproducing the Stanford Alpaca results using low-rank adaptation (LoRA), and can run on a Raspberry Pi. |
ColossalAI | Coati7B | en/zh | LLaMA-7B | a large language model developed by the ColossalChat project. |
Shanghai AI Lab | LLaMA-Adapter | en | LLaMA-7B | Fine-tuning LLaMA to follow instructions within 1 Hour and 1.2M Parameters |
AetherCortex | Llama-X | en | LLaMA | Open Academic Research on Improving LLaMA to SOTA LLM. |
TogetherComputer | OpenChatKit | en | GPT-NeoX-20B | OpenChatKit provides a powerful, open-source base to create both specialized and general-purpose chatbots for various applications. The kit includes instruction-tuned language models, a moderation model, and an extensible retrieval system for including up-to-date responses from custom repositories. |
nomic-ai | GPT4All | en | LLaMA | trained on a massive collection of clean assistant data including code, stories and dialogue |
@ymcui | Chinese-LLaMA-Alpaca | en/zh | LLaMA-7B/13B | expand the Chinese vocabulary based on the original LLaMA and use Chinese data for secondary pre-training, further enhancing Chinese basic semantic understanding. Additionally, the project uses Chinese instruction data for fine-tuning on the basis of the Chinese LLaMA, significantly improving the model's understanding and execution of instructions. |
UC Berkeley/Stanford/CMU | Vicuna | en | LLaMA-13B | Impressing GPT-4 with 90% ChatGPT Quality. |
UCSD/SYSU | baize | en/zh | LLaMA | fine-tuned with LoRA. It uses 100k dialogs generated by letting ChatGPT chat with itself. Alpaca's data is also used to improve its performance. |
UC Berkeley | Koala | en | LLaMA | Rather than maximizing quantity by scraping as much web data as possible, the team focuses on collecting a small, high-quality dataset. |
@imClumsyPanda | langchain-ChatGLM | en/zh | ChatGLM-6B | local knowledge based ChatGLM with langchain. |
@yangjianxin1 | Firefly | zh | bloom-1b4-zh/bloom-2b6-zh | Instruction tuning on Chinese datasets. Vocabulary pruning, ZeRO, and tensor parallelism are used to effectively reduce memory consumption and improve training efficiency. |
microsoft | GPT-4-LLM | en/zh | LLaMA | aims to share data generated by GPT-4 for building instruction-following LLMs with supervised learning and reinforcement learning. |
Hugging Face | StackLLaMA | en | LLaMA | trained on StackExchange data; the main goal is to serve as a tutorial and walkthrough on how to train models with RLHF, not primarily model performance. |
Nebuly | ChatLLaMA | en | - | a library that allows you to create hyper-personalized ChatGPT-like assistants using your own data and the least amount of compute possible. |
@juncongmoo | ChatLLaMA | en | LLaMA | LLaMA-based RLHF model, runnable in a single GPU. |
@juncongmoo | minichatgpt | en | GPT/OPT ... | To Train ChatGPT In 5 Minutes with ColossalAI. |
@LC1332 | Luotuo-Chinese-LLM | zh | LLaMA/ChatGLM | Instruction fine-tuned Chinese Language Models, with colab provided! |
@Facico | Chinese-Vicuna | zh | LLaMA | A Chinese Instruction-following LLaMA-based Model, fine-tuned with Lora, cpp inference supported, colab provided. |
@yanqiangmiffy | InstructGLM | en/zh | ChatGLM-6B | ChatGLM based instruction-following model, fine-tuned on a variety of data sources, supports deepspeed accelerating and LoRA. |
alibaba | Wombat | en | LLaMA | a novel learning paradigm called RRHF, as an alternative to RLHF, is proposed, which scores responses generated by different sampling policies and learns to align them with human preferences through a ranking loss. The performance is comparable to RLHF, with fewer models used in the process. |
@WuJunde | alpaca-glassoff | en | LLaMA | a mini chat AI that accepts images and can run on your own laptop, based on stanford-alpaca and alpaca-lora. |
@JosephusCheung | Guanaco | multi | LLaMA-7B | A Multilingual Instruction-Following Language Model. |
@FreedomIntelligence | LLM Zoo | multi | BLOOMZ/LLaMA | a project that provides data, models, and evaluation benchmark for large language models. model released: Phoenix, Chimera |
SZU | Linly | en/zh | LLaMA | expands the Chinese vocabulary, fully fine-tuned models, the largest LLaMA-based Chinese models, aggregation of Chinese instruction data, reproducible details. |
@lamini-ai | lamini | multi | - | data generator for generating instructions to train instruction-following LLMs. |
Stability-AI | StableVicuna | en | LLaMA | a further instruction-fine-tuned and RLHF-trained version of Vicuna v0 13B, with better performance than Vicuna. |
Hugging Face | HuggingChat | en | LLaMA | seems to be the first one available to access as a platform that appears similar to ChatGPT. |
microsoft | WizardLM | en | LLaMA | trained with 70k evolved instructions. Evol-Instruct is a novel method that uses LLMs instead of humans to automatically mass-produce open-domain instructions of various difficulty levels and skill ranges, to improve the performance of LLMs. |
FDU | OpenChineseLLaMA | en/zh | LLaMA-7B | further pretrains LLaMA on Chinese data, improving LLaMA's performance on Chinese tasks. |
@chenfeng357 | open-Chinese-ChatLLaMA | en/zh | LLaMA | The complete training code of the open-source Chinese-LLaMA model, including the full process from pre-training through instruction tuning and RLHF. |
@FSoft-AI4Code | CodeCapybara | en | LLaMA | Open-source LLaMA model that follows instruction tuning for code generation. |
@mbzuai-nlp | LaMini-LM | en | LLaMA/Flan-T5 ... | A Diverse Herd of Distilled Models from Large-Scale Instructions. |
NTU | Panda | en/zh | LLaMA | further pretraining on Chinese data, full-size of LLaMA models. |
IBM/CMU/MIT | Dromedary | en | LLaMA-65B | Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision. |
@melodysdreamj | WizardVicunaLM | multi | Vicuna | Wizard's dataset + ChatGPT's conversation extension + Vicuna's tuning method, achieving approximately 7% performance improvement over Vicuna. |
sambanovasystems | BLOOMChat | multi | BLOOM | BLOOMChat is a 176-billion-parameter multilingual chat model. It is instruction-tuned from BLOOM (176B) on assistant-style conversation datasets and supports conversation, question answering and generative answers in multiple languages. |
TII | Falcon-7B-Instruct | en | Falcon-7B | a 7B-parameter causal decoder-only model built by TII based on Falcon-7B and finetuned on a mixture of chat/instruct datasets. |
TII | Falcon-40B-Instruct | multi | Falcon-40B | a 40B-parameter causal decoder-only model built by TII based on Falcon-40B and finetuned on a mixture of Baize data. |
USTC, etc. | ExpertLLaMA | en | LLaMA | uses in-context learning to automatically write a customized expert identity for each instruction, finding the quality quite satisfying, then prepends the corresponding expert identity to each instruction to produce augmented instruction-following data. The overall framework is called ExpertPrompting; more details are in the paper. |
ZJU | CaMA | en/zh | LLaMA | further pretrained on Chinese corpus without vocabulary expansion; optimized for Information Extraction (IE) tasks. The pre-training script is available, covering transformation, construction, and loading of large-scale corpora, as well as the LoRA instruction fine-tuning script. |
THU | UltraChat | en | LLaMA | First, the UltraChat dataset provides a rich resource for the training of chatbots. Second, by fine-tuning the LLaMA model, the researchers successfully created a dialogue model UltraLLaMA with superior performance. |
RUC | YuLan-Chat | en/zh | LLaMA | developed based on fine-tuning LLaMA with high-quality English and Chinese instructions. |
AI2 | Tülu | en | LLaMA/Pythia/OPT | a suite of LLaMa models fully-finetuned on a strong mix of datasets. |
KAIST | SelFee | en | LLaMA | Iterative Self-Revising LLM Empowered by Self-Feedback Generation. |
@lyogavin | Anima | en/zh | LLaMA | trained based on QLoRA's 33B guanaco, finetuned for 10000 steps. |
THU | ChatGLM2-6B | en/zh | - | ChatGLM2-6B is the second-generation version of the open-source bilingual (Chinese-English) chat model ChatGLM-6B. It retains the smooth conversation flow and low deployment threshold of the first-generation model, while introducing new features: stronger performance, longer context, more efficient inference, and a more open license. |
OpenChat | OpenChat | en | LLaMA, etc. | a series of open-source language models fine-tuned on a small, yet diverse and high-quality dataset of multi-round conversations. Specifically, we utilize only ~6K GPT-4 conversations directly filtered from the ~90K ShareGPT conversations. Despite the small size of the dataset, OpenLLMs has demonstrated remarkable performance. |
CAS | BayLing | multi | LLaMA | BayLing is an English/Chinese LLM equipped with advanced language alignment, showing superior capability in English/Chinese generation, instruction following and multi-turn interaction. |
stabilityai | FreeWilly/FreeWilly2 | en | LLaMA/LLaMA2 | FreeWilly is a LLaMA-65B model fine-tuned on an Orca-style dataset. FreeWilly2 is a LLaMA2-70B model fine-tuned on an Orca-style dataset. FreeWilly2 outperforms Llama2 70B on the Hugging Face Open LLM Leaderboard. |
alibaba | Qwen-7B | en/zh | - | 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. |
ZJU | KnowLM | en/zh | LLaMA | With the rapid development of deep learning, large language models such as ChatGPT have made substantial strides in natural language processing. However, these expansive models still face challenges in acquiring and comprehending knowledge, including the difficulty of updating knowledge and potential knowledge discrepancies and biases, collectively known as knowledge fallacies. The KnowLM project tackles these issues by launching an open-source large-scale knowledgeable language model framework and releasing corresponding models. |
NEU | TechGPT | en/zh | LLaMA | TechGPT mainly strengthens three types of tasks: various information extraction tasks such as relation triplet extraction, with "knowledge graph construction" as the core; various intelligent question-answering tasks centered on "reading comprehension"; and various sequence generation tasks such as keyword generation, with "text understanding" as the core. |
@MiuLab | Taiwan-LLaMa | en/zh | LLaMA2 | Traditional Chinese LLMs for Taiwan |
Alternatives To Transformer
(maybe successors?)
contributor | method | main feature |
---|---|---|
BlinkDL | RWKV-LM | RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding. |
msra | RetNet | simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded in parallel while recurrently summarizing the chunks. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. These intriguing properties make RetNet a strong successor to the Transformer for large language models. |
stanford | Backpack | A Backpack is a drop-in replacement for a Transformer that provides new tools for interpretability-through-control while still enabling strong language models. Backpacks decompose the predictive meaning of words into components non-contextually, and aggregate them by a weighted sum, allowing for precise, predictable interventions. |
Multi-Modal
contributor | project | language | base model | main feature |
---|---|---|---|---|
BaihaiAI | IDPChat | en/zh | LLaMA-13B / Stable Diffusion | Open Chinese multi-modal model, single-GPU runnable, easy to deploy, UI provided. |
KAUST | MiniGPT-4 | en/zh | LLaMA | MiniGPT-4 aligns a frozen visual encoder from BLIP-2 with a frozen LLM, Vicuna, using just one projection layer, and yields many emerging vision-language capabilities similar to those demonstrated in GPT-4. |
UW–Madison/MSR/Columbia University | LLaVA | en | LLaMA | visual instruction tuning is proposed, towards building large language and vision models with GPT-4-level capabilities. |
NUS/THU | VPGTrans | en | LLaMA/OPT/Flan-T5/BLIP-2 ... | transferring VPG across LLMs to build VL-LLMs at significantly lower cost. GPU hours can be reduced by more than 10 times and the training data reduced to around 10%. Two novel VL-LLMs are released via VPGTrans: VL-LLaMA and VL-Vicuna. VL-LLaMA is a multimodal version of LLaMA obtained by transferring the BLIP-2 OPT-6.7B VPG to LLaMA via VPGTrans. VL-Vicuna is a GPT-4-like multimodal chatbot based on the Vicuna LLM. |
CAS, etc. | X-LLM | en/zh | ChatGLM-6B | X-LLM converts multiple modalities (images, speech, videos) into foreign languages using X2L interfaces and feeds them into a large language model (ChatGLM) to build a multimodal LLM, achieving impressive multimodal chat capabilities. |
NTU | Otter | en | OpenFlamingo | a multi-modal model based on OpenFlamingo (the open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following ability and in-context learning. Furthermore, it optimizes OpenFlamingo's implementation, democratizing the required training resources from 1x A100 GPU to 4x RTX-3090 GPUs. |
XMU | LaVIN | en | LLaMA | proposes a novel and affordable solution for vision-language instruction tuning, namely Mixture-of-Modality Adaptation (MMA). MMA is an end-to-end optimization regime which connects the image encoder and LLM via lightweight adapters. A novel routing algorithm in MMA helps the model automatically shift reasoning paths for single- and multi-modal instructions. |
see also: awesome-Multimodal-Large-Language-Models
Data
Pretrain Data
contributor | data | language | main feature |
---|---|---|---|
TogetherComputer | RedPajama-Data | en | An Open Source Recipe to Reproduce LLaMA training dataset. |
Instruction Data
see Alpaca-CoT data collection
contributor | data | language | main feature |
---|---|---|---|
salesforce | DialogStudio | en | DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection and Instruction-Aware Models for Conversational AI. |
Synthetic Data Generation
contributor | method | main feature |
---|---|---|
UW, etc. | self-instruct | using the model's own generations to create a large collection of instructional data (see the sketch after this table). |
@LiuHC0428 | Reliable-Self-Instruction | use ChatGPT to generate some questions and answers based on a given text. |
PKU | Evol-Instruct | a novel method, proposed in WizardLM, that uses LLMs instead of humans to automatically mass-produce open-domain instructions of various difficulty levels and skill ranges, to improve the performance of LLMs. |
KAUST, etc. | CAMEL | a novel communicative agent framework named role-playing is proposed, which uses inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. Role-playing can be used to generate conversational data in a specific task/domain. |
@chatarena | ChatArena | a library that provides multi-agent language game environments and facilitates research about autonomous LLM agents and their social interactions. it provides a flexible framework to define multiple players, environments and the interactions between them, based on Markov Decision Process. |
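To make the self-instruct entry above concrete, here is a minimal, hedged sketch of the bootstrapping loop: seed instructions are shown to a strong model, which is asked to write new ones. It uses the pre-1.0 `openai` SDK interface; the prompt, model name, and filtering step are simplified placeholders rather than the original pipeline.

```python
# Hedged sketch of self-instruct-style data generation (not the original implementation).
import openai

seed_instructions = [
    "Summarize the following paragraph in one sentence.",
    "Translate the sentence into French.",
]

def generate_new_instructions(seeds, n=5):
    prompt = (
        "Here are some example task instructions:\n"
        + "\n".join(f"- {s}" for s in seeds)
        + f"\nWrite {n} new, diverse task instructions, one per line."
    )
    # Legacy (pre-1.0) openai SDK call; model name is a placeholder.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    lines = resp["choices"][0]["message"]["content"].splitlines()
    # Real pipelines also filter near-duplicates (e.g., by ROUGE-L against the pool)
    # before adding new instructions back into the seed set.
    return [line.lstrip("- ").strip() for line in lines if line.strip()]

new_tasks = generate_new_instructions(seed_instructions)
print(new_tasks)
```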
Evaluation
contributor | method | main feature |
---|---|---|
- | human evaluation | - |
OpenAI | GPT-4/ChatGPT | - |
PKU/CMU/MSRA ... | PandaLM | Reproducible and Automated Language Model Assessment. |
UCB | Chatbot Arena | Chat with two anonymous models side-by-side and vote for which one is better, then use the Elo rating system to calculate the relative performance of the models (see the Elo sketch after this table). |
Stanford | AlpacaEval | GPT-4/Claude evaluation on the AlpacaFarm dataset. |
clueai | SuperCLUElyb | Chinese version of Chatbot Arena developed by clueai. |
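As a concrete illustration of the Elo-based comparison used by Chatbot Arena, here is a minimal sketch of a single Elo update (illustrative only; the actual leaderboard computation may differ, e.g. in its K-factor and aggregation).

```python
# Minimal Elo update for one pairwise "battle" between two models.
def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """score_a is 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    expected_b = 1.0 - expected_a
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - expected_b)
    return new_a, new_b

# Example: both models start at 1000; model A wins one battle.
print(elo_update(1000.0, 1000.0, 1.0))  # -> (1016.0, 984.0)
```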
Framework/ToolKit/Platform
contributor | project | main feature |
---|---|---|
CAS | Alpaca-CoT | extend CoT data to Alpaca to boost its reasoning ability. aims at building an instruction finetuning (IFT) platform with extensive instruction collection (especially the CoT datasets) and a unified interface for various large language models. |
@hiyouga | ChatGLM-Efficient-Tuning | efficient fine-tuning ChatGLM-6B with PEFT. |
@hiyouga | LLaMA-Efficient-Tuning | Fine-tuning LLaMA with PEFT (PT+SFT+RLHF with QLoRA). |
@jianzhnie | Efficient-Tuning-LLMs | Efficient Finetuning of QLoRA LLMs. |
ColossalAI | ColossalChat | An open-source low-cost solution for cloning ChatGPT with a complete RLHF pipeline. |
microsoft | deepspeed-chat | Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales. |
LAION-AI | Open Assistant | a project meant to give everyone access to a great chat based large language model. |
HKUST | LMFlow | an extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community. |
UCB | EasyLM | EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax. EasyLM can scale up LLM training to hundreds of TPU/GPU accelerators by leveraging JAX's pjit functionality. |
@CogStack | OpenGPT | A framework for creating grounded instruction based datasets and training conversational domain expert Large Language Models (LLMs). |
HugAILab | HugNLP | a unified and comprehensive NLP library based on HuggingFace Transformer. |
ProjectD-AI | LLaMA-Megatron-DeepSpeed | Ongoing research training transformer language models at scale, including: BERT & GPT-2. |
@PanQiWei | AutoGPTQ | An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm. |
Alignment
contributor | method | used in | main feature |
---|---|---|---|
- | IFT | ChatGPT | Instruction Fine-Tuning. |
- | RLHF | ChatGPT | RL from Human Feedback. |
Anthropic | RLAIF | Claude | RL from AI Feedback. |
alibaba | RRHF | Wombat | a novel learning paradigm called RRHF, as an alternative to RLHF, is proposed, which scores responses generated by different sampling policies and learns to align them with human preferences through a ranking loss. The performance is comparable to RLHF, with fewer models used in the process (see the ranking-loss sketch after this table). |
HKUST | RAFT | - | RAFT is a new alignment algorithm, which is more efficient than conventional (PPO-based) RLHF. |
IBM/CMU/MIT | SELF-ALIGN | Dromedary | combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. |
PKU | CVA | Beaver | Constrained Value Alignment via Safe RLHF. |
tencent | RLTF | - | Reinforcement Learning from Unit Test Feedback. |
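A hedged sketch of the ranking-loss idea behind RRHF follows (simplified from the description above, not the authors' code): each candidate response is scored by its (length-normalized) log-probability under the model, and pairs whose ordering disagrees with the reward ranking are penalized. RRHF additionally adds a supervised loss on the best response, which is omitted here.

```python
# Simplified RRHF-style ranking loss (illustrative only).
import torch

def ranking_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """logprobs, rewards: shape (num_candidates,) for one prompt.
    logprobs are assumed to be length-normalized sequence log-probabilities."""
    loss = logprobs.new_zeros(())
    n = len(rewards)
    for i in range(n):
        for j in range(n):
            if rewards[i] > rewards[j]:
                # If response i is preferred, its log-prob should not be lower than j's.
                loss = loss + torch.relu(logprobs[j] - logprobs[i])
    return loss

# Toy example: three candidate responses for one prompt.
logprobs = torch.tensor([-1.2, -0.8, -2.0], requires_grad=True)
rewards = torch.tensor([0.9, 0.2, 0.5])
print(ranking_loss(logprobs, rewards))  # penalizes the mis-ordered pairs
```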
Multi-Language
vocabulary expansion
According to the official FAQ in the LLaMA repo, the tokenizer contains few tokens for languages other than Latin-script ones, so one line of work expands the vocabulary. Some such works are listed below; a minimal sketch of the vocabulary-expansion idea follows the table.
contributor | model/project | language | base model | main feature |
---|---|---|---|---|
@ymcui | Chinese-LLaMA-Alpaca | zh | LLaMA | |
SZU | Linly | en/zh | LLaMA | full-size LLaMA, further pretrained on Chinese corpus. |
@Neutralzz | BiLLa | en/zh | LLaMA-7B | further pretrained on Wudao, PILE, and WMT. |
@pengxiao-song | LaWGPT | zh | LLaMA/ChatGLM | expand the vocab with Chinese legal terminologies, instruction fine-tuned on data generated using self-instruct. |
IDEA | Ziya | en/zh | LLaMA | large-scale pre-trained model based on LLaMA with 13 billion parameters. It optimizes the LLaMA tokenizer for Chinese and incrementally trains on 110 billion tokens of data based on the LLaMA-13B model, which significantly improves understanding and generation ability in Chinese. |
OpenBuddy | OpenBuddy | multi | LLaMA/Falcon ... | Built upon TII's Falcon model and Facebook's LLaMA model, OpenBuddy is fine-tuned to include an extended vocabulary, additional common characters, and enhanced token embeddings. By leveraging these improvements and multi-turn dialogue datasets, OpenBuddy offers a robust model capable of answering questions and performing translation tasks across various languages. |
FDU | CuteGPT | en/zh | LLaMA | CuteGPT expands the Chinese vocabulary and performs pre-training on the LLaMA model, improving its ability to understand Chinese. It is then fine-tuned with conversational instructions to enhance its ability to follow instructions. |
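A minimal, hedged sketch of the vocabulary-expansion idea (not taken from any of the listed repos): add new Chinese tokens to an existing tokenizer, resize the embedding matrix, then continue pre-training so the new embeddings are learned. The model path and tokens are placeholders.

```python
# Hedged sketch of tokenizer vocabulary expansion with HuggingFace transformers.
from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("path/to/llama")  # hypothetical local path
model = LlamaForCausalLM.from_pretrained("path/to/llama")

# New tokens would normally come from a SentencePiece model trained on a Chinese corpus.
new_tokens = ["你好", "世界"]  # toy example
num_added = tokenizer.add_tokens(new_tokens)

# Grow the input/output embeddings so the new ids have (randomly initialized) vectors;
# these vectors are then learned during continued pre-training on Chinese text.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens, new vocab size: {len(tokenizer)}")
```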
Efficient Training/Fine-Tuning
contributor | method | main feature |
---|---|---|
microsoft | LoRA | Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank-decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks (see the PEFT sketch after this table). |
stanford | Prefix Tuning | a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, which we call the prefix. |
THU | P-Tuning | P-tuning leverages few continuous free parameters to serve as prompts fed as the input to the pre-trained language models. We then optimize the continuous prompts using gradient descent as an alternative to discrete prompt searching. |
THU/BAAI/Shanghai Qi Zhi Institute | P-Tuning v2 | a novel empirical finding that properly optimized prompt tuning can be comparable to fine-tuning universally across various model scales and NLU tasks. Technically, P-Tuning v2 is not conceptually novel; it can be viewed as an optimized and adapted implementation of Deep Prompt Tuning. |
Google | Prompt Tuning | a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Prompt Tuning can be seen as a simplification of prefix tuning. |
GT/Princeton/microsoft | AdaLoRA | adaptively allocates the parameter budget among weight matrices according to their importance score. In particular, AdaLoRA parameterizes the incremental updates in the form of singular value decomposition. |
UW | QLoRA | an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). |
FDU | LOMO | a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update in one step to reduce memory usage. This enables full-parameter fine-tuning of a 7B model on a single RTX 3090, or a 65B model on a single machine with 8x RTX 3090, each with 24GB memory. |
MBZUAI/Transmute AI Lab/Meta/CMU | GLoRA | Enhancing Low-Rank Adaptation (LoRA), GLoRA employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations, providing more flexibility and capability across diverse tasks and datasets. |
Acknowledgement: HuggingFace PEFT.
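As a concrete illustration of the LoRA entry above, here is a minimal, hedged sketch of LoRA fine-tuning with the HuggingFace PEFT library; the model id and hyperparameters are placeholders, not taken from any specific project listed here.

```python
# Hedged sketch of LoRA fine-tuning with HuggingFace PEFT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "huggyllama/llama-7b"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Freeze the base weights and inject trainable low-rank matrices into attention projections.
config = LoraConfig(
    r=8,                      # rank of the update matrices
    lora_alpha=16,            # scaling factor
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base model's parameters

# The wrapped model can then be trained with the usual HuggingFace Trainer on instruction data.
```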
Low-Cost Inference
quantization
contributor | algorithm | main feature |
---|---|---|
UW, etc. | SpQR | a new compressed format and quantization technique which enables for the first time near-lossless compression of LLMs across model scales, while reaching similar compression levels to previous methods. |
THU | Train_Transformers_with_INT4 | For forward propagation, we identify the challenge of outliers and propose a Hadamard quantizer to suppress the outliers. For backpropagation, we leverage the structural sparsity of gradients by proposing bit splitting and leverage score sampling techniques to quantize gradients accurately. |
INTEL | neural-compressor | targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep learning frameworks to pursue optimal inference performance. |
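None of the specific methods above is reproduced here; as a generic illustration of the memory savings that weight quantization targets, this hedged sketch loads a model in 4-bit using the bitsandbytes integration available in recent versions of transformers. The model id is a placeholder, and a CUDA GPU with bitsandbytes and accelerate installed is assumed.

```python
# Hedged sketch: 4-bit weight quantization at load time for low-cost inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,          # quantize weights to 4 bits at load time (requires bitsandbytes)
    device_map="auto",          # place layers across available devices (requires accelerate)
    torch_dtype=torch.float16,  # compute dtype for the non-quantized parts
)

prompt = "Explain quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```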
projects
contributor | project | main feature |
---|---|---|
@ggerganov | llama.cpp | c/cpp implementation for llama and some other models, using quantization. |
@NouamaneTazi | bloomz.cpp | C++ implementation for BLOOM inference. |
@mlc-ai | MLC LLM | a universal solution that allows any language models to be deployed natively on a diverse set of hardware backends and native applications, plus a productive framework for everyone to further optimize model performance for their own use cases. |
alibaba | ChatGLM-MNN | converts the ChatGLM-6B model to MNN and performs inference using C++. |
Jittor | JittorLLMs | Significantly reduce hardware costs (by 80%), currently known as the lowest-cost deployment library, supports multiple platforms. |
OpenBMB | BMInf | BMInf supports running models with more than 10 billion parameters on a single NVIDIA GTX 1060 GPU in its minimum requirements. In cases where the GPU memory supports the large model inference (such as V100 or A100), BMInf still has a significant performance improvement over the existing PyTorch implementation. |
hpcaitech | EnergonAI | With tensor parallel operations, pipeline parallel wrapper, distributed checkpoint loading, and customized CUDA kernels, EnergonAI can enable efficient parallel inference for large-scale models. |
MegEngine | InferLLM | a lightweight LLM inference framework that mainly references and borrows from the llama.cpp project. llama.cpp puts almost all core code and kernels in a single file and uses a large number of macros, making it difficult for developers to read and modify. |
@saharNooby | rwkv.cpp | a port of BlinkDL/RWKV-LM to ggerganov/ggml. |
FMInference | FlexGen | FlexGen is a high-throughput generation engine for running large language models with limited GPU memory. FlexGen allows high-throughput generation by IO-efficient offloading, compression, and large effective batch sizes. |
huggingface bigcode-project | starcoder.cpp | C++ implementation for 💫 StarCoder inference using the ggml library. |
CMU | SpecInfer | SpecInfer is an open-source distributed multi-GPU system that accelerates generative LLM inference with speculative inference and token tree verification. A key insight behind SpecInfer is to combine various collectively boost-tuned small speculative models (SSMs) to jointly predict the LLM's outputs. |
@ztxz16 | fastllm | full-platform pure c++ llm acceleration library, supports moss, chatglm, baichuan models, runs smoothly on mobile phones. |
UCB | vllm | a fast and easy-to-use library for LLM inference and serving; fast with efficient management of attention key and value memory via PagedAttention. |
stanford | mpt-30B-inference | Run inference on the latest MPT-30B model using your CPU. This inference code uses a ggml quantized model. |
Shanghai AI Lab | lmdeploy | a toolkit for compressing, deploying, and serving LLM. |
Safety
contributor | method | main feature |
---|---|---|
thu-coai | Safety-Prompts | Chinese safety prompts for evaluating and improving the safety of LLMs. |
Truthfulness
contributor | method | main feature |
---|---|---|
Harvard | ITI | ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, ITI improves its truthfulness from 32.5 to 65.1. |
Extend Context Window
contributor | method | main feature |
---|---|---|
UW, etc. | ALiBi | Instead of adding position embeddings at the bottom of the transformer stack, ALiBi adds a linear bias to each attention score, allowing the model to be trained on, for example, 1024 tokens, and then do inference on 2048 (or much more) tokens without any finetuning. |
DeepPavlov, etc. | RMT | use a recurrent memory to extend the context length. |
bytedance | SCM | unleash infinite-length input capacity for large-scale language models. |
Meta | Position Interpolation | extends the context window sizes of RoPE-based pretrained LLMs such as LLaMA to up to 32768 with minimal fine-tuning (within 1000 steps). Position Interpolation linearly down-scales the input position indices to match the original context window size, rather than extrapolating beyond the trained context length, which may lead to catastrophically high attention scores that completely ruin the self-attention mechanism (see the sketch after this table). |
UCB | LongChat | Instead of forcing the LLaMA model to adapt to position_ids > 2048, we condense position_ids > 2048 to be within 0 to 2048 (the same mechanism as Position Interpolation, surprisingly!). We observed that our LongChat-13B-16K model reliably retrieves the first topic, with accuracy comparable to gpt-3.5-turbo. |
microsoft | LongNet | replaces the attention of vanilla Transformers with a novel component named dilated attention, and successfully scales the sequence length to 1 billion tokens. |
IDEAS NCBR, etc. | LongLLaMA | LongLLaMA is built upon the foundation of OpenLLaMA and fine-tuned using the Focused Transformer (FoT) method, and is capable of handling long contexts of 256k tokens or even more. |
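A minimal, hedged sketch of the Position Interpolation idea described above (simplified; not the authors' code): instead of feeding RoPE positions beyond the trained context length, scale them down so they stay within the original range. The context lengths and head dimension below are illustrative defaults.

```python
# Hedged sketch of RoPE position interpolation for context-window extension.
import torch

def rope_angles(position_ids: torch.Tensor, head_dim: int, base: float = 10000.0,
                trained_ctx: int = 2048, extended_ctx: int = 8192) -> torch.Tensor:
    """Return RoPE rotation angles with linear position interpolation."""
    # Linear interpolation: positions in [0, extended_ctx) are compressed into [0, trained_ctx).
    scale = trained_ctx / extended_ctx
    positions = position_ids.float() * scale

    # Standard RoPE inverse frequencies.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    return torch.outer(positions, inv_freq)  # shape: (seq_len, head_dim // 2)

angles = rope_angles(torch.arange(8192), head_dim=128)
# Maximum angle stays within the range seen during pre-training on 2048-token contexts.
print(angles.shape, angles.max())
```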
Knowledge Editing
Must-read Papers on Model Editing: ModelEditingPapers
contributor | method | main feature |
---|---|---|
MIT, etc. | ROME | First, we trace the causal effects of hidden state activations within GPT using causal mediation analysis to identify the specific modules that mediate recall of a fact about a subject. Our analysis reveals that feedforward MLPs at a range of middle layers are decisive when processing the last token of the subject name. Second, we test this finding in model weights by introducing a Rank-One Model Editing method (ROME) to alter the parameters that determine a feedforward layer's behavior at the decisive token. Despite the simplicity of the intervention, we find that ROME is similarly effective to other model-editing approaches on a standard zero-shot relation extraction benchmark. |
Implementations
contributor | project | main feature |
---|---|---|
PKU | FastEdit | injecting fresh and customized knowledge into large language models efficiently using one single command. |
ZJU | EasyEdit | a Python package for editing Large Language Models (LLMs) like GPT-J, LLaMA, GPT-NEO, GPT-2, and T5 (supporting models from 1B to 65B), the objective of which is to alter the behavior of LLMs efficiently within a specific domain without negatively impacting performance across other inputs. It is designed to be easy to use and easy to extend. |
External Knowledge
allowing the model to access external knowledge, such as knowledge graphs and databases.
contributor | project | main feature |
---|---|---|
@jerryjliu | LlamaIndex | provides a central interface to connect your LLM's with external data. |
@imClumsyPanda | langchain-ChatGLM | local knowledge based ChatGLM with langchain. |
@wenda-LLM | wenda | an LLM invocation platform built around plug-in knowledge bases and automatically executed actions, aiming to achieve generation ability comparable to large models using smaller ones. |
@csunny | DB-GPT | builds a complete private large-model solution for all database-based scenarios. |
THU, BAAI, ZJU | ChatDB | a novel framework integrating symbolic memory with LLMs. ChatDB explores ways of augmenting LLMs with symbolic memory to handle contexts of arbitrary lengths. Such a symbolic memory framework is instantiated as an LLM with a set of SQL databases. The LLM generates SQL instructions to manipulate the SQL databases autonomously (including insertion, selection, update, and deletion), aiming to complete a complex task requiring multi-hop reasoning and long-term symbolic memory. |
External Tools
Using Existing Tools
allowing the model to access external tools, such as search engines and APIs.
contributor | project | base model | main feature |
---|---|---|---|
UCB/microsoft | Gorilla | LLaMA | invokes 1,600+ (and growing) API calls accurately while reducing hallucination. |
THU | ToolLLaMA | LLaMA | This project aims to construct open-source, large-scale, high-quality instruction tuning SFT data to facilitate building powerful LLMs with general tool-use capability. We provide the dataset, the corresponding training and evaluation scripts, and a capable model, ToolLLaMA, fine-tuned on ToolBench. |
Make New Tools
contributor | project | main feature |
---|---|---|
Google, etc. | LATM | LLMs create their own reusable tools for problem-solving. |
Autonomous Problem Solving
contributor | project | driven by | main feature |
---|---|---|---|
@Significant-Gravitas | Auto-GPT | GPT-4 | chains together LLM "thoughts", to autonomously achieve whatever goal you set. |
@yoheinakajima | BabyAGI | GPT | The main idea behind this system is that it creates tasks based on the result of previous tasks and a predefined objective. The script then uses OpenAI's natural language processing (NLP) capabilities to create new tasks based on the objective, and Chroma/Weaviate to store and retrieve task results for context. |
microsoft | HuggingGPT | GPT-4 | Language serves as an interface for LLMs to connect numerous AI models for solving complicated AI tasks! |
microsoft/NCSU | ReWOO | - | detaches the reasoning process from external observations, thus significantly reducing token consumption. |
Similar Collections
collections of open instruction-following LLMs |
---|
开源微调大型语言模型(LLM)合集 (a collection of open-source fine-tuned LLMs, in Chinese) |
机器之心SOTA!模型 (Jiqizhixin's SOTA! Models, in Chinese) |
Awesome Totally Open Chatgpt |
LLM-Zoo |
Awesome-LLM |
🤗 Open LLM Leaderboard |
Open LLMs |
Awesome-Chinese-LLM |
Awesome Pretrained Chinese NLP Models |
LLMSurvey |