
LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.

Primary language: Jupyter Notebook. License: MIT.

LLM-PowerHouse: A Curated Guide for Large Language Models with Custom Training and Inferencing

Welcome to LLM-PowerHouse, your ultimate resource for unleashing the full potential of Large Language Models (LLMs) with custom training and inferencing. This GitHub repository is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of LLMs and build intelligent applications that push the boundaries of natural language understanding.


Prompt Engineering

Prompt engineering is a technique that can be used to improve the performance of LLMs on specific tasks. It involves crafting prompts that help the LLM to generate the desired output. This can be done by providing the model with additional information, such as examples of the desired output, or using specific language the model is likely to understand.

Prompt engineering can be a powerful tool, but it is not a silver bullet. LLMs can still generate incorrect or unexpected output even with the best prompts, so it is important to test their output carefully before using them in production.
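As a rough illustration of "providing the model with examples of the desired output", the sketch below assembles a few-shot prompt as plain text. The function name `build_few_shot_prompt`, the `Input:`/`Output:` layout, and the sample reviews are assumptions made for this example; they are not a format any particular model requires.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new query into one prompt."""
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    # End with an open "Output:" slot so the model continues with the answer.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical examples of the desired output (sentiment labels).
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    examples,
    "Setup was quick and painless.",
)
print(prompt)
```

The resulting string can be sent to whichever model is in use. Because output can still be wrong, the same prompt should be run against a set of held-out inputs before relying on it.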

Fine Tuning

Fine-tuning, on the other hand, adapts a pre-trained LLM to a specific task or domain by training it further on a smaller, task-specific dataset. Training adjusts the model's weights to minimize a loss function and so improve performance on the task.

Fine-tuning can be a more effective way to improve the performance of LLMs on specific tasks than prompt engineering. However, it is also more time-consuming and expensive, so weigh the cost and time involved before deciding to use it.
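The idea of "starting from pre-trained weights and continuing to minimize a loss on a small task dataset" can be sketched with a deliberately tiny stand-in model. The example below is not an LLM: it is a one-variable linear model in pure Python, and the "pre-trained" weights, the task data, the learning rate, and the step count are all made-up values chosen only to show the mechanics.

```python
def mse_loss(w, b, data):
    """Mean squared error of the linear model y = w * x + b on the dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(w, b, data, lr=0.05, steps=500):
    """Continue training from the given weights with plain gradient descent."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical "pre-trained" weights, learned on some other, larger dataset.
pretrained_w, pretrained_b = 2.0, 0.0

# Small task-specific dataset that follows y = 2.5 * x + 1.
task_data = [(0.0, 1.0), (1.0, 3.5), (2.0, 6.0), (3.0, 8.5)]

loss_before = mse_loss(pretrained_w, pretrained_b, task_data)
tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, task_data)
loss_after = mse_loss(tuned_w, tuned_b, task_data)

print(f"loss before fine-tuning: {loss_before:.3f}")
print(f"loss after fine-tuning:  {loss_after:.6f}")
```

Fine-tuning an actual LLM follows the same loop in spirit (forward pass, loss, gradients, weight update), but over billions of weights, which is where the time and hardware cost comes from.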

When to perform

Fine-tuning is typically needed when the task is new or challenging or when the desired output is highly specific. In these cases, prompt engineering may not be able to provide the model with enough information to generate the desired output.

Prompt engineering is typically sufficient when the task is well-defined and the desired output is less specific. In these cases, a well-crafted prompt can give the model all the information it needs to generate the desired output.

Difference between Masked Language Model and Causal Language Model

| Aspect | Masked Language Model (MLM) | Causal Language Model (CLM) |
| --- | --- | --- |
| Training Objective | Predict original values of masked tokens within context | Generate next token based on preceding tokens |
| Input Conditioning | Includes original context with randomly masked tokens | Includes preceding tokens in the sequence |
| Bidirectionality | Bidirectional, considers entire context to predict masked tokens | Unidirectional, relies on preceding tokens |
| Use Cases | Text completion, masked word prediction, text understanding | Text generation, story completion, language translation |
| Example Model | BERT (Bidirectional Encoder Representations from Transformers) | GPT (Generative Pretrained Transformer) |
| Associated Task | Natural Language Understanding (NLU) | Natural Language Generation (NLG) |
| Example Sentence | "I want to __ a book." | "The cat sat on the __." |
| Masked Token | [MASK] | None (the model predicts the next position; no mask token is used) |
| Objective | Predict missing word | Predict next word in sequence |
| Possible Predictions | read, buy, borrow, ... | mat, chair, table, ... |
| Context | "I want to [MASK] a book." | "The cat sat on the" |
| Output Prediction | read | mat |
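The difference in training objectives can be made concrete by looking at how training examples are built from a token sequence. The sketch below is a simplification under stated assumptions: tokens are whole words, the masked positions are passed in explicitly (in real MLM training they are sampled at random), and the helper names `make_mlm_example` and `make_clm_examples` are invented for this illustration.

```python
MASK = "[MASK]"


def make_mlm_example(tokens, masked_positions):
    """Masked LM: hide the chosen tokens; labels are the originals at those positions.

    The model sees the full sequence (left and right context) around each mask.
    """
    inputs = [MASK if i in masked_positions else tok for i, tok in enumerate(tokens)]
    labels = {i: tokens[i] for i in masked_positions}
    return inputs, labels


def make_clm_examples(tokens):
    """Causal LM: every prefix is an input and the token that follows is its label.

    The model only ever sees tokens to the left of the one it predicts.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


mlm_inputs, mlm_labels = make_mlm_example(
    ["I", "want", "to", "read", "a", "book", "."], masked_positions={3}
)
print(mlm_inputs)   # the fourth token is replaced by [MASK]
print(mlm_labels)   # maps the masked position back to the original token

clm_pairs = make_clm_examples(["The", "cat", "sat", "on", "the", "mat"])
for context, target in clm_pairs:
    print(" ".join(context), "->", target)
```

Note that the causal examples contain no mask token at all: the "blank" is simply the next position in the sequence.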

Open Source LLM Space for Research Use

| Language Model | Description | Link |
| --- | --- | --- |
| Baize | Baize is an open-source chat model trained with LoRA. It uses 100k dialogs generated by letting ChatGPT chat with itself. | 🔗 |
| Koala | A Dialogue Model for Academic Research | 🔗 |
| Dalai | The simplest way to run LLaMA on your local machine | 🔗 |
| LLaMA | A foundational, 65-billion-parameter large language model. LLaMA.cpp, Lit-LLaMA | 🔗 |
| ColossalChat | LLM trained with RLHF powered by Colossal-AI | 🔗 |
| Vicuna | An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality. | 🔗 |
| Dolly | A cheap-to-build LLM that exhibits a surprising degree of the instruction following capabilities exhibited by ChatGPT | 🔗 |
| GPT4All | Demo, data, and code to train open-source assistant-style large language model based on GPT-J and LLaMa | 🔗 |
| Alpaca | A model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. Alpaca.cpp, Alpaca-LoRA | 🔗 |

Open Source LLM Space for Commercial Use

| Language Model | Release Date | Checkpoints | Paper/Blog | Params (B) | Context Length | Licence | Try it |
| --- | --- | --- | --- | --- | --- | --- | --- |
| T5 | 2019/10 | T5 & Flan-T5, Flan-T5-xxl (HF) | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 0.06 - 11 | 512 | Apache 2.0 | T5-Large |
| UL2 | 2022/10 | UL2 & Flan-UL2, Flan-UL2 (HF) | UL2 20B: An Open Source Unified Language Learner | 20 | 512, 2048 | Apache 2.0 | |
| Cerebras-GPT | 2023/03 | Cerebras-GPT | Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models (Paper) | 0.111 - 13 | 2048 | Apache 2.0 | Cerebras-GPT-1.3B |
| Open Assistant (Pythia family) | 2023/03 | OA-Pythia-12B-SFT-8, OA-Pythia-12B-SFT-4, OA-Pythia-12B-SFT-1 | Democratizing Large Language Model Alignment | 12 | 2048 | Apache 2.0 | Pythia-2.8B |
| Pythia | 2023/04 | pythia 70M - 12B | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 0.07 - 12 | 2048 | Apache 2.0 | |
| Dolly | 2023/04 | dolly-v2-12b | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM | 3, 7, 12 | 2048 | MIT | |
| DLite | 2023/05 | dlite-v2-1_5b | Announcing DLite V2: Lightweight, Open LLMs That Can Run Anywhere | 0.124 - 1.5 | 1024 | Apache 2.0 | DLite-v2-1.5B |
| RWKV | 2021/08 | RWKV, ChatRWKV | The RWKV Language Model (and my LM tricks) | 0.1 - 14 | infinity (RNN) | Apache 2.0 | |
| GPT-J-6B | 2021/06 | GPT-J-6B, GPT4All-J | GPT-J-6B: 6B JAX-Based Transformer | 6 | 2048 | Apache 2.0 | |
| GPT-NeoX-20B | 2022/04 | GPT-NEOX-20B | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | 20 | 2048 | Apache 2.0 | |
| Bloom | 2022/11 | Bloom | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | 176 | 2048 | OpenRAIL-M v1 | |
| StableLM-Alpha | 2023/04 | StableLM-Alpha | Stability AI Launches the First of its StableLM Suite of Language Models | 3 - 65 | 4096 | CC BY-SA-4.0 | |
| FastChat-T5 | 2023/04 | fastchat-t5-3b-v1.0 | We are excited to release FastChat-T5: our compact and commercial-friendly chatbot! | 3 | 512 | Apache 2.0 | |
| h2oGPT | 2023/05 | h2oGPT | Building the World's Best Open-Source Large Language Model: H2O.ai's Journey | 12 - 20 | 256 - 2048 | Apache 2.0 | |
| MPT-7B | 2023/05 | MPT-7B, MPT-7B-Instruct | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs | 7 | 84k (ALiBi) | Apache 2.0, CC BY-SA-3.0 | |
| RedPajama-INCITE | 2023/05 | RedPajama-INCITE | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models | 3 - 7 | 2048 | Apache 2.0 | RedPajama-INCITE-Instruct-3B-v1 |
| OpenLLaMA | 2023/05 | open_llama_3b, open_llama_7b, open_llama_13b | OpenLLaMA: An Open Reproduction of LLaMA | 3, 7 | 2048 | Apache 2.0 | OpenLLaMA-7B-Preview_200bt |
| Falcon | 2023/05 | Falcon-40B, Falcon-7B | The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only | 7, 40 | 2048 | Apache 2.0 | |
| MPT-30B | 2023/06 | MPT-30B, MPT-30B-instruct | MPT-30B: Raising the bar for open-source foundation models | 30 | 8192 | Apache 2.0, CC BY-SA-3.0 | MPT 30B inference code using CPU |
| LLaMA 2 | 2023 | LLaMA 2 Weights | Llama 2: Open Foundation and Fine-Tuned Chat Models | 7 - 70 | 4096 | Custom: free if you have under 700M users, and you cannot use LLaMA outputs to train other LLMs besides LLaMA and its derivatives | HuggingChat |

LLM Training Frameworks

| Framework | Description | Resource |
| --- | --- | --- |
| Alpa | Alpa is a system for training and serving large-scale neural networks. | 🔗 |
| DeepSpeed | DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. | 🔗 |
| Megatron-DeepSpeed | DeepSpeed version of NVIDIA's Megatron-LM that adds additional support for several features such as MoE model training, Curriculum Learning, 3D Parallelism, and others. | 🔗 |
| FairScale | FairScale is a PyTorch extension library for high performance and large scale training. | 🔗 |
| Megatron-LM | Ongoing research training transformer models at scale. | 🔗 |
| Colossal-AI | Making large AI models cheaper, faster, and more accessible. | 🔗 |
| BMTrain | Efficient Training for Big Models. | 🔗 |
| Mesh TensorFlow | Mesh TensorFlow: Model Parallelism Made Easier. | 🔗 |
| maxtext | A simple, performant and scalable Jax LLM! | 🔗 |
| gpt-neox | An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. | 🔗 |
| Trainer API | Provides an API for feature-complete training in PyTorch for most standard use cases. | 🔗 |
| Lightning | Deep learning framework to train, deploy, and ship AI products Lightning fast. | 🔗 |

Effective Deployment Strategies for Language Models

| Deployment Tools | Description | Resource |
| --- | --- | --- |
| SkyPilot | Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface. | 🔗 |
| vLLM | A high-throughput and memory-efficient inference and serving engine for LLMs. | 🔗 |
| Text Generation Inference | A Rust, Python and gRPC server for text generation inference. Used in production at HuggingFace to power LLMs api-inference widgets. | 🔗 |
| Haystack | An open-source NLP framework that allows you to use LLMs and transformer-based models from Hugging Face, OpenAI and Cohere to interact with your own data. | 🔗 |
| Sidekick | Data integration platform for LLMs. | 🔗 |
| LangChain | Building applications with LLMs through composability. | 🔗 |
| wechat-chatgpt | Use ChatGPT On Wechat via wechaty. | 🔗 |
| promptfoo | Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality. | 🔗 |
| Agenta | Easily build, version, evaluate and deploy your LLM-powered apps. | 🔗 |

Courses about LLM

| Courses | Link |
| --- | --- |
| Full Stack DeepLearning's "LLM BootCamp" | 🔗 |
| Cohere's LLM University - By Luis Serrano, Jay Alammar and Meor Amer | 🔗 |
| fast.ai's Part 2 of "Practical Deep Learning for Coders" | 🔗 |
| Deep Learning Fundamentals by Lightning AI & Sebastian Raschka, PhD. | 🔗 |
| Hugging Face's NLP Course | 🔗 |
| DeepLearning.AI ChatGPT Prompt Engineering for Developers | 🔗 |
| Princeton Understanding Large Language Models | 🔗 |
| Stanford CS224N-Lecture 11: Prompting, Instruction Finetuning, and RLHF | 🔗 |
| Stanford CS324-Large Language Models | 🔗 |
| Stanford CS25-Transformers United V2 | 🔗 |
| Stanford GPT-3 & Beyond | 🔗 |
| MIT Introduction to Data-Centric AI | 🔗 |
| Cohere LLM University to learn about LLMs and NLP | 🔗 |
| O'Reilly Deploying GPT and Large Language Models | 🔗 |
| edX Professional Certificate in Large Language Models | 🔗 |
| Coursera Natural Language Processing Specialization | 🔗 |
| Class Central Large Language Models | 🔗 |
| Rycolab Large Language Models | 🔗 |

Tutorials about LLM

| Tutorials | Author | Link |
| --- | --- | --- |
| State of GPT | Andrej Karpathy | 🔗 |
| Instruction finetuning and RLHF lecture | Hyung Won Chung | 🔗 |
| Scaling, emergence, and reasoning in large language models | Jason Wei | 🔗 |
| Open Pretrained Transformers | Susan Zhang | 🔗 |
| How Does ChatGPT Work? | Ameet Deshpande | 🔗 |
| GPT in 60 Lines of NumPy | Jay Mody | 🔗 |
| Welcome to the "Big Model" Era: Techniques and Systems to Train and Serve Bigger Models | ICML 2022 | 🔗 |
| Foundational Robustness of Foundation Models | NeurIPS 2022 | 🔗 |
| Let's build GPT: from scratch, in code, spelled out | Andrej Karpathy | 🔗 👨‍💻 |
| Prompt Engineering Guide | DAIR.AI | 🔗 |
| Fine-tune FLAN-T5 XL/XXL using DeepSpeed & Hugging Face Transformers | Philipp Schmid | 🔗 |
| Illustrating Reinforcement Learning from Human Feedback (RLHF) | HuggingFace | 🔗 |
| What Makes a Dialog Agent Useful? | HuggingFace | 🔗 |
| How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources | Yao Fu | 🔗 |
| What Is ChatGPT Doing … and Why Does It Work? | Stephen Wolfram | 🔗 |
| Why did all of the public reproduction of GPT-3 fail? | Jingfeng Yang | 🔗 |
| Pure Rust implementation of a minimal Generative Pretrained Transformer | Keyvan Kambakhsh | 🔗 |

Codebase Mastery: Building with Perfection with In-Depth Articles

| Title | Repository |
| --- | --- |
| Instruction-based data preparation using OpenAI | 🔗 |
| Optimal Fine-Tuning using the Trainer API: From Training to Model Inference | 🔗 |
| Efficient fine-tuning and inference of LLMs with PEFT and LoRA | 🔗 |
| Efficient fine-tuning and inference of LLMs with Accelerate | 🔗 |
| Curated list of articles | 🔗 |

What I am learning

After immersing myself in the recent GenAI text-based language model hype for nearly a month, I have made several observations about its performance on my specific tasks.

Please note that these observations are subjective and specific to my own experiences, and your conclusions may differ.

  • Models need at least 7B parameters (≥7B) for good natural language understanding performance; smaller models perform significantly worse. However, models with more than 7 billion parameters require a GPU with more than 24GB of VRAM.
  • Benchmarks can be tricky as different LLMs perform better or worse depending on the task. It is crucial to find the model that works best for your specific use case. In my experience, MPT-7B is still the superior choice compared to Falcon-7B.
  • Prompts change with each model iteration. Therefore, multiple reworks are necessary to adapt to these changes. While there are potential solutions, their effectiveness is still being evaluated.
  • For fine-tuning, you need at least one GPU with more than 24GB of VRAM. A GPU with 32GB or 40GB of VRAM is recommended.
  • Fine-tuning only the last few layers to speed up LLM training/finetuning may not yield satisfactory results. I have tried this approach, but it didn't work well.
  • Loading 8-bit or 4-bit models can save VRAM. For a 7B model, instead of requiring 16GB, it takes approximately 10GB or less than 6GB, respectively. However, this reduction in VRAM usage comes at the cost of significantly decreased inference speed. It may also result in lower performance in text understanding tasks.
  • Those exploring LLM applications for their companies should be aware of licensing considerations. A model that was trained using another model as a reference, or that requires another model's original weights, is not advisable in commercial settings.
  • There are three major types of LLMs: basic (like GPT-2/3), chat-enabled, and instruction-enabled. Most of the time, basic models are not usable as they are and require fine-tuning. Chat versions tend to be the best, but they are often not open-source.
  • Not every problem needs to be solved with LLMs. Avoid forcing a solution around LLMs. Similar to the situation with deep reinforcement learning in the past, it is important to find the most appropriate approach.
  • I tried LangChain and vector databases but did not end up using them; I never needed them. Simple Python, embeddings, and efficient dot product operations worked well for me.
  • LLMs do not need to have complete world knowledge. Humans also don't possess comprehensive knowledge but can adapt. LLMs only need to know how to utilize the available knowledge. It might be possible to create smaller models by separating the knowledge component.
  • The next wave of innovation might involve simulating "thoughts" before answering, rather than simply predicting one word after another. This approach could lead to significant advancements.
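The VRAM figures in the observations above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates memory for the weights alone, assuming every parameter is stored at the stated bit width and taking 1 GB as 10^9 bytes; the helper name `weight_memory_gb` is invented for this example.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# A 7B-parameter model at the three precisions mentioned in the list above.
for bits in (16, 8, 4):
    gb = weight_memory_gb(7e9, bits)
    print(f"7B model at {bits}-bit: {gb:.1f} GB for weights")
```

This gives 14.0, 7.0, and 3.5 GB. The observed figures (roughly 16GB, 10GB, and under 6GB) are higher, presumably because activations, the attention cache, quantization metadata, and framework overhead come on top of the weights. Fine-tuning needs more again, since gradients and optimizer state are also held in memory.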
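The "simple Python, embeddings, and dot products" approach mentioned in the list above might look roughly like the sketch below. It is an illustration under assumptions, not the author's code: the three-dimensional vectors are hand-written stand-ins for real embeddings (which would come from an embedding model), the document titles are made up, and vectors are assumed to be non-zero.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    q = normalize(query_vec)
    scores = []
    for idx, vec in enumerate(doc_vecs):
        d = normalize(vec)
        scores.append((idx, sum(a * b for a, b in zip(q, d))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical documents and toy embeddings.
docs = ["refund policy", "shipping times", "password reset"]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
query_vec = [0.8, 0.2, 0.1]

results = top_k(query_vec, doc_vecs, k=2)
for idx, score in results:
    print(f"{docs[idx]}: {score:.3f}")
```

For a few thousand documents, a brute-force scan like this is often fast enough. With a numerical library the loop collapses into a single matrix-vector product, and a dedicated vector database only starts to matter at larger scale.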