Welcome to LLM-PowerHouse, your ultimate resource for unleashing the full potential of Large Language Models (LLMs) with custom training and inferencing. This GitHub repository is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of LLMs and build intelligent applications that push the boundaries of natural language understanding.
- Prompt Engineering
- Fine Tuning
- When to Perform Fine-Tuning
- Difference between Masked Language Model and Causal Language Model
- Open Source LLM Space for Research Use
- Open Source LLM Space for Commercial Use
- LLM Training Frameworks
- Effective Deployment Strategies for Language Models
- Courses about LLM
- Tutorials about LLM
- Codebase Mastery: Building with Perfection through In-Depth Articles
- What I am learning
Prompt engineering is a technique that can be used to improve the performance of LLMs on specific tasks. It involves crafting prompts that help the LLM generate the desired output. This can be done by providing the model with additional information, such as examples of the desired output, or by using specific language the model is likely to understand.
Prompt engineering can be a powerful tool, but it is important to note that it is not a silver bullet. LLMs can still generate incorrect or unexpected output even with the best prompts. As a result, it is important to test the output of LLMs carefully before using them in production.
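As a concrete illustration, a few-shot prompt simply bakes worked examples of the desired output into the input. The task and wording below are made up for illustration, and the resulting string can be sent to whichever LLM client you use.

```python
# A minimal few-shot prompt: the examples show the model the desired
# input -> output format before asking it to handle a new case.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: Positive

Review: "It stopped working after a week and support never replied."
Sentiment: Negative

Review: "Setup took five minutes and everything just worked."
Sentiment:"""

# Sending `few_shot_prompt` to a capable LLM should make it continue the
# pattern, ideally answering "Positive" for the final review.
```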
On the other hand, fine-tuning adapts a pre-trained LLM to a specific task or domain by training it further on a smaller, task-specific dataset. This is done by adjusting the model's weights and parameters to minimize the loss function and improve its performance on the task.
Fine-tuning can be a more effective way to improve the performance of LLMs on specific tasks than prompt engineering. However, it is also more time-consuming and expensive. As a result, it is important to consider the cost and time involved before deciding whether to use it.
Fine-tuning is typically needed when the task is new or challenging, or when the desired output is highly specific. In these cases, prompt engineering may not be able to provide the model with enough information to generate the desired output.
Prompt engineering is typically sufficient when the task is well-defined and the desired output does not need to be highly specific. In these cases, a well-crafted prompt can provide the model with the information it needs to generate the desired output.
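As a rough illustration of the fine-tuning workflow, the sketch below continues training a small causal LM with the Hugging Face Trainer. The checkpoint, dataset, and hyperparameters are placeholders rather than recommendations.

```python
# A minimal supervised fine-tuning sketch with Hugging Face Transformers.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"                        # small stand-in for a larger LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Any domain- or task-specific text corpus works here; wikitext is a placeholder.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.filter(lambda ex: len(ex["text"].strip()) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="finetuned-llm",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-5,
    fp16=True,                             # assumes a CUDA GPU; drop on CPU
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
```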
Aspect | Masked Language Model (MLM) | Causal Language Model (CLM)
---|---|---
Training Objective | Predict original values of masked tokens within context | Generate next token based on preceding tokens
Input Conditioning | Includes original context with randomly masked tokens | Includes preceding tokens in the sequence
Bidirectionality | Bidirectional, considers entire context to predict masked tokens | Unidirectional, relies on preceding tokens
Use Cases | Text completion, masked word prediction, text understanding | Text generation, story completion, language translation
Example Model | BERT (Bidirectional Encoder Representations from Transformers) | GPT (Generative Pretrained Transformer)
Associated Task | Natural Language Understanding (NLU) | Natural Language Generation (NLG)
Example Sentence | "I want to __ a book." | "The cat sat on the __."
Masked Token | [MASK] | [MASK]
Objective | Predict missing word | Predict next word in sequence
Possible Predictions | read, buy, borrow, ... | mat, chair, table, ...
Context | "I want to [MASK] a book." | "The cat sat on the [MASK]."
Output Prediction | read | mat
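To make the contrast concrete, here is a small sketch using Hugging Face pipelines; the checkpoints are just common small models chosen for illustration.

```python
# MLM vs. CLM side by side with Hugging Face pipelines.
from transformers import pipeline

# MLM: BERT fills in a masked token using context on both sides.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("I want to [MASK] a book.")[0]["token_str"])   # e.g. "read" or "buy"

# CLM: GPT-2 continues the sequence strictly left to right.
generate = pipeline("text-generation", model="gpt2")
print(generate("The cat sat on the", max_new_tokens=3)[0]["generated_text"])
```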
Language Model | Description | Link |
---|---|---|
Baize | Baize is an open-source chat model trained with LoRA. It uses 100k dialogs generated by letting ChatGPT chat with itself. | 🔗
Koala | A dialogue model for academic research | 🔗
Dalai | The simplest way to run LLaMA on your local machine | 🔗
LLaMA | A foundational, 65-billion-parameter large language model. See also LLaMA.cpp and Lit-LLaMA. | 🔗
ColossalChat | An LLM trained with RLHF, powered by Colossal-AI | 🔗
Vicuna | An open-source chatbot impressing GPT-4 with 90% ChatGPT quality | 🔗
Dolly | A cheap-to-build LLM that exhibits a surprising degree of the instruction-following capability seen in ChatGPT | 🔗
GPT4All | Demo, data, and code to train an open-source, assistant-style large language model based on GPT-J and LLaMA | 🔗
Alpaca | A model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. See also Alpaca.cpp and Alpaca-LoRA. | 🔗
Framework | Description | Resource |
---|---|---|
Alpa | Alpa is a system for training and serving large-scale neural networks. | 🔗
DeepSpeed | DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. | 🔗
Megatron-DeepSpeed | DeepSpeed version of NVIDIA's Megatron-LM that adds additional support for several features such as MoE model training, Curriculum Learning, 3D Parallelism, and others. | 🔗
FairScale | FairScale is a PyTorch extension library for high-performance and large-scale training. | 🔗
Megatron-LM | Ongoing research training transformer models at scale. | 🔗
Colossal-AI | Making large AI models cheaper, faster, and more accessible. | 🔗
BMTrain | Efficient training for big models. | 🔗
Mesh TensorFlow | Mesh TensorFlow: Model Parallelism Made Easier. | 🔗
maxtext | A simple, performant and scalable JAX LLM! | 🔗
gpt-neox | An implementation of model-parallel autoregressive transformers on GPUs, based on the DeepSpeed library. | 🔗
Trainer API | Provides an API for feature-complete training in PyTorch for most standard use cases. | 🔗
Lightning | Deep learning framework to train, deploy, and ship AI products Lightning fast. | 🔗
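As a rough illustration of how one of these frameworks slots into an existing training loop, the sketch below passes a DeepSpeed ZeRO config to the Hugging Face Trainer via `TrainingArguments`; the config values are illustrative, not tuned, and DeepSpeed must be installed.

```python
# Hooking DeepSpeed ZeRO stage 2 into the Hugging Face Trainer.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {"stage": 2},      # shard optimizer states and gradients across GPUs
    "fp16": {"enabled": "auto"},            # "auto" lets the HF integration fill in the value
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="ds-finetune",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    fp16=True,
    deepspeed=ds_config,                    # accepts a dict or a path to a JSON config
)

# `args` is then handed to a Trainer exactly as in the fine-tuning sketch above;
# multi-GPU runs are typically started with the `deepspeed` or `torchrun` launcher.
```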
Deployment Tools | Description | Resource |
---|---|---|
SkyPilot | Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface. | 🔗
vLLM | A high-throughput and memory-efficient inference and serving engine for LLMs | 🔗
Text Generation Inference | A Rust, Python and gRPC server for text generation inference. Used in production at Hugging Face to power the api-inference widgets for LLMs. | 🔗
Haystack | An open-source NLP framework that lets you use LLMs and transformer-based models from Hugging Face, OpenAI and Cohere to interact with your own data. | 🔗
Sidekick | Data integration platform for LLMs. | 🔗
LangChain | Building applications with LLMs through composability | 🔗
wechat-chatgpt | Use ChatGPT on WeChat via wechaty | 🔗
promptfoo | Test your prompts. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality. | 🔗
Agenta | Easily build, version, evaluate and deploy your LLM-powered apps | 🔗
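For a feel of how one of these engines is driven from Python, here is a rough vLLM sketch; the checkpoint and sampling settings are illustrative, and a CUDA GPU is assumed.

```python
# Offline batched generation with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")      # small placeholder model; swap in your deployment checkpoint
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The main benefit of continuous batching is"], params)
for out in outputs:
    print(out.outputs[0].text)            # generated continuation for each prompt
```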
Courses | Link |
---|---|
Full Stack DeepLearning's "LLM BootCamp" | 🔗
Cohere's LLM University - By Luis Serrano, Jay Alammar and Meor Amer | 🔗
fast.ai's Part 2 of "Practical Deep Learning for Coders" | 🔗
Deep Learning Fundamentals by Lightning AI & Sebastian Raschka, PhD | 🔗
Hugging Face's NLP Course | 🔗
DeepLearning.AI ChatGPT Prompt Engineering for Developers | 🔗
Princeton Understanding Large Language Models | 🔗
Stanford CS224N - Lecture 11: Prompting, Instruction Finetuning, and RLHF | 🔗
Stanford CS324 - Large Language Models | 🔗
Stanford CS25 - Transformers United V2 | 🔗
Stanford GPT-3 & Beyond | 🔗
MIT Introduction to Data-Centric AI | 🔗
Cohere LLM University to learn about LLMs and NLP | 🔗
O'Reilly Deploying GPT and Large Language Models | 🔗
edX Professional Certificate in Large Language Models | 🔗
Coursera Natural Language Processing Specialization | 🔗
Class Central Large Language Models | 🔗
Rycolab Large Language Models | 🔗
Tutorials | Author | Link |
---|---|---|
State of GPT | Andrej Karpathy | 🔗
Instruction finetuning and RLHF lecture | Hyung Won Chung | 🔗
Scaling, emergence, and reasoning in large language models | Jason Wei | 🔗
Open Pretrained Transformers | Susan Zhang | 🔗
How Does ChatGPT Work? | Ameet Deshpande | 🔗
GPT in 60 Lines of NumPy | Jay Mody | 🔗
Welcome to the "Big Model" Era: Techniques and Systems to Train and Serve Bigger Models | ICML 2022 | 🔗
Foundational Robustness of Foundation Models | NeurIPS 2022 | 🔗
Let's build GPT: from scratch, in code, spelled out | Andrej Karpathy | 🔗 👨‍💻
Prompt Engineering Guide | DAIR.AI | 🔗
Fine-tune FLAN-T5 XL/XXL using DeepSpeed & Hugging Face Transformers | Philipp Schmid | 🔗
Illustrating Reinforcement Learning from Human Feedback (RLHF) | HuggingFace | 🔗
What Makes a Dialog Agent Useful? | HuggingFace | 🔗
How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources | Yao Fu | 🔗
What Is ChatGPT Doing … and Why Does It Work? | Stephen Wolfram | 🔗
Why did all of the public reproduction of GPT-3 fail? | Jingfeng Yang | 🔗
Pure Rust implementation of a minimal Generative Pretrained Transformer | Keyvan Kambakhsh | 🔗
Title | Repository |
---|---|
Instruction-based data preparation using OpenAI | 🔗
Optimal Fine-Tuning using the Trainer API: From Training to Model Inference | 🔗
Efficient Fine-Tuning and Inference of LLMs with PEFT and LoRA | 🔗
Efficient Fine-Tuning and Inference of LLMs with Accelerate | 🔗
Curated list of articles | 🔗
After immersing myself in the recent GenAI text-based language model hype for nearly a month, I have made several observations about how these models perform on my specific tasks.
Please note that these observations are subjective and specific to my own experiences, and your conclusions may differ.
- We need a minimum of 7B parameters (≥7B) for optimal natural language understanding performance. Models with fewer parameters show a significant drop in performance. However, using models with more than 7 billion parameters requires a GPU with more than 24GB of VRAM (>24GB).
- Benchmarks can be tricky as different LLMs perform better or worse depending on the task. It is crucial to find the model that works best for your specific use case. In my experience, MPT-7B is still the superior choice compared to Falcon-7B.
- Prompts change with each model iteration. Therefore, multiple reworks are necessary to adapt to these changes. While there are potential solutions, their effectiveness is still being evaluated.
- For fine-tuning, you need at least one GPU with greater than 24GB VRAM (>24GB). A GPU with 32GB or 40GB VRAM is recommended.
- Fine-tuning only the last few layers to speed up LLM training/finetuning may not yield satisfactory results. I have tried this approach, but it didn't work well.
- Loading models in 8-bit or 4-bit can save VRAM: a 7B model takes approximately 10GB or less than 6GB, respectively, instead of around 16GB (see the quantization sketch after this list). However, this reduction in VRAM usage comes at the cost of significantly decreased inference speed, and it may also lower performance on text understanding tasks.
- Those exploring LLM applications for their companies should be aware of licensing considerations. Models trained on another model's outputs, or that depend on weights whose license restricts commercial use, are not advisable for commercial settings.
- There are three major types of LLMs: basic (like GPT-2/3), chat-enabled, and instruction-enabled. Most of the time, basic models are not usable as they are and require fine-tuning. Chat versions tend to be the best, but they are often not open-source.
- Not every problem needs to be solved with LLMs. Avoid forcing a solution around LLMs. Similar to the situation with deep reinforcement learning in the past, it is important to find the most appropriate approach.
- I tried LangChain and vector databases but ended up not needing them. Simple Python, embeddings, and efficient dot product operations worked well for me (a minimal retrieval sketch follows this list).
- LLMs do not need to have complete world knowledge. Humans also don't possess comprehensive knowledge but can adapt. LLMs only need to know how to utilize the available knowledge. It might be possible to create smaller models by separating the knowledge component.
- The next wave of innovation might involve simulating "thoughts" before answering, rather than simply predicting one word after another. This approach could lead to significant advancements.
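The quantization sketch referenced above: a minimal 4-bit load through Transformers and bitsandbytes, assuming a CUDA GPU with both libraries installed; the checkpoint is a placeholder and actual memory savings vary by model and hardware.

```python
# Loading a 7B causal LM in 4-bit to cut VRAM usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "openlm-research/open_llama_7b"   # placeholder 7B checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",                 # normal-float 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,      # compute in fp16 despite 4-bit weights
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",                         # place layers on the available GPU(s)
)
```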
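And the retrieval sketch: sentence embeddings plus a plain dot product, in the spirit of skipping a full vector database for small corpora; the encoder checkpoint, documents, and query are just illustrative.

```python
# Bare-bones semantic retrieval with embeddings and a dot product.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # small, widely used embedding model

docs = [
    "Fine-tuning adapts a pre-trained model to a narrow task.",
    "Prompt engineering shapes model output without changing weights.",
    "Quantization trades some accuracy for lower memory use.",
]
doc_vecs = encoder.encode(docs, normalize_embeddings=True)   # unit vectors, so dot product == cosine similarity

query_vec = encoder.encode("How do I reduce VRAM usage?", normalize_embeddings=True)
scores = doc_vecs @ query_vec                                # one dot product per document
print(docs[int(np.argmax(scores))])                          # best-matching snippet
```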