⭐ 🔖 awesome-generative-ai-guide

Generative AI is experiencing rapid growth, and this repository serves as a comprehensive hub for updates on generative AI research, interview materials, notebooks, and more!

Explore the following resources:

  1. Monthly Best GenAI Papers List
  2. GenAI Interview Resources
  3. Applied LLMs Mastery 2024 (created by Aishwarya Naresh Reganti) course material
  4. List of all GenAI-related free courses (over 75 listed)
  5. List of code repositories/notebooks for developing generative AI applications

We'll be updating this repository regularly, so keep an eye out for the latest additions!

Happy Learning!


🔈 Announcements


⭐ Best GenAI Papers List (April 2024)

*Updated at the end of every month*

| Date | Title | Summary | Topics |
| --- | --- | --- | --- |
| April 30, 2024 | Octopus v4: Graph of language models | This paper introduces Octopus v4, a novel approach leveraging functional tokens to integrate multiple open-source language models optimized for specific tasks. Octopus v4 excels in directing user queries to the most appropriate model and reformulating queries for optimal performance, building upon previous iterations (v1, v2, and v3) with enhanced selection and parameter understanding. It also explores the use of graphs as a versatile data structure to coordinate multiple models effectively. | Foundational LLM |
| April 30, 2024 | Better & Faster Large Language Models via Multi-token Prediction | This paper proposes training language models to predict multiple future tokens simultaneously, enhancing sample efficiency without increasing training time. By employing multiple output heads to predict n tokens ahead, the method improves downstream capabilities for both code and natural language models. Particularly beneficial for larger models, it consistently outperforms single-token prediction on generative benchmarks like coding, showing notable gains in problem-solving tasks. Moreover, models trained with multi-token prediction achieve up to threefold faster inference, even with large batch sizes (see the first sketch after this table). | New Architecture |
| April 30, 2024 | Extending Llama-3's Context Ten-Fold Overnight | The Llama-3-8B-Instruct model's context length is extended from 8K to 80K through efficient QLoRA fine-tuning, requiring only 8 hours on a single 8xA800 GPU machine. The extension significantly enhances performance across evaluation tasks such as NIHS and topic retrieval, while maintaining proficiency in short-context tasks. Surprisingly, the extension is achieved with just 3.5K synthetic training samples from GPT-4, showcasing the untapped potential of LLMs to handle longer contexts. The team plans to release all associated resources publicly, including data, model, data generation pipeline, and training code. | Context Length |
| April 29, 2024 | Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models | This paper addresses the challenge of accurately evaluating the quality of LLM outputs by proposing a Panel of LLM Evaluators (PoLL) instead of a single large judge like GPT-4. Composed of a larger number of smaller models, the PoLL approach outperforms single large judges across three distinct settings and six datasets, exhibits less intra-model bias, and is over seven times less expensive, offering a cost-effective and more reliable evaluation method for LLMs (see the second sketch after this table). | Evaluation |
| April 28, 2024 | AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs | This paper presents AdvPrompter, a novel method for generating human-readable adversarial prompts to study jailbreaking attacks on LLMs. Unlike existing optimization-based approaches, AdvPrompter generates adversarial prompts in seconds, roughly 800 times faster, without requiring access to gradients from the TargetLLM. The method alternates between generating high-quality target adversarial suffixes and low-rank fine-tuning of AdvPrompter. Experimental results demonstrate state-of-the-art performance on the AdvBench dataset and transferability to closed-source black-box LLM APIs. | Adversarial Attacks, Evaluation |
| April 28, 2024 | Capabilities of Gemini Models in Medicine | Med-Gemini, a specialized multimodal model for medical tasks, surpasses GPT-4 on various benchmarks, achieving state-of-the-art results in medical text summarization and question answering. With its advanced long-context reasoning, it outperforms existing methods in tasks such as needle-in-a-haystack retrieval from medical records. While promising, further evaluation is needed before deployment in real-world medical applications. | Domain-Specific LLMs |
| April 25, 2024 | How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | InternVL 1.5, an open-source multimodal large language model (MLLM), bridges the gap between open-source and proprietary commercial models in multimodal understanding. It introduces three improvements: a Strong Vision Encoder, Dynamic High-Resolution image processing supporting up to 4K resolution, and a High-Quality Bilingual Dataset. Evaluation across benchmarks demonstrates its effectiveness compared to both open-source and proprietary models. | Multimodal LLMs |
| April 25, 2024 | Make Your LLM Fully Utilize the Context | This paper introduces information-intensive (IN2) training to address the lost-in-the-middle challenge faced by contemporary LLMs. Leveraging a synthesized long-context question-answer dataset, IN2 training emphasizes fine-grained information awareness within long contexts. Applying this approach to Mistral-7B yields FILM-7B (FILl-in-the-Middle), which robustly retrieves information from various positions in a 32K context window. FILM-7B improves performance on real-world long-context tasks while maintaining comparable performance on short-context tasks. | Context Length |
| April 25, 2024 | SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | This work introduces SEED-Bench-2-Plus, a benchmark specifically tailored for evaluating the text-rich visual comprehension of Multimodal Large Language Models (MLLMs). With 2.3K multiple-choice questions covering Charts, Maps, and Webs, it aims to simulate real-world text-rich scenarios comprehensively. Evaluation of 34 prominent MLLMs highlights current limitations in text-rich visual comprehension, emphasizing the need for further research in this area. SEED-Bench-2-Plus serves as a valuable addition to existing MLLM benchmarks, offering insightful observations and inspiring future developments. | Multimodal LLMs |
| April 23, 2024 | Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order | This paper presents Aurora-M, a multilingual open-source language model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. It surpasses 2 trillion tokens in total training token count and is fine-tuned on human-reviewed safety instructions, aligning its development with the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Aurora-M is rigorously evaluated across various tasks and languages, demonstrating robustness against catastrophic forgetting and outperforming alternatives in multilingual settings, particularly in safety evaluations. | Domain-Specific LLMs |
| April 23, 2024 | Multi-Head Mixture-of-Experts | This paper introduces Multi-Head Mixture-of-Experts (MH-MoE) to address issues in Sparse Mixtures of Experts (SMoE), specifically low expert activation and lack of fine-grained analytical capability. MH-MoE employs a multi-head mechanism to split tokens into sub-tokens, assigning them to diverse experts for parallel processing before reintegrating them. This approach enhances expert activation, deepening context understanding and alleviating overfitting. MH-MoE is easy to implement and integrates seamlessly with other SMoE models, as demonstrated across English-focused language modeling, multilingual language modeling, and masked multi-modality modeling tasks. | New Architecture |
| April 22, 2024 | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | This paper introduces phi-3-mini, a compact 3.8 billion parameter language model trained on 3.3 trillion tokens, delivering performance competitive with larger models like Mixtral 8x7B and GPT-3.5. Achieving notable scores on benchmarks such as MMLU (69%) and MT-bench (8.38), phi-3-mini is designed for deployment on mobile devices. The innovation lies in its dataset, a scaled-up version of phi-2's, comprising heavily filtered web data and synthetic data. Additionally, initial parameter-scaling results with phi-3-small and phi-3-medium models trained on 4.8T tokens demonstrate further enhanced performance. | Foundational LLM |
| April 22, 2024 | How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study | This paper explores the performance of Meta's LLaMA3 LLMs under low-bit quantization, essential for resource-limited scenarios. Despite their impressive pre-training on over 15T tokens, LLaMA3 models exhibit notable degradation when quantized to low bit-widths. Evaluating 10 quantization methods at 1-8 bits across diverse datasets, the study reveals significant performance gaps, especially in ultra-low bit-width scenarios, highlighting the need for future work to bridge this gap for practical applications. | Quantization |
| April 22, 2024 | FlowMind: Automatic Workflow Generation with LLMs | This paper introduces FlowMind, leveraging LLMs like Generative Pretrained Transformers (GPT) to automate workflow generation in Robotic Process Automation (RPA), overcoming limitations in handling spontaneous tasks. FlowMind's generic prompt recipe grounds LLM reasoning with reliable APIs, mitigating hallucination issues and ensuring data confidentiality. It simplifies user interaction by presenting high-level workflow descriptions, allowing effective inspection and feedback. Evaluation on the NCEN-QA dataset demonstrates FlowMind's success and the significance of its components in enhancing user interaction and workflow generation. | LLM Agents |
| April 22, 2024 | SnapKV: LLM Knows What You are Looking for Before Generation | This paper introduces SnapKV, a fine-tuning-free approach to efficiently minimize Key-Value (KV) cache size in LLMs while maintaining comparable performance. SnapKV uses attention-head-specific prompt features identified from an 'observation' window, automatically compressing KV caches by selecting clustered important positions. This significantly reduces computational overhead and memory footprint, achieving a 3.6x increase in generation speed and an 8.2x improvement in memory efficiency compared to baseline models when processing long input sequences (see the third sketch after this table). | Fine-Tuning |
| April 21, 2024 | AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation | This paper introduces AutoCrawler, a two-stage framework merging LLMs with crawlers to enhance adaptability in web automation. Addressing limitations of traditional methods and standalone LLM-based agents, AutoCrawler employs a hierarchical HTML structure for progressive understanding through top-down and step-back operations. Comprehensive experiments validate the effectiveness of this approach in handling diverse and changing web environments efficiently. | LLM Agents |
| April 21, 2024 | Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | This paper presents Groma, a Multimodal Large Language Model (MLLM) equipped with fine-grained visual perception capabilities, enabling region-level tasks like captioning and visual grounding. Groma employs a localized visual tokenization mechanism to decompose images into regions of interest, seamlessly integrating region tokens into user instructions and model responses. By curating a visually grounded instruction dataset, Groma outperforms MLLMs relying solely on language models or external modules for localization, demonstrating superior performance on standard referring and grounding benchmarks. | Multimodal LLMs |
| April 18, 2024 | Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | This paper addresses the challenge of enhancing LLMs' reasoning and planning capabilities without relying on extensive data or fine-tuning. Introducing AlphaLLM, it integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop. AlphaLLM includes components for prompt synthesis, an efficient MCTS approach for language tasks, and critic models for precise feedback. Experimental results on mathematical reasoning tasks demonstrate that AlphaLLM significantly enhances LLM performance without additional annotations, showcasing its potential for self-improvement in complex reasoning and planning tasks. | Instruction Tuning |
| April 18, 2024 | Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment | This paper explores a simple approach for zero-shot cross-lingual alignment of language models: reward models trained on preference data from one source language are applied to other target languages. Evaluations on summarization and open-ended dialog generation tasks consistently show the success of this method, with cross-lingually aligned models preferred by humans in over 70% of evaluation instances. Surprisingly, different-language reward models sometimes outperform same-language ones. The study also identifies best practices for alignment when language-specific data for supervised fine-tuning is unavailable. | Instruction Tuning |
| April 18, 2024 | Introducing v0.5 of the AI Safety Benchmark from MLCommons | This paper presents v0.5 of the AI Safety Benchmark, developed by the MLCommons AI Safety Working Group, to assess the safety risks of chat-tuned language models. It introduces a principled approach covering a single use case and a set of personas, along with a taxonomy of 13 hazard categories and tests for 7 of them. Version 1.0 is planned for release by the end of 2024, aiming to provide deeper insights into AI system safety. While v0.5 should not be used for safety assessment, it offers detailed documentation and tools for evaluation, including a grading system and an openly available platform called ModelBench. | Benchmark, Evaluation |
| April 16, 2024 | Octopus v2: On-device language model for super agent | This research presents a new method that empowers an on-device language model with 2 billion parameters to surpass the performance of GPT-4 in both accuracy and latency, while reducing the context length by 95%. The method addresses concerns over privacy and cost associated with large-scale language models in cloud environments by enabling deployment on edge devices such as smartphones, cars, VR headsets, and personal computers. By enhancing latency and reducing inference costs, the method aligns with the performance requisites for real-world applications, making it suitable for deployment across a variety of edge devices in production environments. | Small LLMs |
| April 15, 2024 | Learn Your Reference Model for Real Good Alignment | Existing alignment methods are unstable. In language model alignment, Reinforcement Learning From Human Feedback (RLHF) minimizes the Kullback-Leibler divergence between policies to prevent overfitting, while Direct Preference Optimization (DPO) aims to eliminate the reward model but faces limitations. The authors propose Trust Region DPO (TR-DPO), which updates the reference policy during training and outperforms DPO by up to 19% on the Anthropic HH and TLDR datasets, enhancing model quality across multiple parameters. | Prompt Engineering |
| April 15, 2024 | Compression Represents Intelligence Linearly | This paper investigates the relationship between compression and intelligence in LLMs, finding that LLMs' ability to compress external text corpora correlates almost linearly with their intelligence as measured by benchmark scores. The results provide empirical evidence supporting the belief that superior compression reflects greater intelligence. Additionally, compression efficiency serves as a reliable evaluation measure associated with model capabilities, with open-sourced datasets and pipelines provided for future research in compression assessment. | Model Compression |
| April 14, 2024 | Pre-training Small Base LMs with Fewer Tokens | The paper presents Inheritune, a straightforward method for constructing a smaller language model from a larger one by inheriting transformer blocks and training on a fraction of the original pretraining data. The authors showcase its effectiveness by building a 1.5B-parameter LM from a larger model using only 1B training tokens, achieving performance comparable to publicly available models trained on significantly more data. They also demonstrate that smaller LMs utilizing layers from larger ones can match the performance of their bigger counterparts when trained on equivalent data volumes. Extensive experiments validate the efficacy of Inheritune across diverse settings. | Small LLMs |
| April 14, 2024 | Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies | The paper explores scaling down Contrastive Language-Image Pre-training (CLIP) under limited computation budgets across data, architecture, and training strategies. It emphasizes the importance of high-quality data and suggests smaller ViT models for smaller datasets and larger ones for larger datasets at fixed compute. It also compares four training strategies, finding that CLIP+Data Augmentation achieves results comparable to CLIP using half the data, offering practical insights for CLIP training and deployment. | Vision Models |
| April 12, 2024 | Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length | Megalodon addresses the quadratic complexity and weak length extrapolation of Transformers by introducing a neural architecture for efficient sequence modeling with unlimited context length. It inherits Mega's architecture and incorporates enhancements such as complex exponential moving average (CEMA), a timestep normalization layer, a normalized attention mechanism, and pre-norm with two-hop residual configuration. In a head-to-head comparison with Llama2, Megalodon demonstrates better efficiency than Transformers at 7 billion parameters and 2 trillion training tokens, achieving a training loss of 1.70, positioning it between Llama2-7B (1.75) and 13B (1.67). | Context Length |
| April 11, 2024 | RULER: What's the Real Context Size of Your Long-Context Language Models? | The needle-in-a-haystack (NIAH) test, widely used to evaluate long-context language models, assesses the ability to retrieve information from long distractor texts, but it measures only a superficial form of long-context understanding. To provide a more comprehensive evaluation, this paper introduces RULER, a new synthetic benchmark. RULER expands upon the NIAH test with variations of diverse types and quantities of needles and introduces new task categories, such as multi-hop tracing and aggregation, to test behaviors beyond context searching. Evaluating ten long-context LMs on 13 representative tasks in RULER reveals large performance drops as context length increases, despite nearly perfect accuracy on the NIAH test. Only four models maintain satisfactory performance at a length of 32K tokens. RULER is open-sourced to encourage comprehensive evaluation of long-context LMs. | Context Length |
| April 11, 2024 | Social Skill Training with Large Language Models | This perspective paper identifies social skill barriers to entering specialized fields and presents a solution leveraging large language models for social skill training via a generic framework. The proposed AI Partner, AI Mentor framework merges experiential learning with realistic practice and tailored feedback. The work calls for cross-disciplinary innovation to address the broader implications for workforce development and social equality. | Social Skill Training, LLM Alignment |
| April 11, 2024 | Rho-1: Not All Tokens Are What You Need | Traditional language model pre-training treats all tokens equally, but this paper challenges that assumption, showing that not all tokens are equally important. It introduces Rho-1, a model that selectively trains on tokens aligned with the desired distribution, improving few-shot accuracy on math tasks by up to 30%. After fine-tuning, Rho-1 achieves state-of-the-art results on the MATH dataset with significantly fewer pretraining tokens than existing models. Moreover, pretraining Rho-1 on general tokens enhances performance across diverse tasks, boosting both the efficiency and effectiveness of language model pre-training. | New Architecture |
| April 11, 2024 | RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | The paper introduces RecurrentGemma, an open language model built on Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language tasks. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. The authors provide a pre-trained model with 2B non-embedding parameters and an instruction-tuned variant; both achieve performance comparable to Gemma-2B despite being trained on fewer tokens. | New Architecture |
| April 11, 2024 | Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | This work presents Ferret-v2, an upgraded version of Ferret that overcomes limitations in regional understanding within LLMs. Ferret-v2 introduces three key enhancements: (1) any-resolution grounding and referring for improved image processing at higher resolutions; (2) multi-granularity visual encoding using the DINOv2 encoder to better capture diverse visual contexts; and (3) a three-stage training paradigm, including high-resolution dense alignment, leading to substantial improvements over Ferret and other state-of-the-art methods in referring and grounding tasks. | Benchmark, Evaluation |
| April 10, 2024 | JetMoE: Reaching Llama2 Performance with 0.1M Dollars | The paper introduces JetMoE-8B, a cost-effective and high-performing Large Language Model trained with minimal resources. Its efficient architecture reduces computation significantly compared to previous models, while its transparency encourages collaboration and advancements in accessible LLM development. | Foundational LLM |
| April 9, 2024 | Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models | This paper examines how LLMs handle tabular data, focusing on memorization and overfitting. It finds that LLMs memorize popular tabular datasets and perform better on them, suggesting overfitting. The study also highlights the limited in-context statistical learning abilities of LLMs without fine-tuning, emphasizing the importance of checking whether an LLM has seen an evaluation dataset during pre-training. The paper introduces the tabmemcheck Python package for testing exposure to datasets. | Domain-Specific LLMs |
| April 9, 2024 | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | This work introduces an efficient method to scale Transformer-based LLMs to infinitely long inputs with bounded memory and computation. A key component of the proposed approach is a new attention technique dubbed Infini-attention, which incorporates a compressive memory into the vanilla attention mechanism and builds both masked local attention and long-term linear attention into a single Transformer block. The effectiveness of this approach is demonstrated on long-context language modeling benchmarks, 1M-sequence-length passkey context block retrieval, and 500K-length book summarization tasks with 1B and 8B LLMs. The approach introduces minimal bounded memory parameters and enables fast streaming inference for LLMs. | Context Length |
| April 8, 2024 | LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders | Large decoder-only language models (LLMs) excel at NLP tasks but are underutilized for text embedding. This study introduces LLM2Vec, a method for converting decoder-only LLMs into robust text encoders via bidirectional attention, masked next-token prediction, and contrastive learning. Applied to LLMs with 1.3B to 7B parameters, LLM2Vec surpasses encoder-only models on word-level tasks and achieves a new unsupervised state of the art on the Massive Text Embeddings Benchmark (MTEB). Integration with supervised contrastive learning further boosts performance, demonstrating the potential to create universal text encoders from LLMs without costly adaptation or synthetic data. | New Architecture |
| April 7, 2024 | Stream of Search (SoS): Learning to Search in Language | This paper introduces the concept of Stream of Search (SoS), teaching language models to search by representing the process in language. SoS is demonstrated using the game of Countdown, where models are trained to combine input numbers with arithmetic operations to reach a target number. Pretraining on SoS increases search accuracy by 25%, and further fine-tuning allows models to solve 36% of previously unsolved problems. This approach enables language models to learn problem-solving strategies and potentially discover new ones. | Domain-Specific LLMs |
| April 4, 2024 | Long-context LLMs Struggle with Long In-context Learning | This study introduces LongICLBench, a specialized benchmark focusing on long in-context learning for extreme-label classification. The benchmark evaluates 13 long-context LLMs on datasets with input lengths ranging from 2K to 50K tokens and label spaces spanning 28 to 174 classes. While long-context LLMs perform relatively well on less challenging tasks with shorter demonstrations, they struggle on more difficult tasks, reaching close to zero accuracy on the most challenging task, Discovery, with 174 labels. Further analysis reveals a gap in current LLM capabilities for processing and understanding long, context-rich sequences, indicating the need for improved long-context understanding and reasoning in future LLMs. | Context Length |
| April 4, 2024 | ReFT: Representation Finetuning for Language Models | This paper introduces Representation Finetuning (ReFT) methods as an alternative to parameter-efficient fine-tuning (PEFT) for adapting large language models. ReFT methods operate on a frozen base model and learn task-specific interventions on hidden representations, aiming to edit representations rather than modify weights. A strong instance of ReFT, called Low-rank Linear Subspace ReFT (LoReFT), is presented, achieving 10 to 50 times greater parameter efficiency than prior PEFTs. LoReFT is showcased on various evaluation tasks, delivering the best balance of efficiency and performance compared to existing methods. | Fine-Tuning, PEFT |
| April 4, 2024 | Training LLMs over Neurally Compressed Text | This paper explores training LLMs over highly compressed text produced by neural text compressors. While standard subword tokenizers compress text by a small factor, neural compressors can achieve much higher rates of compression. The main obstacle to training LLMs directly over neurally compressed text is that strong compression tends to produce opaque outputs not well suited to learning. To address this, the paper proposes Equal-Info Windows, a compression technique that segments text into blocks compressing to the same bit length. This method enables effective learning over neurally compressed text, improves with scale, and outperforms byte-level baselines on perplexity and inference-speed benchmarks. The paper also offers suggestions for further improving high-compression tokenizers. | Model Compression |
| April 4, 2024 | CodeEditorBench: Evaluating Code Editing Capability of Large Language Models | This paper introduces CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs on code editing tasks, including debugging, translating, polishing, and requirement switching. CodeEditorBench emphasizes real-world scenarios and practical aspects of software development by curating diverse coding challenges from various sources. Evaluation of 19 LLMs reveals that closed-source models, particularly Gemini-Ultra and GPT-4, outperform open-source models, highlighting differences in performance based on problem types and prompt sensitivities. CodeEditorBench aims to catalyze advancements in LLMs by providing a robust platform for assessing code editing capabilities, and all prompts and datasets will be released so the community can expand the dataset and benchmark emerging LLMs. | Evaluation |
| April 4, 2024 | GPT-4V Red-teamed under 11 Different Safety Policies | This paper presents a comprehensive jailbreak evaluation dataset comprising 1,445 harmful questions across 11 safety policies. Extensive red-teaming experiments are conducted on 11 different LLMs and Multimodal Large Language Models (MLLMs), including both state-of-the-art proprietary and open-source models. Results reveal that GPT-4 and GPT-4V show superior robustness against jailbreak attacks compared to open-source models, among which Llama2 and Qwen-VL-Chat demonstrate higher robustness. The transferability of visual jailbreak methods is found to be relatively limited compared to textual methods. | Red Teaming |
| April 4, 2024 | RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | RALL-E presents a robust language modeling approach for text-to-speech (TTS) synthesis, addressing the poor robustness of LLM-based TTS, such as unstable prosody and high word error rate (WER). The method employs chain-of-thought (CoT) prompting to decompose the task into simpler steps, predicting prosody features of the input text and using them as intermediate conditions to predict speech tokens. Additionally, RALL-E uses predicted duration prompts to guide self-attention weights, improving focus on the corresponding phonemes and prosody features. Objective and subjective evaluations demonstrate significant WER improvements over baseline methods, showcasing RALL-E's effectiveness in synthesizing challenging sentences with reduced error rates. | Prompt Engineering |
| April 4, 2024 | CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues | The paper introduces the CantTalkAboutThis dataset, aimed at aligning language models to maintain topic relevance in conversations. It consists of synthetic dialogues with distractor turns designed to divert chatbots from the predefined topic. Training on this dataset improves language models' ability to stay on topic and enhances performance on instruction-following tasks, including safety alignment. | Alignment |
| April 3, 2024 | On the Scalability of Diffusion-based Text-to-Image Generation | This paper empirically studies the scaling properties of diffusion-based text-to-image (T2I) models through extensive ablations on scaling denoising backbones and training sets. The study explores various training settings and training costs to understand how to efficiently scale models for better performance at reduced cost. The findings suggest that increasing the number of transformer blocks is more parameter-efficient for improving text-image alignment than increasing channel counts. Additionally, the quality and diversity of the training set have a significant impact on text-image alignment performance and learning efficiency. Scaling functions are provided to predict text-image alignment performance from model size, compute, and dataset size. | Multimodal LLMs |
| April 2, 2024 | Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward | This paper introduces a novel framework for aligning large multimodal models (LMMs) with video content using detailed video captions as a proxy. The framework enhances the performance of video LMMs on video question answering (QA) tasks by incorporating informative feedback and improving the accuracy of generated responses relative to the corresponding videos. The approach uses direct preference optimization (DPO) to guide LMMs towards generating more accurate, helpful, and harmless content in multimodal contexts. | Multimodal LLMs |
| April 2, 2024 | Advancing LLM Reasoning Generalists with Preference Trees | This paper introduces EURUS, a suite of LLMs optimized for reasoning. Fine-tuned from Mistral-7B and CodeLlama-70B, EURUS models achieve state-of-the-art results among open-source models on a diverse set of benchmarks covering mathematics, code generation, and logical reasoning. EURUS outperforms existing open-source models by margins of more than 13.3% on challenging benchmarks like LeetCode and TheoremQA. Its strong performance is attributed to UltraInteract, a large-scale alignment dataset designed for complex reasoning tasks, and a novel reward modeling objective derived from preference learning techniques. | Domain-Specific LLMs |
| April 2, 2024 | Mixture-of-Depths: Dynamically allocating compute in transformer-based language models | This paper introduces a method for transformers to dynamically allocate compute across layers in the model depth. By capping the number of tokens participating in computation at each layer, the method uses a static computation graph with fluid token identities, resulting in efficient compute allocation. Models trained with this method match baseline performance while requiring fewer FLOPs per forward pass, speeding up both training and sampling (see the final sketch after this table). | New Architecture |
| April 1, 2024 | LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model | This paper presents LLaVA-Gemma, a suite of multimodal foundation models trained using the LLaVA framework with the Gemma family of LLMs, particularly the 2B-parameter Gemma model. The study evaluates the effect of ablating three design features: pretraining the connector, utilizing a more powerful image backbone, and increasing the size of the language backbone. While LLaVA-Gemma exhibits moderate performance on various evaluations, it fails to surpass current state-of-the-art models of comparable size. The paper releases training recipes, code, and weights for the LLaVA-Gemma models, facilitating further research. | Multimodal LLMs |
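
To make the multi-token prediction entry above concrete, here is a minimal PyTorch sketch, not the paper's implementation: a shared trunk feeds several independent output heads, head i predicts the token i steps ahead, and the loss averages cross-entropy over the heads' shifted targets. The toy trunk omits causal masking and everything else a production LM needs; all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenLM(nn.Module):
    """Toy trunk plus n_future heads; head i predicts the token i steps ahead."""
    def __init__(self, vocab_size: int, d_model: int = 256, n_future: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)  # toy: no causal mask
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_future))

    def forward(self, tokens: torch.Tensor) -> list:   # tokens: (B, T)
        h = self.trunk(self.embed(tokens))             # (B, T, d_model)
        return [head(h) for head in self.heads]        # one (B, T, V) tensor per offset

def multi_token_loss(logits: list, tokens: torch.Tensor) -> torch.Tensor:
    """Average cross-entropy over heads; head i is scored against tokens shifted by i."""
    total = 0.0
    for i, lg in enumerate(logits, start=1):
        pred, target = lg[:, :-i, :], tokens[:, i:]    # align position t with token t+i
        total = total + F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))
    return total / len(logits)
```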
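
The PoLL result above reduces, at its simplest, to majority voting across several small judge models. Below is a minimal sketch under that reading, where each `Judge` is a caller-supplied wrapper around one model's API; the `Judge` type and `poll_verdict` name are illustrative, not from the paper.

```python
from collections import Counter
from typing import Callable, List

# One judge wraps one small model: (question, candidate_answer) -> "pass" or "fail".
Judge = Callable[[str, str], str]

def poll_verdict(judges: List[Judge], question: str, answer: str) -> str:
    """Majority vote across the panel; ties resolve to the first label counted."""
    votes = Counter(judge(question, answer) for judge in judges)
    return votes.most_common(1)[0][0]
```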
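
SnapKV's central move, keeping the KV-cache positions that receive the most attention from a trailing "observation" window, can be sketched in a few lines. Everything here (names, shapes) is illustrative, and the paper's method additionally clusters and pools the selected positions.

```python
import torch

def compress_kv(keys: torch.Tensor, values: torch.Tensor,
                attn: torch.Tensor, keep: int):
    """Keep the prefix positions that receive the most attention mass.

    keys, values: (T, d) cached entries for T prefix positions.
    attn:         (W, T) attention weights from the last W observation queries.
    keep:         number of prefix positions to retain.
    """
    scores = attn.sum(dim=0)                            # total attention per position
    k = min(keep, scores.numel())
    idx = torch.topk(scores, k).indices.sort().values   # preserve original ordering
    return keys[idx], values[idx], idx
```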
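
Finally, the Mixture-of-Depths idea of capping per-layer computation can be illustrated with top-k token routing: a learned router scores each token, and only the highest-scoring tokens pass through the expensive block while the rest ride the residual stream. This is a simplified sketch (it omits, among other things, the paper's router-weighted outputs and the sampling-time router); all names are illustrative.

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Only the top-k tokens per sequence are processed; the rest skip the block."""
    def __init__(self, d_model: int, capacity: float = 0.5):
        super().__init__()
        self.router = nn.Linear(d_model, 1)   # scalar routing score per token
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.capacity = capacity              # fraction of tokens that compute

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, T, d)
        B, T, _ = x.shape
        k = max(1, int(T * self.capacity))
        scores = self.router(x).squeeze(-1)                # (B, T)
        idx = torch.topk(scores, k, dim=1).indices         # tokens that get computed
        rows = torch.arange(B, device=x.device).unsqueeze(1).expand(B, k)
        out = x.clone()                                    # untouched tokens pass through
        out[rows, idx] = self.block(x[rows, idx])          # process selected tokens only
        return out
```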

πŸŽ“ Courses

[Ongoing] Applied LLMs Mastery 2024

Join 1000+ students on this 10-week adventure as we delve into the application of LLMs across a variety of use cases.

Link to the course website

[Feb 2024] Registrations are still open; click here to register.

πŸ—“οΈ*Week 1 [Jan 15 2024]*: Practical Introduction to LLMs

  • Applied LLM Foundations
  • Real World LLM Use Cases
  • Domain and Task Adaptation Methods

πŸ—“οΈ*Week 2 [Jan 22 2024]*: Prompting and Prompt Engineering

  • Basic Prompting Principles
  • Types of Prompting
  • Applications, Risks and Advanced Prompting

πŸ—“οΈ*Week 3 [Jan 29 2024]*: LLM Fine-tuning

  • Basics of Fine-Tuning
  • Types of Fine-Tuning
  • Fine-Tuning Challenges

πŸ—“οΈ*Week 4 [Feb 5 2024]*: RAG (Retrieval-Augmented Generation)

  • Understanding the concept of RAG in LLMs
  • Key components of RAG
  • Advanced RAG Methods

πŸ—“οΈ*Week 5 [ Feb 12 2024]*: Tools for building LLM Apps

  • Fine-tuning Tools
  • RAG Tools
  • Tools for observability, prompting, serving, vector search, etc.

πŸ—“οΈ*Week 6 [Feb 19 2024]*: Evaluation Techniques

  • Types of Evaluation
  • Common Evaluation Benchmarks
  • Common Metrics

πŸ—“οΈ*Week 7 [Feb 26 2024]*: Building Your Own LLM Application

  • Components of an LLM application
  • Build your own LLM App end to end

πŸ—“οΈ*Week 8 [March 4 2024]*: Advanced Features and Deployment

  • LLM lifecycle and LLMOps
  • LLM Monitoring and Observability
  • Deployment strategies

πŸ—“οΈ*Week 9 [March 11 2024]*: Challenges with LLMs

  • Scaling Challenges
  • Behavioral Challenges
  • Future directions

πŸ—“οΈ*Week 10 [March 18 2024]*: Emerging Research Trends

  • Smaller and more performant models
  • Multimodal models
  • LLM Alignment

πŸ—“οΈ*Week 11 *Bonus* [March 25 2024]*: Foundations

  • Generative Models Foundations
  • Self-Attention and Transformers
  • Neural Networks for Language

📖 List of Free GenAI Courses

LLM Basics and Foundations
  1. Large Language Models by ETH Zurich

  2. Understanding Large Language Models by Princeton

  3. Transformers course by Hugging Face

  4. NLP course by Hugging Face

  5. CS324 - Large Language Models by Stanford

  6. Generative AI with Large Language Models by Coursera

  7. Introduction to Generative AI by Coursera

  8. Generative AI Fundamentals by Google Cloud

  9. Introduction to Large Language Models by Google Cloud

  10. Introduction to Generative AI by Google Cloud

  11. Generative AI Concepts by DataCamp (Daniel Tedesco Data Lead @ Google)

  12. 1 Hour Introduction to LLM (Large Language Models) by WeCloudData

  13. LLM Foundation Models from the Ground Up | Primer by Databricks

  14. Generative AI Explained by Nvidia

  15. Transformer Models and BERT Model by Google Cloud

  16. Generative AI Learning Plan for Decision Makers by AWS

  17. Introduction to Responsible AI by Google Cloud

  18. Fundamentals of Generative AI by Microsoft Azure

  19. Generative AI for Beginners by Microsoft

  20. ChatGPT for Beginners: The Ultimate Use Cases for Everyone by Udemy

  21. [1hr Talk] Intro to Large Language Models by Andrej Karpathy

  22. ChatGPT for Everyone by Learn Prompting

  23. Large Language Models (LLMs) (In English) by Kshitiz Verma (JK Lakshmipat University, Jaipur, India)

Building LLM Applications
  1. LLMOps: Building Real-World Applications With Large Language Models by Udacity

  2. Full Stack LLM Bootcamp by FSDL

  3. Generative AI for beginners by Microsoft

  4. Large Language Models: Application through Production by Databricks

  5. Generative AI Foundations by AWS

  6. Introduction to Generative AI Community Course by iNeuron

  7. LLM University by Cohere

  8. LLM Learning Lab by Lightning AI

  9. Functions, Tools and Agents with LangChain by DeepLearning.AI

  10. LangChain for LLM Application Development by DeepLearning.AI

  11. LLMOps by DeepLearning.AI

  12. Automated Testing for LLMOps by DeepLearning.AI

  13. Building RAG Agents with LLMs by Nvidia

  14. Building Generative AI Applications Using Amazon Bedrock by AWS

  15. Efficiently Serving LLMs by DeepLearning.AI

  16. Building Systems with the ChatGPT API by DeepLearning.AI

  17. Serverless LLM apps with Amazon Bedrock by DeepLearning.AI

  18. Building Applications with Vector Databases by DeepLearning.AI

  19. Build LLM Apps with LangChain.js by DeepLearning.AI

  20. Advanced Retrieval for AI with Chroma by DeepLearning.AI

  21. Operationalizing LLMs on Azure by Coursera

  22. Generative AI Full Course – Gemini Pro, OpenAI, Llama, Langchain, Pinecone, Vector Databases & More by freeCodeCamp.org

  23. Training & Fine-Tuning LLMs for Production by Activeloop

Prompt Engineering, RAG and Fine-Tuning
  1. LangChain & Vector Databases in Production by Activeloop

  2. Reinforcement Learning from Human Feedback by DeepLearning.AI

  3. Building Applications with Vector Databases by DeepLearning.AI

  4. Finetuning Large Language Models by DeepLearning.AI

  5. LangChain: Chat with Your Data by DeepLearning.AI

  6. Building Systems with the ChatGPT API by DeepLearning.AI

  7. Prompt Engineering with Llama 2 by DeepLearning.AI

  8. ChatGPT Prompt Engineering for Developers by DeepLearning.AI

  9. Advanced RAG Orchestration series by LlamaIndex

  10. Prompt Engineering Specialization by Coursera

  11. Augment your LLM Using Retrieval Augmented Generation by Nvidia

  12. Knowledge Graphs for RAG by DeepLearning.AI

  13. Open Source Models with Hugging Face by DeepLearning.AI

  14. Vector Databases: from Embeddings to Applications by DeepLearning.AI

  15. Understanding and Applying Text Embeddings by DeepLearning.AI

  16. JavaScript RAG Web Apps with LlamaIndex by DeepLearning.AI

  17. Quantization Fundamentals with Hugging Face by DeepLearning.AI

  18. Preprocessing Unstructured Data for LLM Applications by DeepLearning.AI

  19. Retrieval Augmented Generation for Production with LangChain & LlamaIndex by Activeloop

  20. Quantization in Depth by DeepLearning.AI

Evaluation
  1. Building and Evaluating Advanced RAG Applications by DeepLearning.AI
  2. Evaluating and Debugging Generative AI Models Using Weights and Biases by DeepLearning.AI
  3. Quality and Safety for LLM Applications by DeepLearning.AI
  4. Red Teaming LLM Applications by DeepLearning.AI
Multimodal
  1. How Diffusion Models Work by DeepLearning.AI
  2. How to Use Midjourney, AI Art and ChatGPT to Create an Amazing Website by Brad Hussey
  3. Build AI Apps with ChatGPT, DALL-E and GPT-4 by Scrimba
  4. 11-777: Multimodal Machine Learning by Carnegie Mellon University

Miscellaneous

  1. Avoiding AI Harm by Coursera
  2. Developing AI Policy by Coursera

📎 Resources


💻 Interview Prep

Topic-wise Questions:

  1. Common GenAI Interview Questions
  2. Prompting and Prompt Engineering
  3. Model Fine-Tuning
  4. Model Evaluation
  5. MLOps for GenAI
  6. Generative Models Foundations
  7. Latest Research Trends

GenAI System Design (Coming Soon):

  1. Designing an LLM-Powered Search Engine
  2. Building a Customer Support Chatbot
  3. Building a System for Natural Language Interaction with Your Data
  4. Building an AI Co-pilot
  5. Designing a Custom Chatbot for Q/A on Multimodal Data (Text, Images, Tables, CSV Files)
  6. Building an Automated Product Description and Image Generation System for E-commerce

📓 Code Notebooks

RAG Tutorials

Fine-Tuning Tutorials

Comprehensive LLM Code Repositories

  • LLM-PlayLab: a collection of projects built using Transformer models

βœ’οΈ Contributing

If you want to add to the repository or find any issues, please feel free to raise a PR and ensure correct placement within the relevant section or category.


📌 Cite Us

To cite this guide, use the format below:

@article{areganti_generative_ai_guide,
  author  = {Reganti, Aishwarya Naresh},
  journal = {https://github.com/aishwaryanr/awesome-generative-ai-resources},
  month   = {01},
  title   = {{Generative AI Guide}},
  year    = {2024}
}

License

MIT License