⭐ 🔖 awesome-generative-ai-guide

Generative AI is experiencing rapid growth, and this repository serves as a comprehensive hub for updates on generative AI research, interview materials, notebooks, and more!

Explore the following resources:

We'll be updating this repository regularly, so keep an eye out for the latest additions!

Happy Learning!

🔈 Announcements

Applied LLMs Mastery full course content has been released!!! (Click Here)
5-day roadmap to learn LLM foundations out now! (Click Here)
60 Common GenAI Interview Questions out now! (Click Here)
ICLR 2024 paper summaries (Click Here)
List of free GenAI courses (Click Here)

⭐ Best GenAI Papers List (February 2024)

*Updated at the end of every month

Date	Name	Summary	Topics
28 Feb 2024	Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models	The paper provides a thorough analysis of Sora, a text-to-video generative AI model launched by OpenAI. It examines Sora's evolution, underlying technologies, diverse applications across industries, and potential impact on creativity and productivity. Challenges like safety and bias in video generation are discussed, along with future directions for Sora and similar models, envisioning enhanced human-AI collaboration and innovation in video production. Note that this paper was not written by the creators of Sora; it is a reverse-engineering effort by a group of researchers.	Multimodal Models
28 Feb 2024	OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement	The paper introduces OpenCodeInterpreter, a family of open-source code systems aimed at addressing the limitations of existing open-source models in code generation by incorporating execution capabilities and iterative refinement similar to advanced systems like the GPT-4 Code Interpreter. Leveraging the CodeFeedback dataset, which includes 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Evaluation across key benchmarks demonstrates exceptional performance, with OpenCodeInterpreter-33B achieving accuracy close to that of GPT-4 on the HumanEval and MBPP benchmarks, effectively bridging the gap between open-source and proprietary code generation systems.	Task Specific LLMs, Evaluation
27 Feb 2024	Evaluating Very Long-Term Conversational Memory of LLM Agents	This paper introduces a machine-human pipeline to generate high-quality, very long-term dialogues, spanning up to 35 sessions, using large language models and retrieval augmented generation techniques. The conversations are grounded on personas and temporal event graphs, with each agent capable of sharing and reacting to images. The resulting dataset, LOCOMO, comprises conversations with 300 turns on average. Evaluation benchmarks measure long-term memory in models, revealing challenges for LLMs in understanding lengthy conversations and comprehending long-range temporal dynamics. While strategies like long-context LLMs or RAG show improvements, models still lag behind human performance.	RAG, Benchmark, Long Context
27 Feb 2024	When Scaling Meets LLM Finetuning: The Effect of Data, Model, and Finetuning Method	This paper investigates the scaling properties of different finetuning methods for LLMs. Through systematic experiments, it explores the impact of various scaling factors, including model size, pretraining data size, and finetuning data size, on finetuning performance. Results suggest a power-based multiplicative joint scaling law between finetuning data size and other factors, with LLM model scaling offering more benefits than pretraining data scaling. Additionally, the optimal finetuning method varies depending on the task and finetuning data. These findings aim to enhance understanding and development of LLM finetuning methods.	Fine-Tuning
27 Feb 2024	The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits	This paper introduces BitNet b1.58, a variant where every parameter is ternary {-1, 0, 1}, matching full-precision Transformer LLMs in both perplexity and end-task performance while offering significant cost-effectiveness. This 1.58-bit LLM sets a new standard for high-performance, cost-effective models and opens opportunities for new computation paradigms and hardware designs optimized for 1-bit LLMs.	Cost Effective LLMs
26 Feb 2024	Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts	The paper introduces Rainbow Teaming, a method for diversely generating adversarial prompts to enhance the robustness of LLMs. By framing prompt generation as a quality-diversity problem, it uncovers vulnerabilities across various domains, including safety, question answering, and cybersecurity. Additionally, fine-tuning LLMs on synthetic data produced by Rainbow Teaming improves safety without compromising general capabilities, offering a path to open-ended self-improvement.	Red-Teaming
26 Feb 2024	Do Large Language Models Latently Perform Multi-Hop Reasoning?	The study investigates whether LLMs engage in latent multi-hop reasoning when processing complex prompts. By analyzing individual hops and their co-occurrence, the research examines how LLMs identify and utilize bridge entities to complete prompts. Results show strong evidence of latent multi-hop reasoning in certain relation types, with the reasoning pathway used in over 80% of prompts. However, the utilization varies contextually, and while evidence for the first hop is substantial, it's more moderate for the second hop. Additionally, there's a scaling trend with increasing model size for the first hop but not the second, indicating potential challenges and opportunities for future LLM development.	Evaluation
26 Feb 2024	Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs	The paper discusses Reinforcement Learning from Human Feedback (RLHF) as vital for large language model (LLM) alignment. While Proximal Policy Optimization (PPO) is commonly used, its high computational cost and hyperparameter sensitivity pose challenges. The study proposes simpler REINFORCE-style optimization variants for RLHF, showing superior performance compared to PPO and other methods like DPO and RAFT. It suggests that adapting to LLM alignment characteristics allows for efficient online RL optimization.	Instruction Tuning
23 Feb 2024	A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts	The paper introduces ReadAgent, an innovative LLM system that significantly extends effective context length, up to 20 times in experiments. Mimicking human reading, ReadAgent strategically stores and compresses content into "gist memories," enabling efficient retrieval when needed. Evaluation on long-document reading tasks demonstrates ReadAgent's superiority over baselines, enhancing performance while expanding the effective context window by 3 to 20 times.	LLM Agents
22 Feb 2024	MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases	The paper addresses the need for efficient LLMs on mobile devices by focusing on models with fewer than a billion parameters. Contrary to the belief that data and parameter quantity determine model quality, the study emphasizes the significance of model architecture. Introducing MobileLLM, leveraging deep and thin architectures, the model demonstrates notable accuracy boosts over previous state-of-the-art models. Additionally, MobileLLM-LS, incorporating block-wise weight sharing, further enhances accuracy with marginal latency overhead, highlighting the potential of small models for on-device use cases.	Smaller Models
22 Feb 2024	Stable Diffusion 3	Stability.ai announced the early preview of Stable Diffusion 3, their latest text-to-image model, boasting significant improvements in multi-subject prompts, image quality, and spelling abilities. The waitlist for early access is now open, allowing users to contribute insights for enhancing performance and safety prior to its public release. Ranging from 800M to 8B parameters, the suite offers scalability options to cater to various creative needs. Emphasizing safe and responsible AI practices, Stability.ai has implemented numerous safeguards and continues to collaborate with researchers and experts to ensure integrity throughout development and deployment.	Multimodal Models
21 Feb 2024	Coercing Large Language Models (LLMs) to Do and Reveal (Almost) Anything	The paper expands the scope of adversarial attacks on LLMs beyond "jailbreaking," highlighting various attack surfaces and goals. Through concrete examples, it categorizes attacks inducing unintended behaviors like misdirection, model control, denial-of-service, and data extraction. Controlled experiments reveal many attacks originate from pre-training with coding capabilities and the presence of "glitch" tokens in LLM vocabularies, emphasizing the need for security measures.	Red-Teaming
21 Feb 2024	LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens	The paper introduces LongRoPE, which extends the context window of pre-trained LLMs to an impressive 2048k tokens, overcoming limitations of current extended context windows. Key innovations include exploiting non-uniformities in positional interpolation, a progressive extension strategy, and readjusting to recover short context window performance. Extensive experiments demonstrate the effectiveness of LongRoPE across various tasks, with models retaining the original architecture and minor modifications to positional embedding.	Long Context, Embedding
21 Feb 2024	Gemma: Open Models Based on Gemini Research and Technology	This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.	Foundation LLMs
21 Feb 2024	Large Language Models for Data Annotation: A Survey	The paper focuses on leveraging advanced LLMs, like GPT-4, for automating data annotation, a labor-intensive process in machine learning. It offers insights into LLM-Based Data Annotation, Assessing LLM-generated Annotations, and Learning with LLM-generated annotations. The survey includes a taxonomy of methodologies, reviews learning strategies, and discusses challenges and limitations. Written as a guide for researchers and practitioners, it aims to foster advancements in data annotation using the latest LLMs.	Task Specific LLMs
21 Feb 2024	In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss	The paper introduces BABILong, a benchmark designed to evaluate the processing capabilities of generative transformer models on long documents. While common methods are effective only for sequences up to 10^4 elements, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to 11 × 10^6 elements. This achievement represents a substantial leap, demonstrating significant improvement in processing capabilities for long sequences and marking the longest input processed by any neural network model to date.	Benchmark, Long Context
20 Feb 2024	Large Language Models: A Survey	The paper provides a comprehensive review of LLMs since the release of ChatGPT in November 2022. It discusses prominent LLM families (GPT, LLaMA, PaLM), their characteristics, contributions, and limitations, along with techniques for building and augmenting LLMs. Additionally, it surveys datasets, evaluation metrics, and performance comparisons of popular LLMs on representative benchmarks. The paper concludes by highlighting open challenges and future research directions in the field of LLMs.	Survey of LLMs
19 Feb 2024	LongAgent: Scaling Language Models to 128k Context Through Multi-Agent Collaboration	The paper introduces LongAgent, a method employing multi-agent collaboration to scale LLMs like LLaMA to process long texts up to 128K tokens. LongAgent utilizes a leader to interpret user intent and coordinate team members in acquiring information. To address hallucination-induced response inaccuracies, an inter-member communication mechanism resolves conflicts through information sharing. Experimental results demonstrate LongAgent's superiority over GPT-4 in tasks such as 128k-long text retrieval and multi-hop question answering.	LLM Agents
19 Feb 2024	LoRA+: Efficient Low Rank Adaptation of Large Models	The paper identifies suboptimal fine-tuning in models with large width (embedding dimension) using Low Rank Adaptation (LoRA) due to updating adapter matrices A and B with the same learning rate. By setting different learning rates for A and B with a fixed ratio in a proposed algorithm called LoRA+, the suboptimality of LoRA can be corrected. Extensive experiments demonstrate that LoRA+ improves performance (1-2% improvements) and fine-tuning speed (up to ~2x speedup) at the same computational cost as LoRA.	PEFT
15 Feb 2024	Generative Representational Instruction Tuning	The paper introduces generative representational instruction tuning (GRIT), enabling a large language model to excel in both generative and embedding tasks by distinguishing between them through instructions. GRITLM 7B sets a new state-of-the-art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models of its size on generative tasks. Scaling up to GRITLM 8X7B further surpasses all open generative language models while remaining among the best embedding models. GRIT unifies generative and embedding training without performance loss, significantly speeding up RAG by over 60% for long documents. Models and code are available.	RAG, Instruction Tuning
15 Feb 2024	Chain-of-Thought Reasoning Without Prompting	The study enhances large language models' reasoning abilities without explicit prompting by altering the decoding process to uncover inherent chain-of-thought (CoT) reasoning paths. This method bypasses manual prompt engineering, assesses intrinsic reasoning abilities, and correlates CoT presence with higher confidence in decoded answers. Extensive empirical studies across benchmarks demonstrate significant performance improvement over standard greedy decoding.	Prompt Engineering
15 Feb 2024	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	The report introduces Gemini 1.5 Pro, a highly efficient multimodal model excelling in recalling and reasoning over vast amounts of context, including long documents and videos. It achieves near-perfect recall across tasks, surpasses previous state-of-the-art models, and exhibits surprising translation abilities for rare languages like Kalamang.	Foundation LLMs
15 Feb 2024	Revisiting Feature Prediction for Learning Visual Representations from Video	The paper introduces V-JEPA, a collection of vision models trained solely on video data using a feature prediction objective, without relying on pretrained image encoders, text, negative examples, or reconstruction. Trained on 2 million videos, these models are evaluated on downstream image and video tasks, demonstrating versatile visual representations that excel in both motion and appearance-based tasks without requiring adaptation of model parameters. The largest model, a ViT-H/16 trained only on videos, achieves impressive performance on Kinetics-400, Something-Something-v2, and ImageNet1K datasets.	Multimodal LLMs
13 Feb 2024	World Model on Million-Length Video and Language with Ring Attention	The paper addresses limitations of current language models by proposing a joint modeling approach with video sequences to enhance understanding of complex, long-form tasks. It curates a large dataset of diverse videos and books, trains transformers with RingAttention technique on long sequences, and gradually increases context size. Key contributions include training one of the largest context size transformers, overcoming vision-language training challenges, and open-sourcing optimized models capable of processing multimodal sequences over 1M tokens. This work enables training on massive datasets to develop understanding of both human knowledge and the multimodal world, paving the way for broader AI capabilities.	Multimodal LLMs
10 Feb 2024	ChemLLM: A Chemical Large Language Model	The paper introduces ChemLLM, the first large language model tailored specifically for chemistry applications, addressing the challenge of integrating structured chemical data into coherent dialogue. Through a template-based instruction construction method, ChemLLM transforms structured knowledge into plain dialogue for effective language model training. ChemLLM outperforms GPT-3.5 and GPT-4 on key chemistry tasks such as name conversion, molecular captioning, and reaction prediction, demonstrating exceptional adaptability to related mathematical and physical tasks. Moreover, ChemLLM showcases proficiency in specialized NLP tasks within chemistry, opening new avenues for exploration in chemical studies.	Task Specific LLMs
6 Feb 2024	LLM Agents can Autonomously Hack Websites	The paper demonstrates that LLMs, particularly GPT-4, possess the capability to autonomously conduct website hacking tasks such as blind database schema extraction and SQL injections without prior knowledge of vulnerabilities. This ability, enabled by advanced models adept at tool usage and leveraging extended context, raises concerns about the potential offensive capabilities of LLM agents and prompts questions regarding their widespread deployment in cybersecurity contexts.	LLM Agents
6 Feb 2024	AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls	The paper introduces AnyTool, a large language model agent designed to enhance the utilization of over 16,000 APIs sourced from Rapid API to address user queries efficiently. AnyTool comprises an API retriever, a solver for query resolution, and a self-reflection mechanism. Powered by the function calling feature of GPT-4, AnyTool eliminates the need for external module training. Additionally, the paper revises the evaluation protocol to introduce AnyToolBench, demonstrating superior performance over strong baselines such as ToolLLM and a GPT-4 variant tailored for tool utilization across various datasets. The code is available at https://github.com/dyabel/AnyTool.	LLM Agents
6 Feb 2024	Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning	The paper introduces a novel Indirect Reasoning (IR) method to enhance the reasoning capabilities of LLMs beyond the limitations of Direct Reasoning (DR) frameworks like Chain-of-Thought and Self-Consistency. By leveraging logic of contrapositives and contradictions, the IR method tackles tasks such as factual reasoning and mathematical proof. Experimental results on popular LLMs, including GPT-3.5-turbo and Gemini-pro, demonstrate a substantial improvement in accuracy for both factual reasoning and mathematical proof compared to traditional DR methods. Combining IR with DR further enhances performance, underscoring the effectiveness of the proposed strategy.	Prompt Engineering
6 Feb 2024	Self-Discover: Large Language Models Self-Compose Reasoning Structures	The paper introduces SELF-DISCOVER, a framework for LLMs to autonomously identify task-specific reasoning structures, improving performance on challenging reasoning benchmarks like BigBench-Hard and MATH. By selecting and composing atomic reasoning modules during a self-discovery process, SELF-DISCOVER improves the performance of models like GPT-4 and PaLM 2 by up to 32% compared to Chain of Thought (CoT). Notably, it outperforms inference-intensive methods like CoT-Self-Consistency by over 20%, with significantly lower inference compute requirements, while exhibiting universality across different LLM model families and echoing human reasoning patterns.	Prompt Engineering
6 Feb 2024	DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models	The paper introduces DeepSeekMath 7B, a model designed to tackle mathematical reasoning challenges by continuing pretraining with a large dataset sourced from Common Crawl. Achieving a notable score of 51.7% on the MATH benchmark without external toolkits, it approaches the performance of advanced models like Gemini-Ultra and GPT-4. DeepSeekMath's success is attributed to leveraging web data via a sophisticated data selection pipeline and employing Group Relative Policy Optimization (GRPO) to enhance mathematical reasoning while optimizing memory usage.	Task Specific LLMs
4 Feb 2024	Large Language Model for Table Processing: A Survey	The survey provides a comprehensive overview of table-centric tasks and the utilization of LLMs to automate them, including traditional areas like Table QA and fact verification, as well as newer aspects such as table manipulation and advanced data analysis. It delves into recent paradigms in LLM usage, focusing on instruction-tuning, prompting, and agent-based approaches. The paper also addresses challenges like private deployment, efficient inference, and the need for extensive benchmarks in table manipulation and advanced data analysis.	Task Specific LLMs
3 Feb 2024	More Agents Is All You Need	The paper demonstrates that the performance of LLMs can be improved by scaling the number of instantiated agents using a simple sampling-and-voting method. This method is independent of existing complex enhancement techniques and its effectiveness correlates with task difficulty. Extensive experiments across various LLM benchmarks validate this finding and explore associated properties. The code for the experiments is publicly available.	LLM Agents
1 Feb 2024	OLMo: Accelerating the Science of Language Models	OLMo aims to accelerate the science of language models by providing a platform for rapid experimentation and understanding of LLMs. It offers tools for model training, fine-tuning, and evaluation, alongside a collaborative environment for researchers. The goal is to facilitate discoveries and advancements in LLM technology, making it more accessible to a wider audience.	Open-Source LLMs
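
The sampling-and-voting idea summarized in the "More Agents Is All You Need" entry above can be illustrated with a short sketch. This is not the paper's code: `sample_and_vote`, `sample_answer`, and the toy sampler are hypothetical names introduced here, and exact-match majority voting is assumed as the simplest voting rule. A real setup would replace the toy sampler with a call to an LLM at a non-zero temperature.

```python
import random
from collections import Counter
from typing import Callable, List


def sample_and_vote(sample_answer: Callable[[str], str], query: str, num_agents: int) -> str:
    """Ask the (hypothetical) sampler `num_agents` times and return the most frequent answer."""
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    # Each call stands in for one independently sampled "agent" response.
    answers: List[str] = [sample_answer(query).strip() for _ in range(num_agents)]
    # Exact-match majority vote; tie-breaking is left unspecified in this sketch.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


def make_toy_sampler(seed: int) -> Callable[[str], str]:
    """Toy stand-in for an LLM: answers "4" about 60% of the time, otherwise a wrong answer."""
    rng = random.Random(seed)

    def sampler(query: str) -> str:
        return "4" if rng.random() < 0.6 else rng.choice(["3", "5"])

    return sampler


if __name__ == "__main__":
    sampler = make_toy_sampler(seed=0)
    # Compare a single sample with votes over progressively more samples.
    for n in (1, 5, 25):
        print(n, sample_and_vote(sampler, "What is 2 + 2?", num_agents=n))
```

With a sampler that is right more often than it is wrong, the voted answer tends to become more reliable as `num_agents` grows, which is the effect the entry describes.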

🎓 Courses

[Ongoing] Applied LLMs Mastery 2024

Join 1000+ students on this 10-week adventure as we delve into the application of LLMs across a variety of use cases.

Link to the course website

[Feb 2024] Registrations are still open; click here to register.

🗓️Week 1 [Jan 15 2024]: Practical Introduction to LLMs

Applied LLM Foundations
Real World LLM Use Cases
Domain and Task Adaptation Methods

🗓️Week 2 [Jan 22 2024]: Prompting and Prompt Engineering

Basic Prompting Principles
Types of Prompting
Applications, Risks and Advanced Prompting

🗓️Week 3 [Jan 29 2024]: LLM Fine-tuning

Basics of Fine-Tuning
Types of Fine-Tuning
Fine-Tuning Challenges

🗓️Week 4 [Feb 5 2024]: RAG (Retrieval-Augmented Generation)

Understanding the concept of RAG in LLMs
Key components of RAG
Advanced RAG Methods

🗓️Week 5 [Feb 12 2024]: Tools for building LLM Apps

Fine-tuning Tools
RAG Tools
Tools for observability, prompting, serving, vector search, etc.

🗓️Week 6 [Feb 19 2024]: Evaluation Techniques

Types of Evaluation
Common Evaluation Benchmarks
Common Metrics

🗓️Week 7 [Feb 26 2024]: Building Your Own LLM Application

Components of LLM application
Build your own LLM App end to end

🗓️Week 8 [March 4 2024]: Advanced Features and Deployment

LLM lifecycle and LLMOps
LLM Monitoring and Observability
Deployment strategies

🗓️Week 9 [March 11 2024]: Challenges with LLMs

Scaling Challenges
Behavioral Challenges
Future directions

🗓️Week 10 [March 18 2024]: Emerging Research Trends

Smaller and more performant models
Multimodal models
LLM Alignment

🗓️Week 11 Bonus [March 25 2024]: Foundations

Generative Models Foundations
Self-Attention and Transformers
Neural Networks for Language

📖 List of Free GenAI Courses

📎 Resources

ICLR 2024 Paper Summaries

💻 Interview Prep

Topic-wise Questions:

Common GenAI Interview Questions
Prompting and Prompt Engineering
Model Fine-Tuning
Model Evaluation
MLOps for GenAI
Generative Models Foundations
Latest Research Trends

GenAI System Design (Coming Soon):

Designing an LLM-Powered Search Engine
Building a Customer Support Chatbot
Building a System for Natural Language Interaction with Your Data
Building an AI Co-pilot
Designing a Custom Chatbot for Q/A on Multimodal Data (Text, Images, Tables, CSV Files)
Building an Automated Product Description and Image Generation System for E-commerce

📓 Code Notebooks

RAG Tutorials

AWS Bedrock Workshop Tutorials by Amazon Web Services
Langchain Tutorials by gkamradt
LLM Applications for production by ray-project
LLM tutorials by Ollama
LLM Hub by mallahyari

Fine-Tuning Tutorials

LLM Fine-tuning tutorials by ashishpatel26
PEFT example notebooks by Huggingface
Free LLM Fine-Tuning Notebooks by Youssef Hosni

✒️ Contributing

If you want to add to the repository or you find any issues, please feel free to raise a PR, and make sure each addition is placed in the relevant section or category.

📌 Cite Us

To cite this guide, use the format below:

@article{areganti_generative_ai_guide,
author = {Reganti, Aishwarya Naresh},
journal = {https://github.com/aishwaryanr/awesome-generative-ai-resources},
month = {01},
title = {{Generative AI Guide}},
year = {2024}
}

License

[MIT License]
