Generative AI is experiencing rapid growth, and this repository serves as a comprehensive hub for updates on generative AI research, interview materials, notebooks, and more!
Explore the following resources:
- Monthly Best GenAI Papers List
- GenAI Interview Resources
- Applied LLMs Mastery 2024 (created by Aishwarya Naresh Reganti) course material
- List of all GenAI-related free courses (over 30 already listed)
- List of code repositories/notebooks for developing generative AI applications
We'll be updating this repository regularly, so keep an eye out for the latest additions!
Happy Learning!
- Applied LLMs Mastery full course content has been released!!! (Click Here)
- 5-day roadmap to learn LLM foundations out now! (Click Here)
- 60 Common GenAI Interview Questions out now! (Click Here)
- ICLR 2024 paper summaries (Click Here)
- List of free GenAI courses (Click Here)
*Updated at the end of every month
Date | Name | Summary | Topics |
---|---|---|---|
28 Feb 2024 | Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | The paper provides a thorough analysis of Sora, a text-to-video generative AI model launched by OpenAI. It examines Sora's evolution, underlying technologies, diverse applications across industries, and potential impact on creativity and productivity. Challenges like safety and bias in video generation are discussed, along with future directions for Sora and similar models, envisioning enhanced human-AI collaboration and innovation in video production. Note that this paper was not written by the creators of Sora; it is a reverse-engineering analysis by a separate group of researchers. | Multimodal Models |
28 Feb 2024 | OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | The paper introduces OpenCodeInterpreter, a family of open-source code systems aimed at addressing the limitations of existing open-source models in code generation by incorporating execution capabilities and iterative refinement similar to advanced systems like the GPT-4 Code Interpreter. Leveraging the CodeFeedback dataset, which includes 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Evaluation across key benchmarks demonstrates exceptional performance, with OpenCodeInterpreter-33B achieving accuracy close to that of GPT-4 on the HumanEval and MBPP benchmarks, effectively bridging the gap between open-source and proprietary code generation systems. | Task Specific LLMs, Evaluation |
27 Feb 2024 | Evaluating Very Long-Term Conversational Memory of LLM Agents | This paper introduces a machine-human pipeline to generate high-quality, very long-term dialogues, spanning up to 35 sessions, using large language models and retrieval augmented generation techniques. The conversations are grounded on personas and temporal event graphs, with each agent capable of sharing and reacting to images. The resulting dataset, LOCOMO, comprises conversations with 300 turns on average. Evaluation benchmarks measure long-term memory in models, revealing challenges for LLMs in understanding lengthy conversations and comprehending long-range temporal dynamics. While strategies like long-context LLMs or RAG show improvements, models still lag behind human performance. | RAG, Benchmark, Long Context |
27 Feb 2024 | When Scaling Meets LLM Finetuning: The Effect of Data, Model, and Finetuning Method | This paper investigates the scaling properties of different finetuning methods for LLMs. Through systematic experiments, it explores the impact of various scaling factors, including model size, pretraining data size, and finetuning data size, on finetuning performance. Results suggest a power-based multiplicative joint scaling law between finetuning data size and other factors, with LLM model scaling offering more benefits than pretraining data scaling. Additionally, the optimal finetuning method varies depending on the task and finetuning data. These findings aim to enhance understanding and development of LLM finetuning methods. | Fine-Tuning |
27 Feb 2024 | The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | This paper introduces BitNet b1.58, a variant where every parameter is ternary {-1, 0, 1}, matching full-precision Transformer LLMs in both perplexity and end-task performance while offering significant cost-effectiveness. This 1.58-bit LLM sets a new standard for high-performance, cost-effective models and opens opportunities for new computation paradigms and hardware designs optimized for 1-bit LLMs. | Cost Effective LLMs |
26 Feb 2024 | Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts | The paper introduces Rainbow Teaming, a method for diversely generating adversarial prompts to enhance the robustness of LLMs. By framing prompt generation as a quality-diversity problem, it uncovers vulnerabilities across various domains, including safety, question answering, and cybersecurity. Additionally, fine-tuning LLMs on synthetic data produced by Rainbow Teaming improves safety without compromising general capabilities, offering a path to open-ended self-improvement. | Red-Teaming |
26 Feb 2024 | Do Large Language Models Latently Perform Multi-Hop Reasoning? | The study investigates whether LLMs engage in latent multi-hop reasoning when processing complex prompts. By analyzing individual hops and their co-occurrence, the research examines how LLMs identify and utilize bridge entities to complete prompts. Results show strong evidence of latent multi-hop reasoning in certain relation types, with the reasoning pathway used in over 80% of prompts. However, the utilization varies contextually, and while evidence for the first hop is substantial, it's more moderate for the second hop. Additionally, there's a scaling trend with increasing model size for the first hop but not the second, indicating potential challenges and opportunities for future LLM development. | Evaluation |
26 Feb 2024 | Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs | The paper discusses Reinforcement Learning from Human Feedback (RLHF) as vital for large language model (LLM) alignment. While Proximal Policy Optimization (PPO) is commonly used, its high computational cost and hyperparameter sensitivity pose challenges. The study proposes simpler REINFORCE-style optimization variants for RLHF, showing superior performance compared to PPO and other methods like DPO and RAFT. It suggests that adapting to LLM alignment characteristics allows for efficient online RL optimization. | Instruction Tuning |
23 Feb 2024 | A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts | The paper introduces ReadAgent, an innovative LLM system that significantly extends effective context length, up to 20 times in experiments. Mimicking human reading, ReadAgent strategically stores and compresses content into "gist memories," enabling efficient retrieval when needed. Evaluation on long-document reading tasks demonstrates ReadAgent's superiority over baselines, enhancing performance while expanding the effective context window by 3 to 20 times. | LLM Agents |
22 Feb 2024 | MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases | The paper addresses the need for efficient LLMs on mobile devices by focusing on models with fewer than a billion parameters. Contrary to the belief that data and parameter quantity determine model quality, the study emphasizes the significance of model architecture. Introducing MobileLLM, leveraging deep and thin architectures, the model demonstrates notable accuracy boosts over previous state-of-the-art models. Additionally, MobileLLM-LS, incorporating block-wise weight sharing, further enhances accuracy with marginal latency overhead, highlighting the potential of small models for on-device use cases. | Smaller Models |
22 Feb 2024 | Stable Diffusion 3 | Stability.ai announced the early preview of Stable Diffusion 3, their latest text-to-image model, boasting significant improvements in multi-subject prompts, image quality, and spelling abilities. The waitlist for early access is now open, allowing users to contribute insights for enhancing performance and safety prior to its public release. Ranging from 800M to 8B parameters, the suite offers scalability options to cater to various creative needs. Emphasizing safe and responsible AI practices, Stability.ai has implemented numerous safeguards and continues to collaborate with researchers and experts to ensure integrity throughout development and deployment. | Multimodal Models |
21 Feb 2024 | Coercing Large Language Models (LLMs) to Do and Reveal (Almost) Anything | The paper expands the scope of adversarial attacks on LLMs beyond "jailbreaking," highlighting various attack surfaces and goals. Through concrete examples, it categorizes attacks inducing unintended behaviors like misdirection, model control, denial-of-service, and data extraction. Controlled experiments reveal many attacks originate from pre-training with coding capabilities and the presence of "glitch" tokens in LLM vocabularies, emphasizing the need for security measures. | Red-Teaming |
21 Feb 2024 | LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens | The paper introduces LongRoPE, which extends the context window of pre-trained LLMs to an impressive 2048k tokens, overcoming limitations of current extended context windows. Key innovations include exploiting non-uniformities in positional interpolation, a progressive extension strategy, and readjusting to recover short context window performance. Extensive experiments demonstrate the effectiveness of LongRoPE across various tasks, with models retaining the original architecture and minor modifications to positional embedding. | Long Context, Embedding |
21 Feb 2024 | Gemma: Open Models Based on Gemini Research and Technology | This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. Two model sizes are released (2 billion and 7 billion parameters), with both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and the report presents comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. The authors argue that the responsible release of LLMs is critical for improving the safety of frontier models and for enabling the next wave of LLM innovations. | Foundation LLMs |
21 Feb 2024 | Large Language Models for Data Annotation: A Survey | The paper focuses on leveraging advanced LLMs, like GPT-4, for automating data annotation, a labor-intensive process in machine learning. It offers insights into LLM-Based Data Annotation, Assessing LLM-generated Annotations, and Learning with LLM-generated annotations. The survey includes a taxonomy of methodologies, reviews learning strategies, and discusses challenges and limitations. Intended as a guide for researchers and practitioners, it aims to foster advancements in data annotation using the latest LLMs. | Task Specific LLMs |
21 Feb 2024 | In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss | The paper introduces BABILong, a benchmark designed to evaluate the processing capabilities of generative transformer models on long documents. While common methods are effective only for sequences up to 10^4 elements, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to 11 × 10^6 elements. This achievement represents a substantial leap, demonstrating significant improvement in processing capabilities for long sequences and marking the longest input processed by any neural network model to date. | Benchmark, Long Context |
20 Feb 2024 | Large Language Models: A Survey | The paper provides a comprehensive review of LLMs since the release of ChatGPT in November 2022. It discusses prominent LLM families (GPT, LLaMA, PaLM), their characteristics, contributions, and limitations, along with techniques for building and augmenting LLMs. Additionally, it surveys datasets, evaluation metrics, and performance comparisons of popular LLMs on representative benchmarks. The paper concludes by highlighting open challenges and future research directions in the field of LLMs. | Survey of LLMs |
19 Feb 2024 | LongAgent: Scaling Language Models to 128k Context Through Multi-Agent Collaboration | The paper introduces LongAgent, a method employing multi-agent collaboration to scale LLMs like LLaMA to process long texts up to 128K tokens. LongAgent utilizes a leader to interpret user intent and coordinate team members in acquiring information. To address hallucination-induced response inaccuracies, an inter-member communication mechanism resolves conflicts through information sharing. Experimental results demonstrate LongAgent's superiority over GPT-4 in tasks such as 128k-long text retrieval and multi-hop question answering. | LLM Agents |
19 Feb 2024 | LoRA+: Efficient Low Rank Adaptation of Large Models | The paper identifies suboptimal fine-tuning in models with large width (embedding dimension) using Low Rank Adaptation (LoRA) due to updating adapter matrices A and B with the same learning rate. By setting different learning rates for A and B with a fixed ratio in a proposed algorithm called LoRA+, the suboptimality of LoRA can be corrected. Extensive experiments demonstrate that LoRA+ improves performance (by 1-2%) and fine-tuning speed (up to ~2x speedup) at the same computational cost as LoRA. | PEFT |
15 Feb 2024 | Generative Representational Instruction Tuning | The paper introduces generative representational instruction tuning (GRIT), enabling a large language model to excel in both generative and embedding tasks by distinguishing between them through instructions. GRITLM 7B sets a new state-of-the-art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models of its size on generative tasks. Scaling up to GRITLM 8X7B further surpasses all open generative language models while remaining among the best embedding models. GRIT unifies generative and embedding training without performance loss, significantly speeding up RAG by over 60% for long documents. Models and code are available. | RAG, Instruction Tuning |
15 Feb 2024 | Chain-of-Thought Reasoning Without Prompting | The study enhances large language models' reasoning abilities without explicit prompting by altering the decoding process to uncover inherent chain-of-thought (CoT) reasoning paths. This method bypasses manual prompt engineering, assesses intrinsic reasoning abilities, and correlates CoT presence with higher confidence in decoded answers. Extensive empirical studies across benchmarks demonstrate significant performance improvement over standard greedy decoding. | Prompt Engineering |
15 Feb 2024 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | The report introduces Gemini 1.5 Pro, a highly efficient multimodal model excelling in recalling and reasoning over vast amounts of context, including long documents and videos. It achieves near-perfect recall across tasks, surpasses previous state-of-the-art models, and exhibits surprising translation abilities for rare languages like Kalamang. | Foundation LLMs |
15 Feb 2024 | Revisiting Feature Prediction for Learning Visual Representations from Video | The paper introduces V-JEPA, a collection of vision models trained solely on video data using a feature prediction objective, without relying on pretrained image encoders, text, negative examples, or reconstruction. Trained on 2 million videos, these models are evaluated on downstream image and video tasks, demonstrating versatile visual representations that excel in both motion and appearance-based tasks without requiring adaptation of model parameters. The largest model, a ViT-H/16 trained only on videos, achieves impressive performance on Kinetics-400, Something-Something-v2, and ImageNet1K datasets. | Multimodal LLMs |
13 Feb 2024 | World Model on Million-Length Video and Language with Ring Attention | The paper addresses limitations of current language models by proposing a joint modeling approach with video sequences to enhance understanding of complex, long-form tasks. It curates a large dataset of diverse videos and books, trains transformers with RingAttention technique on long sequences, and gradually increases context size. Key contributions include training one of the largest context size transformers, overcoming vision-language training challenges, and open-sourcing optimized models capable of processing multimodal sequences over 1M tokens. This work enables training on massive datasets to develop understanding of both human knowledge and the multimodal world, paving the way for broader AI capabilities. | Multimodal LLMs |
10 Feb 2024 | ChemLLM: A Chemical Large Language Model | The paper introduces ChemLLM, the first large language model tailored specifically for chemistry applications, addressing the challenge of integrating structured chemical data into coherent dialogue. Through a template-based instruction construction method, ChemLLM transforms structured knowledge into plain dialogue for effective language model training. ChemLLM outperforms GPT-3.5 and GPT-4 on key chemistry tasks such as name conversion, molecular captioning, and reaction prediction, demonstrating exceptional adaptability to related mathematical and physical tasks. Moreover, ChemLLM showcases proficiency in specialized NLP tasks within chemistry, opening new avenues for exploration in chemical studies. | Task Specific LLMs |
6 Feb 2024 | LLM Agents can Autonomously Hack Websites | The paper demonstrates that LLMs, particularly GPT-4, possess the capability to autonomously conduct website hacking tasks such as blind database schema extraction and SQL injections without prior knowledge of vulnerabilities. This ability, enabled by advanced models adept at tool usage and leveraging extended context, raises concerns about the potential offensive capabilities of LLM agents and prompts questions regarding their widespread deployment in cybersecurity contexts. | LLM Agents |
6 Feb 2024 | AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | The paper introduces AnyTool, a large language model agent designed to enhance the utilization of over 16,000 APIs sourced from Rapid API to address user queries efficiently. AnyTool comprises an API retriever, a solver for query resolution, and a self-reflection mechanism. Powered by the function calling feature of GPT-4, AnyTool eliminates the need for external module training. Additionally, the paper revises the evaluation protocol to introduce AnyToolBench, demonstrating superior performance over strong baselines such as ToolLLM and a GPT-4 variant tailored for tool utilization across various datasets. The code is available at https://github.com/dyabel/AnyTool. | LLM Agents |
6 Feb 2024 | Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning | The paper introduces a novel Indirect Reasoning (IR) method to enhance the reasoning capabilities of LLMs beyond the limitations of Direct Reasoning (DR) frameworks like Chain-of-Thought and Self-Consistency. By leveraging logic of contrapositives and contradictions, the IR method tackles tasks such as factual reasoning and mathematical proof. Experimental results on popular LLMs, including GPT-3.5-turbo and Gemini-pro, demonstrate a substantial improvement in accuracy for both factual reasoning and mathematical proof compared to traditional DR methods. Combining IR with DR further enhances performance, underscoring the effectiveness of the proposed strategy. | Prompt Engineering |
6 Feb 2024 | Self-Discover: Large Language Models Self-Compose Reasoning Structures | The paper introduces SELF-DISCOVER, a framework for LLMs to autonomously identify task-specific reasoning structures, improving performance on challenging reasoning benchmarks like BigBench-Hard and MATH. By selecting and composing atomic reasoning modules during a self-discovery process, SELF-DISCOVER improves the performance of models like GPT-4 and PaLM 2 by up to 32% compared to Chain of Thought (CoT). Notably, it outperforms inference-intensive methods like CoT-Self-Consistency by over 20%, with significantly lower inference compute requirements, while exhibiting universality across different LLM model families and echoing human reasoning patterns. | Prompt Engineering |
6 Feb 2024 | DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | The paper introduces DeepSeekMath 7B, a model designed to tackle mathematical reasoning challenges by continuing pretraining with a large dataset sourced from Common Crawl. Achieving a notable score of 51.7% on the MATH benchmark without external toolkits, it approaches the performance of advanced models like Gemini-Ultra and GPT-4. DeepSeekMath's success is attributed to leveraging web data via a sophisticated data selection pipeline and employing Group Relative Policy Optimization (GRPO) to enhance mathematical reasoning while optimizing memory usage. | Task Specific LLMs |
4 Feb 2024 | Large Language Model for Table Processing: A Survey | The survey provides a comprehensive overview of table-centric tasks and the utilization of LLMs to automate them, including traditional areas like Table QA and fact verification, as well as newer aspects such as table manipulation and advanced data analysis. It delves into recent paradigms in LLM usage, focusing on instruction-tuning, prompting, and agent-based approaches. The paper also addresses challenges like private deployment, efficient inference, and the need for extensive benchmarks in table manipulation and advanced data analysis. | Task Specific LLMs |
3 Feb 2024 | More Agents Is All You Need | The paper demonstrates that the performance of LLMs can be improved by scaling the number of instantiated agents using a simple sampling-and-voting method. This method is independent of existing complex enhancement techniques and its effectiveness correlates with task difficulty. Extensive experiments across various LLM benchmarks validate this finding and explore associated properties. The code for the experiments is publicly available. | LLM Agents |
1 Feb 2024 | OLMo: Accelerating the Science of Language Models | OLMo aims to accelerate the science of language models by providing a platform for rapid experimentation and understanding of LLMs. It offers tools for model training, fine-tuning, and evaluation, alongside a collaborative environment for researchers. The goal is to facilitate discoveries and advancements in LLM technology, making it more accessible to a wider audience. | Open-Source LLMs |
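
The sampling-and-voting idea summarized in the "More Agents Is All You Need" entry above can be illustrated with a minimal sketch. This is not the paper's code: it assumes exact-match answers and a plain majority vote, and `make_noisy_agent` is a hypothetical stand-in for a real LLM call so the example runs without any API access.

```python
import random
from collections import Counter
from typing import Callable, List


def sample_and_vote(ask: Callable[[str], str], query: str, n_agents: int) -> str:
    """Ask the same question n_agents times and return the most frequent answer."""
    answers: List[str] = [ask(query) for _ in range(n_agents)]
    # most_common(1) yields a one-element list [(answer, count)].
    return Counter(answers).most_common(1)[0][0]


def make_noisy_agent(correct: str, wrong: List[str], p_correct: float,
                     seed: int) -> Callable[[str], str]:
    """Hypothetical agent: returns `correct` with probability p_correct,
    otherwise a randomly chosen wrong answer. Replace with a real model call."""
    rng = random.Random(seed)

    def ask(_query: str) -> str:
        return correct if rng.random() < p_correct else rng.choice(wrong)

    return ask


if __name__ == "__main__":
    agent = make_noisy_agent("42", ["41", "43", "44"], p_correct=0.5, seed=0)
    for n in (1, 5, 25):
        print(n, sample_and_vote(agent, "What is 6 * 7?", n))
```

Because the wrong answers are spread over several values while the correct one is concentrated, the voted answer tends to become more reliable as `n_agents` grows, which is the effect the summary describes. Free-form outputs would need a similarity-based vote instead of exact matching.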
Join 1000+ students on this 10-week adventure as we delve into the application of LLMs across a variety of use cases
Link to the course website
[Feb 2024] Registrations are still open; click here to register.
🗓️Week 1 [Jan 15 2024]: Practical Introduction to LLMs
- Applied LLM Foundations
- Real World LLM Use Cases
- Domain and Task Adaptation Methods
🗓️Week 2 [Jan 22 2024]: Prompting and Prompt Engineering
- Basic Prompting Principles
- Types of Prompting
- Applications, Risks and Advanced Prompting
🗓️Week 3 [Jan 29 2024]: LLM Fine-tuning
- Basics of Fine-Tuning
- Types of Fine-Tuning
- Fine-Tuning Challenges
🗓️Week 4 [Feb 5 2024]: RAG (Retrieval-Augmented Generation)
- Understanding the concept of RAG in LLMs
- Key components of RAG
- Advanced RAG Methods
🗓️Week 5 [Feb 12 2024]: Tools for building LLM Apps
- Fine-tuning Tools
- RAG Tools
- Tools for observability, prompting, serving, vector search, etc.
🗓️Week 6 [Feb 19 2024]: Evaluation Techniques
- Types of Evaluation
- Common Evaluation Benchmarks
- Common Metrics
🗓️Week 7 [Feb 26 2024]: Building Your Own LLM Application
- Components of LLM application
- Build your own LLM App end to end
🗓️Week 8 [March 4 2024]: Advanced Features and Deployment
- LLM lifecycle and LLMOps
- LLM Monitoring and Observability
- Deployment strategies
🗓️Week 9 [March 11 2024]: Challenges with LLMs
- Scaling Challenges
- Behavioral Challenges
- Future directions
🗓️Week 10 [March 18 2024]: Emerging Research Trends
- Smaller and more performant models
- Multimodal models
- LLM Alignment
🗓️Week 11 Bonus [March 25 2024]: Foundations
- Generative Models Foundations
- Self-Attention and Transformers
- Neural Networks for Language
- Large Language Models by ETH Zurich
- Understanding Large Language Models by Princeton
- Transformers course by Huggingface
- NLP course by Huggingface
- CS324 - Large Language Models by Stanford
- Generative AI with Large Language Models by Coursera
- Introduction to Generative AI by Coursera
- Generative AI Fundamentals by Google Cloud
- Introduction to Large Language Models by Google Cloud
- Introduction to Generative AI by Google Cloud
- Generative AI Concepts by DataCamp (Daniel Tedesco, Data Lead @ Google)
- 1 Hour Introduction to LLM (Large Language Models) by WeCloudData
- LLMOps: Building Real-World Applications With Large Language Models by Udacity
- Full Stack LLM Bootcamp by FSDL
- Generative AI for beginners by Microsoft
- Large Language Models: Application through Production by Databricks
- Generative AI Foundations by AWS
- LLM University by Cohere
- LLM Learning Lab by Lightning AI
- Functions, Tools and Agents with LangChain by DeepLearning.AI
- LangChain for LLM Application Development by DeepLearning.AI
- LLMOps by DeepLearning.AI
- Automated Testing for LLMOps by DeepLearning.AI
- LangChain & Vector Databases in Production by Activeloop
- Reinforcement Learning from Human Feedback by DeepLearning.AI
- Building Applications with Vector Databases by DeepLearning.AI
- How Diffusion Models Work by DeepLearning.AI
- Finetuning Large Language Models by DeepLearning.AI
- LangChain: Chat with Your Data by DeepLearning.AI
- Building Systems with the ChatGPT API by DeepLearning.AI
- ChatGPT Prompt Engineering for Developers by DeepLearning.AI
- Advanced RAG Orchestration series by LlamaIndex
- Building and Evaluating Advanced RAG Applications by DeepLearning.AI
- Evaluating and Debugging Generative AI Models Using Weights and Biases by DeepLearning.AI
- Common GenAI Interview Questions
- Prompting and Prompt Engineering
- Model Fine-Tuning
- Model Evaluation
- MLOps for GenAI
- Generative Models Foundations
- Latest Research Trends
- Designing an LLM-Powered Search Engine
- Building a Customer Support Chatbot
- Building a System for Natural Language Interaction with Your Data
- Building an AI Co-pilot
- Designing a Custom Chatbot for Q/A on Multimodal Data (Text, Images, Tables, CSV Files)
- Building an Automated Product Description and Image Generation System for E-commerce
- AWS Bedrock Workshop Tutorials by Amazon Web Services
- Langchain Tutorials by gkamradt
- LLM Applications for production by ray-project
- LLM tutorials by Ollama
- LLM Hub by mallahyari
- LLM Fine-tuning tutorials by ashishpatel26
- PEFT example notebooks by Huggingface
- Free LLM Fine-Tuning Notebooks by Youssef Hosni
If you want to add to the repository or you find any issues, please feel free to raise a PR, and make sure any addition is placed in the relevant section or category.
To cite this guide, use the format below:
@article{areganti_generative_ai_guide,
author = {Reganti, Aishwarya Naresh},
journal = {https://github.com/aishwaryanr/awesome-generative-ai-resources},
month = {01},
title = {{Generative AI Guide}},
year = {2024}
}
[MIT License]