awesome-genai-models

Let's go on a journey to find and understand all the Generative AI Models together.

Generative AI Models

Open Source Models


Text Generation

Model Created By Size Description Link
Arctic (Dense-MoE) Snowflake 480B Active 17B Arctic is a dense-MoE hybrid transformer architecture pretrained from scratch. It combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, giving 480B total and 17B active parameters selected with top-2 gating. HuggingFace Github Blog
Llama 3 Meta AI 8B 70B Llama 3 is a family of large language models: a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. It is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). HuggingFace Blog Github
Phi 3 Microsoft 3.8B Phi-3-Mini is a 3.8-billion-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets. These datasets include both synthetic data and publicly available website data, with an emphasis on high-quality, reasoning-dense content. Microsoft describes the Phi-3 models as the most capable and cost-effective small language models (SLMs) available. HuggingFace Blog
OpenELM Apple 270M 450M 1.1B 3B OpenELM is a family of Open-source Efficient Language Models. It uses a layer-wise scaling strategy to allocate parameters efficiently within each layer of the transformer model, which improves accuracy. The models were pretrained on RefinedWeb, deduplicated PILE, a subset of RedPajama and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Apple released both pretrained and instruction-tuned models with 270M, 450M, 1.1B and 3B parameters. HuggingFace OpenELM HuggingFace OpenELM-Instruct
Mixtral 8x22B (MoE) Mistral AI 141B Active 39B Mixtral-8x22B is a pretrained generative Sparse Mixture of Experts large language model (LLM). It has a context length of 64K tokens. HuggingFace Blog
Command-R+ Cohere 104B C4AI Command R+ is an open-weights research release of a 104-billion-parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. Command R+ is optimized for a variety of use cases including reasoning, summarization, and question answering. HuggingFace
Jamba (MoE) AI21 Labs 52B Active 12B Jamba is a state-of-the-art, hybrid SSM-Transformer LLM that delivers throughput gains over traditional Transformer-based models. It is a pretrained mixture-of-experts (MoE) generative text model with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length and can fit up to 140K tokens on a single 80GB GPU. HuggingFace Blog
DBRX (MoE) Databricks 132B Active 36B DBRX is a transformer-based decoder-only large language model (LLM) trained using next-token prediction. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters, of which 36B are active on any input. It was pretrained on 12T tokens of text and code data. Compared with other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts: DBRX has 16 experts and chooses 4, while Mixtral-8x7B and Grok-1 have 8 experts and choose 2. This provides 65x more possible combinations of experts, which Databricks reports improves model quality. HuggingFace Github Blog
Grok 1.0 (MoE) xAI 314B Grok 1.0 is a Mixture-of-Experts (MoE) model with 8 experts. The released checkpoint is not fine-tuned for specific applications such as dialogue; xAI reports strong performance compared with models like GPT-3.5 and Llama 2. At 314B parameters it is larger than GPT-3 (175B). Github HuggingFace
Gemma Google 2B 7B Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. HuggingFace Kaggle Github Blog
Recurrent Gemma Google 2B RecurrentGemma is a family of open language models built on a novel recurrent architecture. Like Gemma, RecurrentGemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Because of its novel architecture, RecurrentGemma requires less memory than Gemma and achieves faster inference when generating long sequences. HuggingFace Kaggle
Mixtral 8x7B (MoE) Mistral AI 46.7B Active 12.9B Mixtral-8x7B is a pretrained generative Sparse Mixture of Experts large language model (LLM). It outperforms Llama 2 70B on most benchmarks. HuggingFace Kaggle Blog
Qwen1.5-MoE (MoE) Alibaba 14.3B Active 2.7B Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data. Its Mixture of Experts (MoE) architecture is upcycled from dense language models. It has 14.3B parameters in total and 2.7B activated parameters at runtime, and it achieves performance comparable to Qwen1.5-7B while requiring only 25% of the training resources. HuggingFace
Mistral 7B Mistral AI 7B The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on most benchmarks. Github HuggingFace Kaggle Blog
Mistral 7B v2 Mistral AI 7B Compared with Mistral 7B v0.1, Mistral 7B v2 has a 32K context window (vs. 8K in v0.1), rope-theta = 1e6, and no sliding-window attention. HuggingFace
Llama 2 Meta AI 7B 13B 70B Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. It is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. HuggingFace Kaggle Github Blog
Dolly v2 Databricks 3B 7B 12B Dolly v2 is a family of causal language models created by Databricks, derived from EleutherAI's Pythia models (Pythia-12b for the 12B version) and fine-tuned on a ~15K-record instruction corpus. HuggingFace Dolly3B HuggingFace Dolly7B HuggingFace Dolly12B Kaggle Github
Command-R Cohere 35B Command-R is a research release of a highly performant 35-billion-parameter generative model. It is a large language model with open weights, optimized for a variety of use cases including reasoning, summarization, and question answering. Command-R supports multilingual generation, evaluated in 10 languages, and has highly performant RAG capabilities. HuggingFace Kaggle
Qwen1.5 Alibaba 0.5B 1.8B 4B 7B 14B 32B 72B Qwen1.5 is a transformer-based decoder-only language model pretrained on a large amount of data. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention, etc. HuggingFace Github
Vicuna v1.5 LMSYS 7B 13B Vicuna v1.5 is fine-tuned from Llama 2 with supervised instruction fine-tuning. The training data is around 125K conversations collected from ShareGPT.com. Vicuna is intended primarily for research on large language models and chatbots. HuggingFace Vicuna7B HuggingFace Vicuna13B
Phi 2 Microsoft 2.7B Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source consisting of various NLP synthetic texts and filtered websites. On benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showed nearly state-of-the-art performance among models with fewer than 13 billion parameters. HuggingFace Kaggle Blog
Orca 2 Microsoft 7B 13B Orca 2 is built for research purposes only and provides a single turn response in tasks such as reasoning over user given data, reading comprehension, math problem solving and text summarization. The model is designed to excel particularly in reasoning. The model is not optimized for chat and has not been trained with RLHF or DPO. HuggingFace Blog
Smaug Abacus AI 34B 72B Smaug is created using a new fine-tuning technique, DPO-Positive (DPOP), and new pairwise preference versions of ARC, HellaSwag, and MetaMath (as well as other existing datasets). HuggingFace
MPT MosaicML 1B 7B 30B MPT is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. These models use a modified transformer architecture optimized for efficient training and inference. The architectural changes include performance-optimized layer implementations and the removal of context length limits by replacing positional embeddings with Attention with Linear Biases (ALiBi). HuggingFace Kaggle Github
Falcon TII 7B 40B 180B Falcon is a family of causal decoder-only models with 7B, 40B and 180B parameters, built by TII and trained on 1,500B, 1,000B and 3,500B tokens respectively of RefinedWeb enhanced with curated corpora. HuggingFace
YaLM Yandex 100B YaLM 100B is a GPT-like neural network for generating and processing text. It was trained on a cluster of 800 A100 GPUs over 65 days. HuggingFace Github
DeciLM DeciAI 6B 7B DeciLM is a decoder-only text generation model. With support for an 8K-token sequence length, this highly efficient model uses variable Grouped-Query Attention (GQA) to achieve a superior balance between accuracy and computational efficiency. HuggingFace
BERT Google 110M to 340M BERT is a transformer model pretrained on a large corpus of English data in a self-supervised fashion: it was pretrained on raw text only, with inputs and labels generated automatically from that text rather than by human labelers. HuggingFace Kaggle GitHub
OLMo AllenAI 1B 7B OLMo is a series of Open Language Models designed to enable the science of language models. The OLMo models are trained on the Dolma dataset. HuggingFace Github
OpenChat 3.5 OpenChat 7B OpenChat 3.5 is presented by its authors as the best-performing 7B LLM. HuggingFace Github
Bloom BigScience 176B BLOOM is an autoregressive Large Language Model (LLM), trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources. HuggingFace
Hermes 2 Pro Mistral Nous Research 7B Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes. Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, trained on an updated and cleaned version of the OpenHermes 2.5 Dataset as well as a newly introduced, in-house Function Calling and JSON Mode dataset. This version keeps Hermes' excellent general task and conversation capabilities, and also excels at function calling and JSON structured outputs. HuggingFace
Hermes 2 Mixtral 8x7B (MoE) Nous Research Active 12B Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model, trained on top of the Mixtral 8x7B MoE LLM. The model was trained on over 1,000,000 entries of primarily GPT-4-generated data, as well as other high-quality data from open datasets across the AI landscape, achieving state-of-the-art performance on a variety of tasks. This is the SFT + DPO version of Mixtral Hermes 2. HuggingFace
Merlinite IBM 7B Merlinite-7b is a Mistral-7b-derivative model trained with the LAB methodology, using Mixtral-8x7b-Instruct as a teacher model. HuggingFace
Labradorite IBM 13B Labradorite-13b is a LLaMA-2-13b-derivative model trained with the LAB methodology, using Mixtral-8x7b-Instruct as a teacher model. HuggingFace
XGen Salesforce 7B XGen is a family of large language models with 4K and 8K context-length variants, optimized for long-sequence tasks. HuggingFace Github
Solar Upstage 10.7B SOLAR-10.7B is an advanced large language model (LLM) with 10.7 billion parameters that performs strongly across natural language processing (NLP) tasks. Upstage reports that, despite its compact size, it achieves state-of-the-art performance among models with fewer than 30B parameters. HuggingFace
GPT-NeoX EleutherAI 20B GPT-NeoX-20B is a 20-billion-parameter autoregressive language model trained on the Pile using the GPT-NeoX library. Its architecture intentionally resembles that of GPT-3 and is almost identical to that of GPT-J-6B. HuggingFace GitHub
Flan-T5 Google 80M to 11B FLAN-T5 is a fine-tuned version of T5 with the same number of parameters; the models have been fine-tuned on more than 1,000 additional tasks that also cover more languages. Available sizes: flan-t5-small, flan-t5-base, flan-t5-large, flan-t5-xl, flan-t5-xxl. HuggingFace Kaggle
OPT Meta AI 125M to 175B OPT is a suite of decoder-only pretrained transformers ranging from 125M to 175B parameters. It was pretrained predominantly on English text, although a small amount of non-English data is present in the training corpus via CommonCrawl. HuggingFace
Stable LM 2 Stability AI 1.6B 12B Stable LM 2 models are decoder-only language models pretrained on 2 trillion tokens of diverse multilingual and code datasets for two epochs. HuggingFace
Stable LM Zephyr Stability AI 3B StableLM Zephyr 3B is an auto-regressive language model based on the transformer decoder architecture. It is a 3-billion-parameter model trained on a mix of publicly available and synthetic datasets using Direct Preference Optimization (DPO). HuggingFace
Aya Cohere 13B The Aya model is a transformer-style, autoregressive, massively multilingual generative language model that follows instructions in 101 languages. It has the same architecture as mt5-xxl. HuggingFace Kaggle Blog
Nemotron 3 Nvidia 8B Nemotron-3 models are large language foundation models that enterprises can use to build custom LLMs. This foundation model has 8 billion parameters and supports a context length of 4,096 tokens. Nemotron-3 is a family of enterprise-ready generative text models compatible with the NVIDIA NeMo Framework. HuggingFace
Neural Chat v3 Intel 7B Neural Chat is a 7B-parameter LLM fine-tuned on the Intel Gaudi 2 processor from mistralai/Mistral-7B-v0.1 using the open-source dataset Open-Orca/SlimOrca. The model was aligned using Direct Preference Optimization (DPO). HuggingFace
Yi 01 AI 6B 9B 34B The Yi series models are the next generation of open-source large language models. They are bilingual (English and Chinese) language models trained on a 3T-token multilingual corpus and show promise in language understanding, commonsense reasoning, reading comprehension, and more. HuggingFace Github
Starling LM Nexusflow 7B Starling LM is an open large language model (LLM) trained with Reinforcement Learning from AI Feedback (RLAIF). It is trained from Openchat-3.5-0106 using the Starling-RM-34B reward model and the Proximal Policy Optimization (PPO) method. HuggingFace
NexusRaven v2 Nexusflow 13B NexusRaven is an open-source and commercially viable function calling LLM that surpasses the state-of-the-art in function calling capabilities. NexusRaven-V2 is capable of generating deeply nested function calls, parallel function calls, and simple single calls. It can also justify the function calls it generated. HuggingFace
DeepSeek LLM Deepseek AI 7B 67B DeepSeek LLM is an advanced language model. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. HuggingFace Github
Deepseek VL (Multimodal) Deepseek AI 1.3B 7B DeepSeek-VL is an open-source Vision-Language (VL) model designed for real-world vision and language understanding applications. It has general multimodal understanding capabilities and can process logical diagrams, web pages, scientific literature, natural images and embodied intelligence in complex scenarios, as well as perform formula recognition. It uses a hybrid vision encoder supporting 1024 x 1024 image input and is built on DeepSeek-7b-base, which is trained on a corpus of approximately 2T text tokens. HuggingFace Github
Llava 1.6 (Multimodal) Llava HF 7B 13B 34B LLaVA combines a pretrained large language model with a pretrained vision encoder for multimodal chatbot use cases. Available models: Llava-v1.6-34b-hf, Llava-v1.6-Mistral-7b-hf, Llava-v1.6-Vicuna-7b-hf, Llava-v1.6-vicuna-13b-hf. Hugging Face HuggingFace
Yi VL (Multimodal) 01 AI 6B 34B Yi-VL model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images. HuggingFace YiVL6B HuggingFace YiVL34B
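Several entries above (Arctic, DBRX, Mixtral, Jamba) describe Mixture-of-Experts models in terms of total versus active parameters and top-k gating. The minimal sketch below checks that arithmetic using figures quoted in the table: Arctic's dense-plus-experts layout and DBRX's "65x more combinations" claim. The helper names `moe_param_counts` and `top_k_route` are hypothetical, and the routing function shows only one common gating variant (softmax renormalized over the selected experts); individual models may route differently.

```python
import math
from math import comb


def moe_param_counts(dense_b: float, n_experts: int, expert_b: float, top_k: int):
    """Return (total, active) parameter counts in billions for a simple MoE layout.

    Assumes every token passes through the dense part and through exactly
    `top_k` of the `n_experts` experts.
    """
    total = dense_b + n_experts * expert_b
    active = dense_b + top_k * expert_b
    return total, active


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    One common gating variant; not necessarily what any listed model uses.
    """
    chosen = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - m) for i in chosen}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


# Arctic: 10B dense + 128 experts of 3.66B each, top-2 gating.
arctic_total, arctic_active = moe_param_counts(10, 128, 3.66, 2)
print(f"Arctic: ~{arctic_total:.0f}B total, ~{arctic_active:.0f}B active")
# prints: Arctic: ~478B total, ~17B active

# DBRX picks 4 of 16 experts; Mixtral-8x7B and Grok-1 pick 2 of 8.
ratio = comb(16, 4) / comb(8, 2)
print(f"Expert combinations: {comb(16, 4)} vs {comb(8, 2)} -> {ratio:.0f}x")
# prints: Expert combinations: 1820 vs 28 -> 65x

# Route one token over 4 hypothetical experts with top-2 gating.
print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

The Arctic total comes out at roughly 478B, which the table rounds to 480B, while the active count matches the quoted 17B. The DBRX ratio is exact: 1,820 possible 4-of-16 expert subsets versus 28 possible 2-of-8 subsets.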

Code Generation

Model Created By Size Description Link
CodeQwen1.5 Alibaba 7B CodeQwen1.5 is the code-specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of code data, with a context length of 64K tokens. It supports 92 coding languages and performs well on tasks such as text-to-SQL and bug fixing. HuggingFace
CodeGemma Google 2B 7B CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text and text-to-code decoder-only models. CodeGemma 2B and 7B are further trained on an additional 500 billion tokens of primarily English language data from publicly available code repositories, open source mathematics datasets and synthetically generated code. HuggingFace Kaggle
CodeLlama Meta AI 7B 13B 34B 70B Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. HuggingFace Kaggle Github
Starcoder BigCode 15.5B The StarCoder models are 15.5B parameter models trained on 80+ programming languages from The Stack (v1.2), with opt-out requests excluded. The model uses Multi Query Attention, a context window of 8192 tokens, and was trained using the Fill-in-the-Middle objective on 1 trillion tokens. HuggingFace Github
Starcoder2 BigCode 3B 7B 15B StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from The Stack v2, with opt-out requests excluded. The model uses Grouped Query Attention, a context window of 16,384 tokens with a sliding window attention of 4,096 tokens, and was trained using the Fill-in-the-Middle objective on 4+ trillion tokens. HuggingFace Kaggle GitHub
DeciCoder DeciAI 1B 6B DeciCoder models are decoder-only code completion models trained on the Python, Java, and JavaScript subsets of the StarCoder training dataset. The model uses Grouped Query Attention and has a context window of 2048 tokens. It was trained using a Fill-in-the-Middle training objective. HuggingFace
Stable Code Stability AI 3B stable-code-3b is a 2.7-billion-parameter decoder-only language model pretrained on 1.3 trillion tokens of diverse text and code datasets. It is trained on 18 programming languages and demonstrates state-of-the-art performance (compared with models of similar size) on the MultiPL-E metrics across multiple programming languages, tested using BigCode's Evaluation Harness. HuggingFace Github
SqlCoder DefogAI 7B 15B 34B 70B Defog's SQLCoder is a state-of-the-art LLM for converting natural language questions to SQL queries. Defog reports that, on its sql-eval framework, it slightly outperforms gpt-3.5-turbo for natural-language-to-SQL generation and significantly outperforms all popular open-source models, as well as text-davinci-003, a model more than 10 times its size. HuggingFace Github
DeepSeek Coder Deepseek AI 1B to 33B Deepseek Coder is a series of code language models, each trained from scratch on 2T tokens composed of 87% code and 13% natural language in both English and Chinese. Model sizes range from 1B to 33B. Each model is pretrained on a project-level code corpus with a window size of 16K and an extra fill-in-the-blank task, to support project-level code completion and infilling. Deepseek Coder achieves state-of-the-art coding performance among open-source code models across multiple programming languages and various benchmarks. HuggingFace GitHub
Codegen2 Salesforce 1B 3.7B 7B 16B CodeGen2 is a family of autoregressive language models for program synthesis. CodeGen2 was trained using cross-entropy loss to maximize the likelihood of sequential inputs. The input sequences are formatted in two ways: (1) causal language modeling and (2) file-level span corruption. HuggingFace Github
Codegen2.5 Salesforce 7B CodeGen2.5 is a family of autoregressive language models for program synthesis. Building upon CodeGen2, the model is trained on StarCoderData for 1.4T tokens, achieving competitive results compared with StarCoderBase-15.5B at less than half the size. Three variants are available: CodeGen2.5-7B-multi, CodeGen2.5-7B-mono, CodeGen2.5-7B-instruct. HuggingFace GitHub
Codet5+ Salesforce 110M 220M 770M 2B 6B 16B CodeT5+ is a family of open code large language models with an encoder-decoder architecture that can flexibly operate in different modes (i.e. encoder-only, decoder-only, and encoder-decoder) to support a wide range of code understanding and generation tasks. Salesforce reports that CodeT5+ models below one billion parameters significantly outperform many LLMs of up to 137B parameters. HuggingFace GitHub
Starchat2 Hugging Face 15B StarChat2 is the latest model in the series, and is a fine-tuned version of StarCoder2 that was trained with SFT and DPO on a mix of synthetic datasets. This model was trained to balance chat and programming capabilities. It achieves strong performance on chat benchmarks like MT Bench and IFEval, as well as the canonical HumanEval benchmark for Python code completion. HuggingFace
CrystalCoder LLM360 7B CrystalCoder is a 7B parameter language model, distinctively trained on the SlimPajama and StarCoder datasets. This model excels in balancing natural language processing and coding capabilities. HuggingFace
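StarCoder, StarCoder2, DeciCoder and DeepSeek Coder are all described as trained with a Fill-in-the-Middle (FIM) or fill-in-the-blank objective. The minimal sketch below shows the usual data transformation, assuming the prefix-suffix-middle (PSM) ordering and StarCoder-style sentinel strings `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>`; other models use different special tokens, so check each tokenizer. A document is cut into prefix, middle and suffix, then reordered so that an ordinary left-to-right model learns to generate the middle after seeing both sides. The function names are hypothetical.

```python
import random

# StarCoder-style sentinel strings (assumption: other listed models use
# different special tokens for the same purpose).
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def make_fim_example(text: str, start: int, end: int) -> str:
    """Move text[start:end] to the end, in prefix-suffix-middle (PSM) order."""
    if not 0 <= start <= end <= len(text):
        raise ValueError("need 0 <= start <= end <= len(text)")
    prefix, middle, suffix = text[:start], text[start:end], text[end:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


def random_fim_example(text: str, rng: random.Random) -> str:
    """Pick two random cut points, as when building FIM training data."""
    start, end = sorted(rng.randint(0, len(text)) for _ in range(2))
    return make_fim_example(text, start, end)


code = "def add(a, b):\n    return a + b\n"
# Hide the body expression "a + b" so the model has to infill it.
start = code.index("a + b")
print(make_fim_example(code, start, start + len("a + b")))
print(random_fim_example(code, random.Random(0)))
```

At inference time, a code-infilling prompt would stop right after `<fim_middle>`, and the model's continuation is the missing middle; this is what lets an editor complete code at the cursor while still seeing the code that follows it.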

Image Generation

Model Created By Description Link
Stable Diffusion 2 Stability AI It is a diffusion-based text-to-image generation model that can generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H). HuggingFace Kaggle Github
SDXL Turbo Stability AI SDXL-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation. SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis. SDXL-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD). HuggingFace
Stable Cascade Stability AI Stable Cascade is a diffusion model trained to generate images given a text prompt. It is built upon the Würstchen architecture, and its main difference from other models like Stable Diffusion is that it works in a much smaller latent space. HuggingFace Github
DeciDiffusion v2.0 DeciAI DeciDiffusion 2.0 is a 732-million-parameter text-to-image latent diffusion model. It is a state-of-the-art diffusion-based text-to-image model that builds upon the core architecture of Stable Diffusion and incorporates key elements such as the Variational Autoencoder (VAE) and the pretrained CLIP text encoder. HuggingFace
Playground v2.5 Playground AI Playground v2.5 is a diffusion-based text-to-image generative model. Playground AI reports state-of-the-art aesthetic quality among open-source models. HuggingFace
SDXL-Lightning Bytedance SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps. HuggingFace
DreamShaper Lykon DreamShaper is a versatile AI model developed for various creative tasks. It is a Stable Diffusion model that has been fine-tuned for creating better images. HuggingFace
Open Journey PromptHero A Stable Diffusion model fine-tuned on Midjourney images. HuggingFace
Dalle Mini Boris Dayma Dalle mini is a transformer-based text-to-image generation model. This model can be used to generate images based on text prompts. HuggingFace Github
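The Stable Cascade entry says its main difference from Stable Diffusion is a much smaller latent space. The arithmetic below illustrates that for a 1024x1024 image, assuming a spatial downsampling factor of 8 for the Stable Diffusion VAE and the compression factor of about 42 that Stability AI quotes for Stable Cascade (1024x1024 pixels to a 24x24 latent). Channel counts are ignored, so this compares spatial positions only, and the helper `latent_hw` is hypothetical.

```python
def latent_hw(height: int, width: int, factor: int) -> tuple[int, int]:
    """Spatial size of a latent after downsampling by `factor` (floor division)."""
    return height // factor, width // factor


image = (1024, 1024)
sd_latent = latent_hw(*image, factor=8)        # assumed Stable Diffusion VAE factor
cascade_latent = latent_hw(*image, factor=42)  # assumed Stable Cascade compression factor

sd_positions = sd_latent[0] * sd_latent[1]
cascade_positions = cascade_latent[0] * cascade_latent[1]
print(sd_latent, cascade_latent)  # prints: (128, 128) (24, 24)
print(f"{sd_positions / cascade_positions:.1f}x fewer latent positions")  # prints: 28.4x fewer latent positions
```

Because the denoising network processes every latent position at every step, a latent with roughly 28x fewer positions is one reason a model working in a smaller latent space can be cheaper to train and run.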

Speech and Audio Models

Model Created By Description Link
Whisper (STT) OpenAI Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model. It is available in different sizes: tiny, base, small, medium, large, large-v2, large-v3. Whisper large-v3 was trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio collected using Whisper large-v2. HuggingFace Github
Distil-whisper (STT) Hugging Face Distil-Whisper is a knowledge-distilled version of OpenAI's Whisper. It is available in different sizes: distil-small, distil-medium, distil-large-v2, distil-large-v3. HuggingFace Github
Metavoice (TTS) MetaVoice MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech). HuggingFace Github
SpeechT5 (TTS) Microsoft SpeechT5 model fine-tuned for speech synthesis (text-to-speech). The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder. HuggingFace Github Blog
Magnet (Text to Music) Meta AI MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples conditioned on text descriptions. It is a masked generative non-autoregressive Transformer trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Available MAGNeT variants: audio-magnet-small, audio-magnet-medium, magnet-small-10secs, magnet-small-30secs, magnet-medium-10secs, magnet-medium-30secs. HuggingFace GitHub
Musicgen (Text to Music) Meta AI MusicGen is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. It is a single-stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Available MusicGen variants: musicgen-small, musicgen-medium, musicgen-melody, musicgen-large, musicgen-melody-large, musicgen-stereo. HuggingFace Kaggle Github
Bark Suno AI Bark is a transformer-based text-to-audio model. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise and simple sound effects. This model is meant for research purposes only. Sizes: bark and bark-small. HuggingFace Bark HuggingFace Bark-Small Github
XTTS v2 Coqui AI XTTS is a Voice generation model that lets you clone voices into different languages by using just a quick 6-second audio clip. It supports 17 languages. HuggingFace Github
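The MusicGen and MAGNeT entries both specify a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. The sketch below works out what those three numbers imply for a clip of a given length: how many raw samples each codec frame covers and how many discrete tokens the model has to produce. The function name `encodec_token_budget` is hypothetical; only the quoted figures come from the entries.

```python
def encodec_token_budget(seconds: int, sample_rate_hz: int = 32_000,
                         frame_rate_hz: int = 50, n_codebooks: int = 4) -> dict[str, int]:
    """Count raw audio samples, codec frames and discrete tokens for a clip."""
    frames = seconds * frame_rate_hz
    return {
        "audio_samples": seconds * sample_rate_hz,
        "samples_per_frame": sample_rate_hz // frame_rate_hz,  # 32000 / 50 = 640
        "frames": frames,
        "tokens": frames * n_codebooks,  # one token per codebook per frame
    }


for clip_seconds in (10, 30):  # e.g. the 10 s and 30 s MAGNeT variants
    print(clip_seconds, encodec_token_budget(clip_seconds))
```

A 10-second clip is therefore 320,000 raw samples but only 500 codec frames, or 2,000 tokens across the 4 codebooks. MusicGen generates these tokens auto-regressively in a single stage, whereas MAGNeT predicts masked tokens non-autoregressively, which is where its speed advantage comes from.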

Video Generation Models

Model Created By Description Link
Stable Video Diffusion Stability AI Stable Video Diffusion (SVD) Image-to-Video is a diffusion model that takes in a still image as a conditioning frame, and generates a video from it. It is a latent diffusion model trained to generate short video clips from an image conditioning. This model was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. Hugging Face
AnimateDiff Lightning Bytedance AnimateDiff-Lightning is a lightning-fast text-to-video generation model. It can generate videos more than ten times faster than the original AnimateDiff. AnimateDiff-Lightning produces the best results when used with stylized base models. Hugging Face





Closed Source (Proprietary) Models


Text Generation

Model Created By Link
GPT 4 OpenAI GPT4
GPT 3.5 OpenAI GPT3.5
Gemini 1.5 Google Gemini Blog
Gemini 1.0 Google Gemini Blog
Claude 3 Anthropic Claude Blog
Claude 2.1 Anthropic Claude Blog
Grok 1.5 xAI Grok 1.5
Mistral Large Mistral AI Mistral Blog
Mistral Medium Mistral AI Mistral
Palm 2 Google Palm2
Jurassic2 AI21 Labs Blog
Titan AWS Titan
Granite IBM Granite
Inflection 2.5 Inflection AI Blog

Image Generation

Model Created By Link
Imagen 2 Google Imagen
Dalle 3 OpenAI Dalle3
Dalle 2 OpenAI Dalle2
Firefly 2 Adobe Firefly
Midjourney v6, v5 Midjourney Midjourney
Titan Image Generator AWS Titan
Ideogram 1.0 Ideogram Ideogram 1.0
Emu Edit Meta AI Emu Edit Blog

Video Generation Models

Model Created By Link
Sora OpenAI Sora Blog
Runwayml Gen2 Runwayml Gen2
Runwayml Gen1 Runwayml Gen1
Emu Video Meta AI Emu Video Blog

Speech and Audio Models

Model Created By Link
Turbo v2 Elevenlabs Blog
Suno v3 (Text to Music) Suno Blog
Voicebox Meta AI Voicebox Blog
Audiobox Meta AI Audiobox Blog


