Arctic (Dense-MoE) | Snowflake | 480B Active 17B | Arctic is a dense-MoE hybrid transformer architecture pre-trained from scratch. It combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, resulting in 480B total and 17B active parameters chosen using top-2 gating (a rough parameter-count sketch follows this table). | HuggingFace Github Blog
Llama 3 | Meta AI | 8B 70B | Llama 3 is a family of large language models, a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. It is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). | HuggingFace Blog Github
Phi 3 | Microsoft | 3.8B | Phi-3-Mini is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. This dataset includes both synthetic data and publicly available website data, with an emphasis on high-quality and reasoning-dense properties. Microsoft positions the Phi-3 models as among the most capable and cost-effective small language models (SLMs) available. | HuggingFace Blog
OpenELM | Apple | 270M 450M 1.1B 3B | OpenELM is a family of Open-source Efficient Language Models. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. It was trained on RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Both pretrained and instruction-tuned models are released with 270M, 450M, 1.1B and 3B parameters. | HuggingFace OpenELM HuggingFace OpenELM-Instruct
Mixtral 8x22B (MoE) | Mistral AI | 176B Active 40B | Mixtral-8x22B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. It has a context length of 65,000 tokens. | HuggingFace Blog
Command-R+ | Cohere | 104B | C4AI Command R+ is an open-weights research release of a 104 billion parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. Command R+ is optimized for a variety of use cases including reasoning, summarization, and question answering. | Hugging Face
Jamba (MoE) | AI21 Labs | 52B Active 12B | Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. It delivers throughput gains over traditional Transformer-based models. It is a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU. | HuggingFace Blog
DBRX (MoE) | Databricks | 132B Active 36B | DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters, of which 36B are active on any input. It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. DBRX has 16 experts and chooses 4, while Mixtral-8x7B and Grok-1 have 8 experts and choose 2. This provides 65x more possible combinations of experts, which improves model quality (a worked combinatorics check follows this table). | HuggingFace Github Blog
Grok 1.0 (MoE) |
xAI |
314B |
Grok 1.0 uses Mixture of 8 Experts (MoE). Grok 1.0 is not fine-tuned for specific applications like dialogue but showcases strong performance compared to other models like GPT-3.5 and Llama 2. It is larger than GPT-3/3.5. |
Github HuggingFace |
Gemma | Google | 2B 7B | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. | HuggingFace Kaggle Github Blog
Recurrent Gemma | Google | 2B | RecurrentGemma is a family of open language models built on a novel recurrent architecture. Like Gemma, RecurrentGemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Because of its novel architecture, RecurrentGemma requires less memory than Gemma and achieves faster inference when generating long sequences. | HuggingFace Kaggle
Mixtral 8x7B (MoE) | Mistral AI | 45B Active 12B | Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks. | HuggingFace Kaggle Blog
Qwen1.5-MoE (MoE) | Alibaba | 14.3B Active 2.7B | Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data. It employs a Mixture of Experts (MoE) architecture, where the models are upcycled from dense language models. It has 14.3B parameters in total and 2.7B activated parameters at runtime; while achieving performance comparable to Qwen1.5-7B, it requires only 25% of the training resources. | HuggingFace
Mistral 7B | Mistral AI | 7B | The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on most benchmarks. | Github HuggingFace Kaggle Blog
Mistral 7B v2 | Mistral AI | 7B | Mistral 7B v2 has the following changes compared to Mistral 7B: a 32k context window (vs 8k context in v0.1), rope-theta = 1e6, and no sliding-window attention. | HuggingFace
Llama 2 | Meta AI | 7B 13B 70B | Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. It is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. | HuggingFace Kaggle Github Blog
Dolly v2 | Databricks | 3B 7B 12B | Dolly v2 is a causal language model created by Databricks, derived from EleutherAI's Pythia models and fine-tuned on a ~15K record instruction corpus. | HuggingFace Dolly3B HuggingFace Dolly7B HuggingFace Dolly12B Kaggle Github
Command-R | Cohere | 35B | Command-R is a research release of a 35 billion parameter highly performant generative model. Command-R is a large language model with open weights optimized for a variety of use cases including reasoning, summarization, and question answering. Command-R has the capability for multilingual generation evaluated in 10 languages and highly performant RAG capabilities. | HuggingFace Kaggle
Qwen1.5 | Alibaba | 0.5B 1.8B 4B 7B 14B 32B 72B | Qwen1.5 is a transformer-based decoder-only language model pretrained on a large amount of data. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, a mixture of sliding window attention and full attention, etc. | HuggingFace Github
Vicuna v1.5 | LMSYS | 7B 13B | Vicuna v1.5 is fine-tuned from Llama 2 with supervised instruction fine-tuning. The training data is around 125K conversations collected from ShareGPT.com. The primary use of Vicuna is research on large language models and chatbots. | HuggingFace Vicuna7B HuggingFace Vicuna13B
Phi 2 | Microsoft | 2.7B | Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites. When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showcased nearly state-of-the-art performance among models with less than 13 billion parameters. | HuggingFace Kaggle Blog
Orca 2 | Microsoft | 7B 13B | Orca 2 is built for research purposes only and provides a single turn response in tasks such as reasoning over user given data, reading comprehension, math problem solving and text summarization. The model is designed to excel particularly in reasoning. The model is not optimized for chat and has not been trained with RLHF or DPO. | HuggingFace Blog
Smaug | Abacus AI | 34B 72B | Smaug is created using a new fine-tuning technique, DPO-Positive (DPOP), and new pairwise preference versions of ARC, HellaSwag, and MetaMath (as well as other existing datasets). | HuggingFace
MPT | MosaicML | 1B 7B 30B | MPT is a decoder-style transformer pretrained from scratch on 1T tokens of English text and code. These models use a modified transformer architecture optimized for efficient training and inference. These architectural changes include performance-optimized layer implementations and the elimination of context length limits by replacing positional embeddings with Attention with Linear Biases (ALiBi); a short ALiBi sketch follows this table. | HuggingFace Kaggle Github
Falcon | TII | 7B 40B 180B | Falcon is a family of 7B/40B/180B-parameter causal decoder-only models built by TII and trained on 1,000B/1,500B/3,500B tokens of RefinedWeb enhanced with curated corpora. | HuggingFace
YaLM | Yandex | 100B | YaLM 100B is a GPT-like neural network for generating and processing text. It was trained on a cluster of 800 A100 GPUs over 65 days. | HuggingFace Github
DeciLM | DeciAI | 6B 7B | DeciLM is a decoder-only text generation model. With support for an 8K-token sequence length, this highly efficient model uses variable Grouped-Query Attention (GQA) to achieve a superior balance between accuracy and computational efficiency. | HuggingFace
BERT | Google | 110M to 350M | BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way, using an automatic process to generate inputs and labels from those texts. | HuggingFace Kaggle GitHub
OLMo | AllenAI | 1B 7B | OLMo is a series of Open Language Models designed to enable the science of language models. The OLMo models are trained on the Dolma dataset. | HuggingFace Github
OpenChat 3.5 | OpenChat | 7B | OpenChat 3.5 is a 7B LLM that its developers report as the best-performing open 7B model at the time of its release. | HuggingFace Github
BLOOM | BigScience | 176B | BLOOM is an autoregressive Large Language Model (LLM), trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources. | HuggingFace
Hermes 2 Pro Mistral | Nous Research | 7B | Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes. Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house. This new version of Hermes maintains its excellent general task and conversation capabilities while also excelling at Function Calling and JSON Structured Outputs. | HuggingFace
Hermes 2 Mixtral 8x7B (MoE) | Nous Research | Active 12B | Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the Mixtral 8x7B MoE LLM. The model was trained on over 1,000,000 entries of primarily GPT-4 generated data, as well as other high quality data from open datasets across the AI landscape, achieving state of the art performance on a variety of tasks. This is the SFT + DPO version of Mixtral Hermes 2. | HuggingFace
Merlinite | IBM | 7B | Merlinite-7b is a Mistral-7b-derivative model trained with the LAB methodology, using Mixtral-8x7b-Instruct as a teacher model. | HuggingFace
Labradorite | IBM | 13B | Labradorite-13b is a LLaMA-2-13b-derivative model trained with the LAB methodology, using Mixtral-8x7b-Instruct as a teacher model. | HuggingFace
XGen | Salesforce | 7B | XGen is a family of large language models with context lengths of 4K and 8K, optimized for long sequence tasks. | HuggingFace Github
Solar | Upstage | 10.7B | SOLAR-10.7B is an advanced large language model (LLM) with 10.7 billion parameters that demonstrates superior performance in various natural language processing (NLP) tasks. It is compact yet remarkably powerful, and demonstrates state-of-the-art performance among models with under 30B parameters. | HuggingFace
GPT-NeoX | EleutherAI | 20B | GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile using the GPT-NeoX library. Its architecture intentionally resembles that of GPT-3, and is almost identical to that of GPT-J-6B. | HuggingFace GitHub
Flan-T5 | Google | 80M to 11B | FLAN-T5 is a modified version of T5 with the same number of parameters; these models have been fine-tuned on more than 1,000 additional tasks, also covering more languages. Available sizes: flan-t5-small, flan-t5-base, flan-t5-large, flan-t5-xxl. | HuggingFace Kaggle
OPT | Meta AI | 125M to 175B | OPT is a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters. It was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. | HuggingFace
Stable LM 2 | Stability AI | 1.6B 12B | Stable LM 2 models are decoder-only language models pre-trained on 2 trillion tokens of diverse multilingual and code datasets for two epochs. | HuggingFace
Stable LM Zephyr | Stability AI | 3B | StableLM Zephyr 3B is an auto-regressive language model based on the transformer decoder architecture. It is a 3 billion parameter model that was trained on a mix of publicly available datasets and synthetic datasets using Direct Preference Optimization (DPO). | HuggingFace
Aya | Cohere | 13B | The Aya model is a transformer-style autoregressive massively multilingual generative language model that follows instructions in 101 languages. It has the same architecture as mT5-xxl. | HuggingFace Kaggle Blog
Nemotron 3 | Nvidia | 8B | Nemotron-3 models are large language foundation models for enterprises to build custom LLMs. This foundation model has 8 billion parameters and supports a context length of 4,096 tokens. Nemotron-3 is a family of enterprise-ready generative text models compatible with the NVIDIA NeMo Framework. | HuggingFace
Neural Chat v3 | Intel | 7B | Neural Chat is a 7B parameter LLM fine-tuned on the Intel Gaudi 2 processor from mistralai/Mistral-7B-v0.1 on the open-source dataset Open-Orca/SlimOrca. The model was aligned using the Direct Preference Optimization (DPO) method (a minimal DPO loss sketch follows this table). | HuggingFace
Yi | 01 AI | 6B 9B 34B | The Yi series models are the next generation of open-source large language models. They are targeted as bilingual language models and trained on a 3T-token multilingual corpus, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. | HuggingFace Github
Starling LM | Nexusflow | 7B | Starling LM is an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). It is trained from Openchat-3.5-0106 with the reward model Starling-RM-34B and the PPO-based policy optimization method from Fine-Tuning Language Models from Human Preferences. | HuggingFace
NexusRaven v2 | Nexusflow | 13B | NexusRaven is an open-source and commercially viable function calling LLM that surpasses the state-of-the-art in function calling capabilities. NexusRaven-V2 is capable of generating deeply nested function calls, parallel function calls, and simple single calls. It can also justify the function calls it generated. | HuggingFace
DeepSeek LLM | Deepseek AI | 7B 67B | DeepSeek LLM is an advanced language model. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. | HuggingFace Github
Deepseek VL (Multimodal) | Deepseek AI | 1.3B 7B | DeepSeek-VL is an open-source Vision-Language (VL) model designed for real-world vision and language understanding applications. DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios. It uses a hybrid vision encoder supporting 1024 x 1024 image input and is constructed on top of DeepSeek-7b-base, which was trained on an approximate corpus of 2T text tokens. | HuggingFace Github
Llava 1.6 (Multimodal) | Llava HF | 7B 13B 34B | LLaVa combines a pre-trained large language model with a pre-trained vision encoder for multimodal chatbot use cases. Available models: Llava-v1.6-34b-hf, Llava-v1.6-Mistral-7b-hf, Llava-v1.6-Vicuna-7b-hf, Llava-v1.6-vicuna-13b-hf. | Hugging Face HuggingFace
Yi VL (Multimodal) | 01 AI | 6B 34B | Yi-VL model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images. | HuggingFace YiVL6B HuggingFace YiVL34B
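
As a sanity check on the Arctic row, the sketch below reproduces the quoted parameter counts from nothing but the figures in the table (10B dense trunk, 128 experts of ~3.66B each, top-2 gating). It is rough back-of-the-envelope arithmetic, not Snowflake's accounting.

```python
# Rough parameter accounting for Arctic, using only the numbers quoted in the table row.
dense = 10e9                                   # dense transformer trunk
experts, expert_size, top_k = 128, 3.66e9, 2   # residual MoE MLP with top-2 gating

total = dense + experts * expert_size    # ~478B, quoted as ~480B total parameters
active = dense + top_k * expert_size     # ~17.3B, quoted as ~17B active parameters per token
print(f"total ~ {total / 1e9:.0f}B, active ~ {active / 1e9:.1f}B")
```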
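
The "65x more possible combinations of experts" figure in the DBRX row is plain combinatorics: choosing 4 of 16 experts versus 2 of 8. A minimal check:

```python
from math import comb

# DBRX routes each token to 4 of its 16 experts; Mixtral-8x7B and Grok-1 route to 2 of 8.
dbrx_subsets = comb(16, 4)       # 1820 possible expert subsets per token
mixtral_subsets = comb(8, 2)     # 28 possible expert subsets per token
print(dbrx_subsets, mixtral_subsets, dbrx_subsets // mixtral_subsets)  # 1820 28 65 -> the "65x" figure
```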
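
The MPT row notes that positional embeddings are replaced with Attention with Linear Biases (ALiBi). The sketch below is a minimal NumPy illustration of the published ALiBi formulation, not MosaicML's implementation: each head adds a distance-proportional penalty to its causal attention logits, with a head-specific geometric slope (shapes and head count here are illustrative).

```python
import numpy as np

def alibi_bias(seq_len: int, num_heads: int) -> np.ndarray:
    """Head-specific linear biases added to causal attention logits (ALiBi)."""
    # Geometric slopes per head: 2^(-8/n), 2^(-16/n), ... for a power-of-two head count n.
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # Relative position j - i; only past/current keys matter under the causal mask.
    rel = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
    rel = np.minimum(rel, 0)                         # 0 on the diagonal, negative for the past
    return slopes[:, None, None] * rel[None, :, :]   # shape: (num_heads, seq_len, seq_len)

bias = alibi_bias(seq_len=6, num_heads=4)
print(bias.shape)  # (4, 6, 6); added to attention logits before softmax, future positions are masked anyway
```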
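
Several rows above (Smaug, Stable LM Zephyr, Neural Chat, Nous Hermes 2 Mixtral) mention alignment with Direct Preference Optimization (DPO). As a rough illustration of that objective, here is a minimal PyTorch sketch of the DPO loss from Rafailov et al. (2023); it is not the training code used by any of these models, and the example log-probabilities are made up.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Direct Preference Optimization loss over a batch of preference pairs."""
    # Implicit reward of a response: beta * log(pi(y|x) / pi_ref(y|x)).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Logistic loss on the reward margin between preferred and dispreferred responses.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy usage: summed log-probs of chosen/rejected responses under the policy and a frozen reference.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -10.5]))
print(loss.item())
```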