Welcome to our exploratory overview of the AI language model landscape. This document is intended as a collaborative resource, offering a broad comparison of large-scale language models developed by leading technology firms.
Our goal is to present a centralized, continually updated comparison of the major model families, from OpenAI's GPT series to Google's Gemini and Gemma lines, among others. We aim to help researchers, developers, and enthusiasts gain a general understanding of the capabilities, architectures, and distinguishing characteristics of each major model. Note that this comparison does not delve into the finer details of every model version; rather, it highlights the overarching features and developments in the field.
For each company's models we outline key specs: parameter count, architecture, training data (and its cutoff date), maximum output tokens, context window, license type (open or closed source), and release date. The goal is a straightforward, side-by-side comparison that cuts through the confusion around model specifications.
In the fast-paced domain of AI, information quickly becomes outdated. To maintain the integrity of this repository, we source and verify uncertain data before inclusion; figures that remain unconfirmed (such as rumored parameter counts) are marked with a question mark (?). Every piece of information is accompanied by a citation linking to its source, ensuring transparency and reliability.
Your insights and updates keep this resource current. We invite you to contribute by submitting pull requests with verifiable information or by flagging any discrepancies you encounter. Open an issue to discuss potential changes or to seek clarification on existing data.
| Company | Model | Parameters Size | Structure | Training Data (Date) | Output Tokens | Context Window | Type | Release Date |
|---|---|---|---|---|---|---|---|---|
| OpenAI | GPT-3.5 | 20B? | Transformer | 45T tokens (~2021/09)? | 4,096 | 4k~16k | closed | 2022/3/15 |
| OpenAI | GPT-4 | 8×220B? (Plus, Team, Enterprise) | MoE (8 experts) | ? | 4,096 | 32k (Plus, Team), 128K (Enterprise) | closed | 2023/3/14 |
| OpenAI | GPT-4o | over 175B? | ? | ? | ? | 128K | closed | 2024/5/13 |
| Google | PaLM 2 | 340B? (Unicorn, Bison, Otter, Gecko) | Transformer | 3.6T tokens (~2023/8)? | 1,024~8,192 | 8K~32K | closed | 2023/5/10 |
| Google | Gemma | 2B, 7B | Transformer | 2T, 6T tokens | ? | 8K | open source | 2024/2/21 |
| Google | Gemma 2 | 9B, 27B | ? | 8T, 13T tokens | ? | 8,192 | open source | 2024/6/27 |
| Google | Gemini 1.0 | Nano-1 (1.8B), Nano-2 (3.3B), Pro, Ultra | Transformer | ? | 2,048 | 8K | closed | 2024/2/8 |
| Google | Gemini 1.5 Pro | ? | MoE | ? | 8,192 | 128k~1M | closed | 2024/2/15 |
| Anthropic | Claude 2.1 | 200B? | Evolved Transformer | ? (~early 2023) | 4,096 | 200K | closed | 2023/11/21 |
| Anthropic | Claude 3 | 20B, 70B, 2T? (Haiku, Sonnet, Opus) | Sparse Transformer | 40T tokens (~2023/8)? | 4,096 | 200K | closed | 2024/3/4 |
| Meta | Llama 2 | 7B, 13B, 70B | Transformer | 2T tokens | 4,096 | 4,096 | open source | 2023/7/18 |
| Meta | Llama 3 | 8B, 70B, 400B | Transformer | 15T tokens (8B: ~2023/3, 70B: ~2023/12) | 8,192 | 8,192 | open source | 2024/4/18 |
| Mistral AI | Mistral 7B (Tiny) | 7B | Transformer | ? | ? | 32k | open source | 2023/9/27 |
| Mistral AI | Mixtral 8x7B (Small) | 45B (12B active) | SMoE (8 experts) | ? | ? | 32k | open source | 2023/12/11 |
| Mistral AI | Mistral (Medium, Large) | ? | ? | ? | ? | 32k | closed | 2024/2/26 |
| Mistral AI | Mixtral 8x22B | 141B (39B active) | SMoE (8 experts) | ? | ? | 64k | open source | 2024/4/10 |
| xAI | Grok-1 | 314B (79B active) | MoE (8 experts) | ? | ? | 8k | open source | 2024/3/17 |
| Apple | MM1 | 3B, 7B, 30B | MoE (3B: 64 experts, 7B: 32 experts) | ? (data mix listed below) | ? | ? | closed | 2024/3/18 |
| Apple | OpenELM | 270M, 450M, 1.1B, 3B | Transformer | ? | ? | ? | open source | 2024/4/25 |
| Snowflake | Arctic | 480B (17B active) | Dense-MoE (128 experts) | 3.5T tokens | ? | 4k | open source | 2024/4/25 |
| Cohere | Command R+ | 104B | Dense | ? | ? | 128K | open source | 2024/4/4 |
| Microsoft | Phi-3 | Mini (3.8B), Small (7B), Medium (14B) | Dense | 3.3T tokens | ? | 4K, 128K | open source | 2024/4/24 |
| Microsoft | Phi-3-vision | 4.2B | ? | 500B tokens | ? | 128K | open source | 2024/5/21 |
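To make the Output Tokens and Context Window columns concrete: a request only fits a model if the prompt tokens plus the requested output tokens stay within the context window. Below is a minimal sketch using OpenAI's `tiktoken` tokenizer to check that budget; the `fits_in_context` helper is our own illustration (not part of any vendor SDK), and the limits passed in are example values taken from the table above.

```python
import tiktoken  # pip install tiktoken

def fits_in_context(prompt: str, max_output_tokens: int,
                    context_window: int, encoding: str = "cl100k_base") -> bool:
    """Check whether prompt + requested output fits a model's context window.

    Illustrative helper, not a vendor API: counts prompt tokens with
    tiktoken and compares the total budget against the window size.
    """
    enc = tiktoken.get_encoding(encoding)
    prompt_tokens = len(enc.encode(prompt))
    return prompt_tokens + max_output_tokens <= context_window

# Example: a model with a 4,096-token window and 4,096 max output tokens
# cannot take a prompt *and* a full-length completion at once.
print(fits_in_context("Summarize this report...", max_output_tokens=4096,
                      context_window=4096))  # -> False
```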
- GPT-3.5
  - Parameters Size: 20B?
  - Structure: Transformer
  - Training Data (Date): 45T tokens (~2021/09)?
  - Output Tokens: 4,096
  - Context Window: 4k~16k
  - Type: closed
  - Release Date: 2022/3/15
- GPT-4
  - Parameters Size: 8×220B? (Plus, Team, Enterprise)
  - Structure: MoE (8 experts)
  - Training Data (Date): ?
  - Output Tokens: 4,096
  - Context Window: 32k (Plus, Team), 128K (Enterprise)
  - Type: closed
  - Release Date: 2023/3/14
- PaLM 2
  - Parameters Size: 340B? (Unicorn, Bison, Otter, Gecko)
  - Structure: Transformer
  - Training Data (Date): 3.6T tokens (~2023/8)?
  - Output Tokens: 1,024~8,192
  - Context Window: 8K~32K
  - Type: closed
  - Release Date: 2023/5/10
- Gemma
  - Parameters Size: 2B, 7B
  - Structure: Transformer
  - Training Data (Date): 2T, 6T tokens
  - Output Tokens: ?
  - Context Window: 8K
  - Type: open source
  - Release Date: 2024/2/21
- Gemini 1.0
  - Parameters Size: Nano-1 (1.8B), Nano-2 (3.3B), Pro, Ultra
  - Structure: Transformer
  - Training Data (Date): ?
  - Output Tokens: 2,048
  - Context Window: 8K
  - Type: closed
  - Release Date: 2024/2/8
- Gemini 1.5 Pro
  - Parameters Size: ?
  - Structure: MoE
  - Training Data (Date): ?
  - Output Tokens: 8,192
  - Context Window: 128k~1M
  - Type: closed
  - Release Date: 2024/2/15
- Claude 2.1
  - Parameters Size: 200B?
  - Structure: Evolved Transformer
  - Training Data (Date): ? (~early 2023)
  - Output Tokens: 4,096
  - Context Window: 200K
  - Type: closed
  - Release Date: 2023/11/21
- Claude 3
  - Parameters Size: 20B, 70B, 2T? (Haiku, Sonnet, Opus)
  - Structure: Sparse Transformer
  - Training Data (Date): 40T tokens (~2023/8)?
  - Output Tokens: 4,096
  - Context Window: 200K
  - Type: closed
  - Release Date: 2024/3/4
- Llama 2
  - Parameters Size: 7B, 13B, 70B
  - Structure: Transformer
  - Training Data (Date): 2T tokens
  - Output Tokens: 4,096?
  - Context Window: 4,096
  - Type: open source
  - Release Date: 2023/7/18
- Llama 3
  - Parameters Size: 8B, 70B, 400B
  - Structure: Transformer
  - Training Data (Date): 15T tokens (8B: ~2023/3, 70B: ~2023/12)
  - Output Tokens: 8,192
  - Context Window: 8,192
  - Type: open source
  - Release Date: 2024/4/18
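"Open source" in the Type column means the weights are downloadable and the model can be run locally, for example through Hugging Face `transformers`. A minimal sketch, assuming you have accepted the model's license on the Hugging Face Hub (the repo id shown is Meta's official Llama 3 8B checkpoint there) and have a GPU or sufficient RAM:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated repo: requires accepting Meta's license on the Hugging Face Hub first.
model_id = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # spread weights across available GPUs/CPU (needs accelerate)
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same pattern works for the other open-weight entries (Gemma, Mixtral, OpenELM, Arctic, Command R+, Phi-3), substituting the corresponding Hub repo id.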
- Mistral 7B (Tiny)
  - Parameters Size: 7B
  - Structure: Transformer
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 32k
  - Type: open source
  - Release Date: 2023/9/27
- Mixtral 8x7B (Small)
  - Parameters Size: 45B (12B active)
  - Structure: SMoE (8 experts)
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 32k
  - Type: open source
  - Release Date: 2023/12/11
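The "total (active)" notation for the SMoE entries follows from top-k routing: every token passes through the shared parameters (attention, embeddings) plus only k of the N expert feed-forward blocks, so far fewer parameters are exercised per token than the model contains. A back-of-the-envelope sketch; the shared/per-expert split below is an assumed illustrative decomposition, not Mistral's published breakdown:

```python
def moe_param_counts(shared_b: float, expert_b: float,
                     num_experts: int, top_k: int) -> tuple[float, float]:
    """Total vs. per-token active parameters (in billions) for a top-k MoE.

    shared_b: parameters every token uses (attention, embeddings, ...)
    expert_b: parameters in a single expert feed-forward block
    """
    total = shared_b + num_experts * expert_b
    active = shared_b + top_k * expert_b
    return total, active

# Assumed split for a Mixtral-8x7B-like model: ~1.6B shared plus
# 8 experts of ~5.6B each, with 2 experts routed per token.
total, active = moe_param_counts(shared_b=1.6, expert_b=5.6,
                                 num_experts=8, top_k=2)
print(f"total ≈ {total:.0f}B, active ≈ {active:.0f}B")  # ≈ 46B total, ≈ 13B active
```

That lands close to the roughly 45B (12B active) figures listed above; applying the same arithmetic to Snowflake Arctic's Dense-MoE hybrid (reportedly a ~10B dense backbone plus 128 experts of roughly 3.7B each with top-2 routing) similarly approximates its 480B total / 17B active entry.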
- Mistral (Medium, Large)
  - Parameters Size: ?
  - Structure: ?
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 32k
  - Type: closed
  - Release Date: 2024/2/26
- Mixtral 8x22B
  - Parameters Size: 141B (39B active)
  - Structure: SMoE (8 experts)
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 64k
  - Type: open source
  - Release Date: 2024/4/10
- Grok-1
  - Parameters Size: 314B (79B active)
  - Structure: MoE (8 experts)
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 8k
  - Type: open source
  - Release Date: 2024/3/17
- MM1
  - Parameters Size: 3B, 7B, 30B
  - Structure: MoE (3B: 64 experts, 7B: 32 experts)
  - Training Data (Date): ?
    - Captioned Images: 2B image-text pairs
    - Captioned Images (Synthetic): 300M image-text pairs
    - Interleaved Image-Text: 600M documents
    - Text-only: 2T tokens
  - Output Tokens: ?
  - Context Window: ?
  - Type: closed
  - Release Date: 2024/3/18
- Arctic
  - Parameters Size: 480B (17B active)
  - Structure: Dense-MoE (128 experts)
  - Training Data (Date): 3.5T tokens
  - Output Tokens: ?
  - Context Window: 4K
  - Type: open source
  - Release Date: 2024/4/25
- Command R+
  - Parameters Size: 104B
  - Structure: Dense
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 128K
  - Type: open source
  - Release Date: 2024/4/4
For a detailed comparison of various Large Language Models (LLMs) in terms of their specifications and performance scores, see the following resource: