Welcome to our exploratory overview of the AI language model landscape. This document is intended as a collaborative resource, offering a broad comparison of large-scale language models developed by leading technology firms.
Our goal is to present a centralized, continually updated comparison of the major model families, from OpenAI's GPT series to Google's Gemini and Gemma lines, among others. We aim to help researchers, developers, and enthusiasts gain a general understanding of the capabilities, architectures, and distinguishing characteristics of each major model. Note that this comparison does not delve into the finer details of every model version; rather, it highlights the overarching features and developments in the field.
For each company's models we outline key specs: parameter count, architecture, training data (and its cutoff date), maximum output tokens, context window, license type (open or closed source), and release date. The goal is a straightforward, side-by-side comparison that cuts through the confusion around model specifications.
In the fast-paced domain of AI, information quickly becomes outdated. To maintain the integrity of this repository, we source and verify uncertain data before inclusion; figures that remain unconfirmed (such as rumored parameter counts) are marked with a question mark (?). Every piece of information is accompanied by a citation linking to its source, ensuring transparency and reliability.
Your insights and updates keep this resource current. We invite you to contribute by submitting pull requests with verifiable information or by flagging any discrepancies you encounter. Open an issue to discuss potential changes or to seek clarification on existing data.
| Company | Model | Parameters Size | Structure | Training Data (Date) | Output Tokens | Context Window | Type | Release Date |
|---|---|---|---|---|---|---|---|---|
| OpenAI | GPT-3.5 | 20B? | Transformer | 45T tokens (~2021/09)? | 4,096 | 4k~16k | closed | 2022/3/15 |
| OpenAI | GPT-4 | 8×220B? (Plus, Team, Enterprise) | MoE (8 experts) | ? | 4,096 | 32k (Plus, Team), 128K (Enterprise) | closed | 2023/3/14 |
| OpenAI | GPT-4o | over 175B? | ? | ? | ? | 128K | closed | 2024/5/13 |
| Google | PaLM 2 | 340B? (Unicorn, Bison, Otter, Gecko) | Transformer | 3.6T tokens (~2023/8)? | 1,024~8,192 | 8K~32K | closed | 2023/5/10 |
| Google | Gemma | 2B, 7B | Transformer | 2T, 6T tokens | ? | 8K | open source | 2024/2/21 |
| Google | Gemma 2 | 9B, 27B | ? | 8T, 13T tokens | ? | 8,192 | open source | 2024/6/27 |
| Google | Gemini 1.0 | Nano-1 (1.8B), Nano-2 (3.3B), Pro, Ultra | Transformer | ? | 2,048 | 8K | closed | 2024/2/8 |
| Google | Gemini 1.5 Pro | ? | MoE | ? | 8,192 | 128k~1M | closed | 2024/2/15 |
| Anthropic | Claude 2.1 | 200B? | Evolved Transformer | ? (~early 2023) | 4,096 | 200K | closed | 2023/11/21 |
| Anthropic | Claude 3 | 20B, 70B, 2T? (Haiku, Sonnet, Opus) | Sparse Transformer | 40T tokens (~2023/8)? | 4,096 | 200K | closed | 2024/3/4 |
| Meta | Llama 2 | 7B, 13B, 70B | Transformer | 2T tokens | 4,096 | 4,096 | open source | 2023/7/18 |
| Meta | Llama 3 | 8B, 70B, 400B | Transformer | 15T tokens (8B: ~2023/3, 70B: ~2023/12) | 8,192 | 8,192 | open source | 2024/4/18 |
| Mistral AI | Mistral 7B (Tiny) | 7B | Transformer | ? | ? | 32k | open source | 2023/9/27 |
| Mistral AI | Mixtral 8x7B (Small) | 45B (12B active) | SMoE (8 experts) | ? | ? | 32k | open source | 2023/12/11 |
| Mistral AI | Mistral (Medium, Large) | ? | ? | ? | ? | 32k | closed | 2024/2/26 |
| Mistral AI | Mixtral 8x22B | 141B (39B active) | SMoE (8 experts) | ? | ? | 64k | open source | 2024/4/10 |
| xAI | Grok-1 | 314B (79B active) | MoE (8 experts) | ? | ? | 8k | open source | 2024/3/17 |
| Apple | MM1 | 3B, 7B, 30B | MoE (3B: 64 experts, 7B: 32 experts) | ? (data mix listed below) | ? | ? | closed | 2024/3/18 |
| Apple | OpenELM | 270M, 450M, 1.1B, 3B | Transformer | ? | ? | ? | open source | 2024/4/25 |
| Snowflake | Arctic | 480B (17B active) | Dense-MoE (128 experts) | 3.5T tokens | ? | 4k | open source | 2024/4/25 |
| Cohere | Command R+ | 104B | Dense | ? | ? | 128K | open source | 2024/4/4 |
| Microsoft | Phi-3 | Mini (3.8B), Small (7B), Medium (14B) | Dense | 3.3T tokens | ? | 4K, 128K | open source | 2024/4/24 |
| Microsoft | Phi-3-vision | 4.2B | ? | 500B tokens | ? | 128K | open source | 2024/5/21 |
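To make the Output Tokens and Context Window columns concrete: a request only fits a model if the prompt tokens plus the requested output tokens stay within the context window. Below is a minimal sketch using OpenAI's `tiktoken` tokenizer to check that budget; the `fits_in_context` helper is our own illustration (not part of any vendor SDK), and the limits passed in are example values taken from the table above.

```python
import tiktoken  # pip install tiktoken

def fits_in_context(prompt: str, max_output_tokens: int,
                    context_window: int, encoding: str = "cl100k_base") -> bool:
    """Check whether prompt + requested output fits a model's context window.

    Illustrative helper, not a vendor API: counts prompt tokens with
    tiktoken and compares the total budget against the window size.
    """
    enc = tiktoken.get_encoding(encoding)
    prompt_tokens = len(enc.encode(prompt))
    return prompt_tokens + max_output_tokens <= context_window

# Example: a model with a 4,096-token window and 4,096 max output tokens
# cannot take a prompt *and* a full-length completion at once.
print(fits_in_context("Summarize this report...", max_output_tokens=4096,
                      context_window=4096))  # -> False
```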
- GPT-3.5
  - Parameters Size: 20B?
  - Structure: Transformer
  - Training Data (Date): 45T tokens (~2021/09)?
  - Output Tokens: 4,096
  - Context Window: 4k~16k
  - Type: closed
  - Release Date: 2022/3/15
- GPT-4
  - Parameters Size: 8×220B? (Plus, Team, Enterprise)
  - Structure: MoE (8 experts)
  - Training Data (Date): ?
  - Output Tokens: 4,096
  - Context Window: 32k (Plus, Team), 128K (Enterprise)
  - Type: closed
  - Release Date: 2023/3/14
- PaLM 2
  - Parameters Size: 340B? (Unicorn, Bison, Otter, Gecko)
  - Structure: Transformer
  - Training Data (Date): 3.6T tokens (~2023/8)?
  - Output Tokens: 1,024~8,192
  - Context Window: 8K~32K
  - Type: closed
  - Release Date: 2023/5/10
- Gemma
  - Parameters Size: 2B, 7B
  - Structure: Transformer
  - Training Data (Date): 2T, 6T tokens
  - Output Tokens: ?
  - Context Window: 8K
  - Type: open source
  - Release Date: 2024/2/21
- Gemini 1.0
  - Parameters Size: Nano-1 (1.8B), Nano-2 (3.3B), Pro, Ultra
  - Structure: Transformer
  - Training Data (Date): ?
  - Output Tokens: 2,048
  - Context Window: 8K
  - Type: closed
  - Release Date: 2024/2/8
- Gemini 1.5 Pro
  - Parameters Size: ?
  - Structure: MoE
  - Training Data (Date): ?
  - Output Tokens: 8,192
  - Context Window: 128k~1M
  - Type: closed
  - Release Date: 2024/2/15
- Claude 2.1
  - Parameters Size: 200B?
  - Structure: Evolved Transformer
  - Training Data (Date): ? (~early 2023)
  - Output Tokens: 4,096
  - Context Window: 200K
  - Type: closed
  - Release Date: 2023/11/21
- Claude 3
  - Parameters Size: 20B, 70B, 2T? (Haiku, Sonnet, Opus)
  - Structure: Sparse Transformer
  - Training Data (Date): 40T tokens (~2023/8)?
  - Output Tokens: 4,096
  - Context Window: 200K
  - Type: closed
  - Release Date: 2024/3/4
- Llama 2
  - Parameters Size: 7B, 13B, 70B
  - Structure: Transformer
  - Training Data (Date): 2T tokens
  - Output Tokens: 4,096?
  - Context Window: 4,096
  - Type: open source
  - Release Date: 2023/7/18
- Llama 3
  - Parameters Size: 8B, 70B, 400B
  - Structure: Transformer
  - Training Data (Date): 15T tokens (8B: ~2023/3, 70B: ~2023/12)
  - Output Tokens: 8,192
  - Context Window: 8,192
  - Type: open source
  - Release Date: 2024/4/18
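"Open source" in the Type column means the weights are downloadable and the model can be run locally, for example through Hugging Face `transformers`. A minimal sketch, assuming you have accepted the model's license on the Hugging Face Hub (the repo id shown is Meta's official Llama 3 8B checkpoint there) and have a GPU or sufficient RAM:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated repo: requires accepting Meta's license on the Hugging Face Hub first.
model_id = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # spread weights across available GPUs/CPU (needs accelerate)
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same pattern works for the other open-weight entries (Gemma, Mixtral, OpenELM, Arctic, Command R+, Phi-3), substituting the corresponding Hub repo id.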
- Mistral 7B (Tiny)
  - Parameters Size: 7B
  - Structure: Transformer
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 32k
  - Type: open source
  - Release Date: 2023/9/27
- Mixtral 8x7B (Small)
  - Parameters Size: 45B (12B active)
  - Structure: SMoE (8 experts)
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 32k
  - Type: open source
  - Release Date: 2023/12/11
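The "total (active)" notation for the SMoE entries follows from top-k routing: every token passes through the shared parameters (attention, embeddings) plus only k of the N expert feed-forward blocks, so far fewer parameters are exercised per token than the model contains. A back-of-the-envelope sketch; the shared/per-expert split below is an assumed illustrative decomposition, not Mistral's published breakdown:

```python
def moe_param_counts(shared_b: float, expert_b: float,
                     num_experts: int, top_k: int) -> tuple[float, float]:
    """Total vs. per-token active parameters (in billions) for a top-k MoE.

    shared_b: parameters every token uses (attention, embeddings, ...)
    expert_b: parameters in a single expert feed-forward block
    """
    total = shared_b + num_experts * expert_b
    active = shared_b + top_k * expert_b
    return total, active

# Assumed split for a Mixtral-8x7B-like model: ~1.6B shared plus
# 8 experts of ~5.6B each, with 2 experts routed per token.
total, active = moe_param_counts(shared_b=1.6, expert_b=5.6,
                                 num_experts=8, top_k=2)
print(f"total ≈ {total:.0f}B, active ≈ {active:.0f}B")  # ≈ 46B total, ≈ 13B active
```

That lands close to the roughly 45B (12B active) figures listed above; applying the same arithmetic to Snowflake Arctic's Dense-MoE hybrid (reportedly a ~10B dense backbone plus 128 experts of roughly 3.7B each with top-2 routing) similarly approximates its 480B total / 17B active entry.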
- Mistral (Medium, Large)
  - Parameters Size: ?
  - Structure: ?
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 32k
  - Type: closed
  - Release Date: 2024/2/26
- Mixtral 8x22B
  - Parameters Size: 141B (39B active)
  - Structure: SMoE (8 experts)
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 64k
  - Type: open source
  - Release Date: 2024/4/10
- Grok-1
  - Parameters Size: 314B (79B active)
  - Structure: MoE (8 experts)
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 8k
  - Type: open source
  - Release Date: 2024/3/17
- MM1
  - Parameters Size: 3B, 7B, 30B
  - Structure: MoE (3B: 64 experts, 7B: 32 experts)
  - Training Data (Date): ?
    - Captioned Images: 2B image-text pairs
    - Captioned Images (Synthetic): 300M image-text pairs
    - Interleaved Image-Text: 600M documents
    - Text-only: 2T tokens
  - Output Tokens: ?
  - Context Window: ?
  - Type: closed
  - Release Date: 2024/3/18
- Arctic
  - Parameters Size: 480B (17B active)
  - Structure: Dense-MoE (128 experts)
  - Training Data (Date): 3.5T tokens
  - Output Tokens: ?
  - Context Window: 4K
  - Type: open source
  - Release Date: 2024/4/25
- Command R+
  - Parameters Size: 104B
  - Structure: Dense
  - Training Data (Date): ?
  - Output Tokens: ?
  - Context Window: 128K
  - Type: open source
  - Release Date: 2024/4/4
For a detailed comparison of various Large Language Models (LLMs) in terms of their specifications and performance scores, see the following resource: