Comparison of LLM Specifications

Introduction

Welcome to our exploratory overview of the AI language model landscape. This document is intended as a collaborative resource, offering a broad comparison of large-scale language models developed by leading technology firms.

Our goal is to present a centralized, continually updated comparison, focusing on major models ranging from OpenAI's GPT series to Google's Gemini family, among others. We aim to help researchers, developers, and enthusiasts gain a general understanding of each major model's capabilities, structure, and distinct characteristics. Note that this comparison does not delve into the finer details of every model version; rather, it highlights the overarching features and developments in the field.

Language Models

We cover each company's AI language models, outlining key specs such as parameter size, architecture, training data, output token limit, context window, and type (open or closed source), along with release dates. This guide aims to clear up confusion around model specifications and provide a straightforward comparison for navigating their complexity.
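
One pair of specs that is easy to conflate is output tokens versus context window: the context window bounds the prompt and the completion together, while the output figure caps a single response on its own. A minimal sketch of this accounting in Python (exact token bookkeeping varies by provider; the example numbers are Llama 2's from the table below):

```python
def max_completion(context_window: int, prompt_tokens: int, output_limit: int) -> int:
    """Largest completion that still fits: the context window bounds
    prompt + completion together, while output_limit caps one response."""
    return max(0, min(context_window - prompt_tokens, output_limit))

# Llama 2 (from the table): 4,096-token context window, 4,096-token output limit.
# A 3,000-token prompt leaves room for at most a 1,096-token completion.
print(max_completion(context_window=4096, prompt_tokens=3000, output_limit=4096))  # 1096
```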

Data Integrity and Contributions

In the fast-paced domain of AI, information can quickly become outdated. To maintain the integrity of this repository, we rigorously source and verify any uncertain data before inclusion. Every piece of information is accompanied by a citation linking to its source, ensuring transparency and reliability.

Your insights and updates are crucial to the resource's vitality. We invite you to contribute by submitting pull requests with verifiable information or by flagging any discrepancies you may encounter. Open an issue to discuss potential changes or to seek clarification on existing data.


Table

| Company | Model | Parameters Size | Structure | Training Data (Date) | Output Tokens | Context Window | Type | Release Date |
|---|---|---|---|---|---|---|---|---|
| OpenAI | GPT-3.5 | 20B | Transformer | 45T tokens (~2021/09) | 4,096 | 4K~16K | closed | 2022/3/15 |
| OpenAI | GPT-4 | 8×220B (Plus, Team, Enterprise) | MoE (8 experts) | ? | 4,096 | 32K (Plus, Team), 128K (Enterprise) | closed | 2023/4/4 |
| OpenAI | GPT-4o | over 175B | ? | ? | ? | ? | closed | 2024/5/13 |
| Google | PaLM 2 | 340B (Unicorn, Bison, Otter, Gecko) | Transformer | 3.6T tokens (~2023/8) | 1,024~8,192 | 8K~32K | closed | 2023/5/10 |
| Google | Gemma | 2B, 7B | Transformer | 2T, 6T tokens | ? | 8K | open source | 2024/2/21 |
| Google | Gemma 2 | 9B, 27B | ? | 8T, 13T tokens | ? | 8,192 | open source | 2024/6/27 |
| Google | Gemini 1.0 | Nano-1 (1.8B), Nano-2 (3.3B), Pro, Ultra | Transformer | ? | 2,048 | 8K | closed | 2024/2/8 |
| Google | Gemini 1.5 | Pro | MoE | ? | 8,192 | 128K~1M | closed | 2024/2/15 |
| Anthropic | Claude 2.1 | 200B | Evolved Transformer | ? (~early 2023) | 4,096 | 200K | closed | 2023/11/21 |
| Anthropic | Claude 3 | 20B, 70B, 2T (Haiku, Sonnet, Opus) | Sparse Transformer | 40T tokens (~2023/8) | 4,096 | 200K | closed | 2024/3/4 |
| Meta | Llama 2 | 7B, 13B, 70B | Transformer | 2T tokens | 4,096 | 4,096 | open source | 2023/7/18 |
| Meta | Llama 3 | 8B, 70B, 400B | Transformer | 15T tokens (8B: ~2023/3, 70B: ~2023/12) | 8,192 | 8,192 | open source | 2024/4/18 |
| Mistral AI | Mistral (Tiny) | 7B | Transformer | ? | ? | 32K | open source | 2023/9/27 |
| Mistral AI | Mixtral 8x7B (Small) | 45B (12B active) | SMoE (8 experts) | ? | ? | 32K | open source | 2023/12/11 |
| Mistral AI | Mistral Medium, Large | ? | ? | ? | ? | 32K | closed | 2024/2/26 |
| Mistral AI | Mixtral 8x22B | 141B (39B active) | SMoE (8 experts) | ? | ? | 64K | open source | 2024/4/10 |
| xAI | Grok-1 | 314B (79B active) | MoE (8 experts) | ? | ? | 8K | open source | 2024/3/17 |
| Apple | MM1 | 3B, 7B, 30B | MoE (3B: 64 experts, 7B: 32 experts) | ? (mixture quantities in details below) | ? | ? | closed | 2024/3/18 |
| Apple | OpenELM | 270M, 450M, 1B, 3B | Transformer | ? | ? | ? | open source | 2024/4/25 |
| Snowflake | Arctic | 480B (17B active) | Dense-MoE (128 experts) | 3.5T tokens | ? | 4K | open source | 2024/4/25 |
| Cohere | Command R+ | 104B | Dense | ? | ? | 128K | open source | 2024/4/4 |
| Microsoft | Phi-3 | Mini (3.8B), Small (7B), Medium (14B) | Dense | 3.3T tokens | ? | 4K, 128K | open source | 2024/4/24 |
| Microsoft | Phi-3-vision | 4.2B | ? | 500B tokens | ? | 128K | open source | 2024/5/21 |
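
For contributors, the columns above map naturally onto a small record type, which makes new entries easier to keep consistent and lets you script sanity checks over the data. A minimal sketch in Python; the `ModelSpec` name and field layout are our own illustration, not part of any existing repository tooling:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelSpec:
    """One row of the comparison table; None marks an unconfirmed spec ('?')."""
    company: str
    model: str
    parameters: str                # kept as text, since sizes are often rumored ranges
    structure: Optional[str]       # e.g. "Transformer", "SMoE (8 experts)"
    training_data: Optional[str]   # e.g. "2T tokens"
    output_tokens: Optional[int]
    context_window: Optional[str]  # e.g. "32K", "128K~1M"
    open_source: bool
    release_date: str              # YYYY/M/D, matching the table

mixtral = ModelSpec(
    company="Mistral AI", model="Mixtral 8x7B (Small)",
    parameters="45B (12B active)", structure="SMoE (8 experts)",
    training_data=None, output_tokens=None, context_window="32K",
    open_source=True, release_date="2023/12/11",
)
```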

Details and Rumor Sources

OpenAI

Google

Anthropic

Meta

  • Llama 2
  • Llama 3
    • Parameters Size: 8B, 70B, 400B
    • Structure: Transformer
    • Training Data (Date): 15T tokens (8B: ~2023/3, 70B: ~2023/12)
    • Output Token: 8,192
    • Context Window: 8,192
    • Type: open source
    • Release Date: 2024/4/18

Mistral AI

  • Mistral (Tiny)

    • Parameters Size: 7B
    • Structure: Transformer
    • Training Data (Date): ?
    • Output Token: ?
    • Context Window: 32k
    • Type: open source
    • Release Date: 2023/9/27
  • Mixtral 8x7B (Small)

    • Parameters Size: 45B (12B active; see the active-parameter sketch after this list)
    • Structure: SMoE (8 experts)
    • Training Data (Date): ?
    • Output Token: ?
    • Context Window: 32k
    • Type: open source
    • Release Date: 2023/12/11
  • Mistral Medium, Large

    • Parameters Size: ?
    • Structure: ?
    • Training Data (Date): ?
    • Output Token: ?
    • Context Window: 32K
    • Type: closed
    • Release Date: 2024/2/26
  • Mixtral 8x22B

    • Parameters Size: 141B (39B active)
    • Structure: SMoE (8 experts)
    • Training Data (Date): ?
    • Output Token: ?
    • Context Window: 64K
    • Type: open source
    • Release Date: 2024/4/10
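
The "total (active)" parameter counts quoted for the sparse MoE models follow simple arithmetic: with n experts of which k are routed per token, total ≈ shared + n·expert and active ≈ shared + k·expert. The sketch below solves those two equations for the per-expert and shared sizes. The two-of-eight routing is Mixtral's published design, but since the totals themselves are partly rumored, treat the derived numbers as estimates:

```python
def moe_split(total_b: float, active_b: float, n_experts: int, k_active: int):
    """Solve total = shared + n*expert and active = shared + k*expert
    for the per-expert and shared parameter counts (in billions)."""
    expert = (total_b - active_b) / (n_experts - k_active)
    shared = total_b - n_experts * expert
    return expert, shared

# Mixtral 8x7B: 45B total, 12B active, 2 of 8 experts routed per token
print(moe_split(45, 12, 8, 2))   # -> (5.5, 1.0): ~5.5B per expert, ~1B shared
# Mixtral 8x22B: 141B total, 39B active
print(moe_split(141, 39, 8, 2))  # -> (17.0, 5.0)
```

The same arithmetic underlies the Grok-1 and Snowflake Arctic figures below, with their own expert counts and routing.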

xAI

  • Grok-1
    • Parameters Size: 314B (79B active)
    • Structure: MoE (8 experts)
    • Training Data (Date): ?
    • Output Token: ?
    • Context Window: 8k
    • Type: open source
    • Release Date: 2024/3/17

Apple

  • MM1
    • Parameters Size: 3B, 7B, 30B
    • Structure: MoE (3B: 64 experts, 7B: 32 experts)
    • Training Data (Date): ?
      • Captioned Images: 2B image-text pairs
      • Captioned Images (Synthetic): 300M image-text pairs
      • Interleaved Image-Text: 600M documents
      • Text-only: 2T tokens
    • Output Token: ?
    • Context Window: ?
    • Type: closed
    • Release Date: 2024/3/18

Snowflake

  • Arctic
    • Parameters Size: 480B (17B active)
    • Structure: Dense-MoE (128 experts)
    • Training Data (Date): 3.5T tokens
    • Output Token: ?
    • Context Window: 4K
    • Type: open source
    • Release Date: 2024/4/25

Cohere

  • Command R+
    • Parameters Size: 104B
    • Structure: Dense
    • Training Data (Date): ?
    • Output Token: ?
    • Context Window: 128K
    • Type: open source
    • Release Date: 2024/4/4

Other LLM Comparisons

Model Specifications and Performance Scores

For a detailed comparison of various Large Language Models (LLMs) in terms of their specifications and performance scores, see the following resource:

  • LLM Comparison/Test by Wolfram Ravenwolf: This comparison adds 17 new models, for a total of 64 ranked models in the Gembo v0 series.
  • Local LLM Comparison Colab UI: Compare the performance of different LLMs that can be deployed locally on consumer hardware.