Docker LLaMA2 Chat / 羊驼二代

中文文档 | ENGLISH

三步上手 LLaMA2，一起玩！相关博客教程已更新，同样欢迎“一键三连” 🌟🌟🌟。

使用 Docker 快速上手，本地部署 7B 或 13B 官方模型，或者 7B 中文模型。

Meta Llama2 模型，使用 4090 验证，需要 8~14GB 显存
中文 Llama2 模型，使用 4090 验证，需要 8~14GB 显存
量化后的中文 Llama2 模型，使用 4090 验证，需要 5GB 显存
使用 GGML (llama.cpp) 模型，只需要 CPU 就能够运行模型

你可以参考项目代码，举一反三，把模型跑起来，接入到你想玩的地方，包括并不局限于支持 LLaMA 1代的各种开源软件中。

预览图

博客教程

使用方法

一条命令，从项目中构建官方版（7B或13B）模型镜像，或中文版镜像（7B或INT4量化版）：

# 7B
bash scripts/make-7b.sh

# 或 13B
bash scripts/make-13b.sh

# 或 7B Chinese
bash scripts/make-7b-cn.sh

# 或 7B Chinese 4bit
bash scripts/make-7b-cn-4bit.sh

选择适合你的命令，从 HuggingFace 下载 LLaMA2 或中文模型：

# MetaAI LLaMA2 Models (10~14GB vRAM)
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
git clone https://huggingface.co/meta-llama/Llama-2-13b-chat-hf

mkdir meta-llama
mv Llama-2-7b-chat-hf meta-llama/
mv Llama-2-13b-chat-hf meta-llama/

# 或 Chinese LLaMA2 (10~14GB vRAM)
git clone https://huggingface.co/LinkSoul/Chinese-Llama-2-7b

mkdir LinkSoul
mv Chinese-Llama-2-7b LinkSoul/

# 或 Chinese LLaMA2 4BIT (5GB vRAM)
git clone https://huggingface.co/soulteary/Chinese-Llama-2-7b-4bit

mkdir soulteary
mv Chinese-Llama-2-7b-4bit soulteary/

将下载好的模型，保持在一个正确的目录结构中。

tree -L 2 meta-llama
soulteary
└── ...
LinkSoul
└── ...
meta-llama
├── Llama-2-13b-chat-hf
│   ├── added_tokens.json
│   ├── config.json
│   ├── generation_config.json
│   ├── LICENSE.txt
│   ├── model-00001-of-00003.safetensors
│   ├── model-00002-of-00003.safetensors
│   ├── model-00003-of-00003.safetensors
│   ├── model.safetensors.index.json
│   ├── pytorch_model-00001-of-00003.bin
│   ├── pytorch_model-00002-of-00003.bin
│   ├── pytorch_model-00003-of-00003.bin
│   ├── pytorch_model.bin.index.json
│   ├── README.md
│   ├── Responsible-Use-Guide.pdf
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.model
│   └── USE_POLICY.md
└── Llama-2-7b-chat-hf
    ├── added_tokens.json
    ├── config.json
    ├── generation_config.json
    ├── LICENSE.txt
    ├── model-00001-of-00002.safetensors
    ├── model-00002-of-00002.safetensors
    ├── model.safetensors.index.json
    ├── models--meta-llama--Llama-2-7b-chat-hf
    ├── pytorch_model-00001-of-00003.bin
    ├── pytorch_model-00002-of-00003.bin
    ├── pytorch_model-00003-of-00003.bin
    ├── pytorch_model.bin.index.json
    ├── README.md
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    ├── tokenizer.model
    └── USE_POLICY.md

选择使用下面的适合你的命令，一键运行 LLaMA2 模型应用：

# 7B
bash scripts/run-7b.sh
# 或 13B
bash scripts/run-13b.sh
# 或 Chinese 7B
bash scripts/run-7b-cn.sh
# 或 Chinese 7B 4BIT
bash scripts/run-7b-cn-4bit.sh

模型运行之后，在浏览器中访问 http://localhost7860 或者 http://你的IP地址:7860 就可以开始玩了。

Blue0rigin/docker-llama2-chat

Docker LLaMA2 Chat / 羊驼二代

预览图

博客教程

使用方法

相关项目