This repo is released for experimenting with the Zebra-7b-v2-lit model.
This repo is forked from Hugging Face's Transformers-Bloom-Inference.
We implement the Zebra model as a patch to the original transformers library and use the forked repo directly as the interface to our model.
- Paper: Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
- Model: Huggingface Model Repo
@article{song2023zebra,
title={Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention},
author={Kaiqiang Song and Xiaoyang Wang and Sangwoo Cho and Xiaoman Pan and Dong Yu},
year={2023}
}
Build a conda environment with the following commands.
conda create -n zebra-inference python=3.9
conda activate zebra-inference
conda install -c anaconda cmake -y
pip install torch==1.12.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116 \
transformers==4.31.0 \
deepspeed==0.7.6 \
accelerate==0.20.3 \
gunicorn==20.1.0 \
flask==2.3.0 \
werkzeug==2.3.0 \
flask_api \
fastapi==0.89.1 \
uvicorn==0.19.0 \
jinja2==3.1.2 \
pydantic==1.10.2 \
grpcio-tools==1.50.0 \
sentencepiece \
--no-cache-dir
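After installation, it can help to confirm that the environment picked up the pinned versions. The following is a minimal sketch, not part of this repo: `report_versions` is a hypothetical helper that asks Python's `importlib.metadata` for each package's installed version and prints `MISSING` when the lookup fails. It assumes `python` resolves to the interpreter of the activated conda environment.

```shell
# Hypothetical helper (not shipped with this repo).
# Print "<package> <version>" for each argument, or "<package> MISSING"
# if the package (or python itself) cannot be found.
report_versions() {
    for pkg in "$@"; do
        ver=$(python -c "import sys; from importlib.metadata import version; print(version(sys.argv[1]))" "$pkg" 2>/dev/null) || ver="MISSING"
        printf '%s %s\n' "$pkg" "$ver"
    done
}

# Compare the printed versions against the pins in the install command.
report_versions torch transformers deepspeed accelerate
```

Any package reported as `MISSING`, or with a version that differs from the pin, suggests the `pip install` step did not complete in the active environment.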
Download the model with git.
git lfs install
git clone https://huggingface.co/kqsong/zebra-7b-lcat-v2-lit
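If Git LFS is not active during the clone, large files are left as small text pointers instead of the real weights. The sketch below shows one way to detect that. Both function names are hypothetical helpers, not part of this repo, and the check only looks at files in the top level of the given directory.

```shell
# Hypothetical helpers (not shipped with this repo).

# Succeed if the file is a Git LFS pointer rather than real content.
# Pointer files begin with the line "version https://git-lfs.github.com/spec/v1".
is_lfs_pointer() {
    head -c 42 "$1" 2>/dev/null | grep -qxF "version https://git-lfs.github.com/spec/v1"
}

# List top-level files in a directory that are still LFS pointers.
# Returns 0 if none are found, 1 if at least one is, 2 if the directory is missing.
check_model_dir() {
    dir="$1"
    rc=0
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 2; }
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if is_lfs_pointer "$f"; then
            echo "not downloaded (LFS pointer): $f"
            rc=1
        fi
    done
    return "$rc"
}

# Usage, run from the directory that contains the clone:
#   check_model_dir zebra-7b-lcat-v2-lit && echo "no LFS pointers left"
```

If any pointers are reported, running `git lfs pull` inside the cloned directory should fetch the actual files.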
bash launch_zebra_all.sh <path/to/model>
This launches both the frontend (port 5001) and the backend (port 5000).
Then open the address below in a browser, either on localhost or via your machine's IP address and the port.
https://0.0.0.0:5051
https://<ip_address>:5051
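If the page does not load, a quick check is whether anything is listening on the expected ports. The sketch below is a hypothetical helper, not part of this repo. It relies on bash's `/dev/tcp` redirection, so it needs bash rather than plain `sh`. The ports 5000 and 5001 are the ones named in the launch description; substitute whichever ports your launch script actually binds (the example URLs reference 5051).

```shell
# Hypothetical helper (not shipped with this repo).
# Succeed if a TCP connection to host ($1) and port ($2) can be opened.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Report the state of the backend and frontend ports on the local machine.
for port in 5000 5001; do
    if port_open 127.0.0.1 "$port"; then
        echo "port $port: listening"
    else
        echo "port $port: not reachable"
    fi
done
```

A port reported as not reachable shortly after launch may simply mean the model is still loading, so it is worth retrying after a short wait.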
- The code is licensed under the Apache-2.0 License.
- The model is licensed under the Llama-2 License.
This repo is for research purposes only. It is not an officially supported Tencent product.