Docker-deployed embedding endpoint errors: "POST /v1/embeddings HTTP/1.1" 404 Not Found
syusama opened this issue · 2 comments
syusama commented
The following items must be checked before submission
- Make sure you are using the latest code from the repository (git pull); some issues have already been addressed and fixed.
- I have read the project documentation and FAQ and searched the existing issues / discussions without finding a similar problem or solution.
Type of problem
Model inference and deployment
Operating system
Linux
Detailed description of the problem
Started via docker command:
docker compose -f .\docker-compose.vllm.qwen2.yml up -d
The service starts successfully and chat works fine, but calling the embedding endpoint returns an error
(both the docker-compose file and .env are configured with the local embedding model, which worked before):
"POST /v1/embeddings HTTP/1.1" 404 Not Found
docker-compose file
version: '3.10'
services:
  vllmapiserver:
    image: llm-api:vllm
    command: python api/server.py
    ulimits:
      stack: 67108864
      memlock: -1
    environment:
      - PORT=8000
      - MODEL_NAME=qwen2
      - MODEL_PATH=checkpoints/Qwen2-7B-Instruct
      - EMBEDDING_NAME=checkpoints/bce-embedding-base_v1
      - TENSOR_PARALLEL_SIZE=2
      - TRUST_REMOTE_CODE=true
      - PROMPT_NAME=qwen2
    volumes:
      - D:\projects\api-for-open-llm\api-for-open-llm:/workspace
      # model path needs to be specified if not in pwd
      - D:\projects\Qwen\models:/workspace/checkpoints
    env_file:
      - .env.qwen2.vllm
    ports:
      - "8053:8000"
    restart: always
    networks:
      - vllmapinet
    shm_size: 200g
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0','1'] # specify GPUs
              capabilities: [gpu]
networks:
  vllmapinet:
    driver: bridge
    name: vllmapinet
.env file
PORT=8053
# model related
MODEL_NAME=qwen2
MODEL_PATH=D:\projects\Qwen\models\Qwen2-7B-Instruct
PROMPT_NAME=qwen2
EMBEDDING_NAME=D:\projects\Qwen\models\bce-embedding-base_v1
CONTEXT_LEN=2400
DEVICE_MAP=auto
is_qwen_derived_model=false
# api related
API_PREFIX=/v1
# vllm related
ENGINE=vllm
TRUST_REMOTE_CODE=true
TOKENIZE_MODE=slow
TENSOR_PARALLEL_SIZE=2
DTYPE=half
Dependencies
No response
Runtime logs or screenshots
2024-06-08 00:16:48 api-for-open-llm-vllmapiserver-1 | INFO: Started server process [1]
2024-06-08 00:16:48 api-for-open-llm-vllmapiserver-1 | INFO: Waiting for application startup.
2024-06-08 00:16:48 api-for-open-llm-vllmapiserver-1 | INFO: Application startup complete.
2024-06-08 00:16:48 api-for-open-llm-vllmapiserver-1 | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2024-06-08 00:17:40 api-for-open-llm-vllmapiserver-1 | 2024-06-07 16:17:40.621 | DEBUG | api.vllm_routes.chat:create_chat_completion:65 - ==== request ====
2024-06-08 00:17:40 api-for-open-llm-vllmapiserver-1 | {'model': 'Qwen1.5-14B-Chat', 'frequency_penalty': 0.0, 'function_call': None, 'functions': None, 'logit_bias': None, 'logprobs': False, 'max_tokens': 8000, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': ['<|endoftext|>', '<|im_end|>'], 'temperature': 0.01, 'tool_choice': None, 'tools': None, 'top_logprobs': None, 'top_p': 1.0, 'user': None, 'stream': True, 'repetition_penalty': 1.03, 'typical_p': None, 'watermark': False, 'best_of': 1, 'ignore_eos': False, 'use_beam_search': False, 'stop_token_ids': [], 'skip_special_tokens': True, 'spaces_between_special_tokens': True, 'min_p': 0.0, 'include_stop_str_in_output': False, 'length_penalty': 1.0, 'guided_json': None, 'guided_regex': None, 'guided_choice': None, 'guided_grammar': None, 'guided_decoding_backend': None, 'prompt_or_messages': [{'content': '你好', 'role': 'user'}], 'echo': False}
2024-06-08 00:17:41 api-for-open-llm-vllmapiserver-1 | INFO: 172.18.0.1:32992 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-06-08 00:29:45 api-for-open-llm-vllmapiserver-1 | INFO: 172.18.0.1:51488 - "POST /v1/embeddings HTTP/1.1" 404 Not Found
xusenlinzy commented
To use embeddings, you need to set the environment variable TASKS=llm,rag, where llm starts the large language model and rag starts embedding, rerank, and other RAG-related models.
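A minimal sketch of where the setting could go, assuming the same .env file shown above (the variable name comes from this reply; everything else in the file stays unchanged):

```
# .env.qwen2.vllm
TASKS=llm,rag   # llm = chat model, rag = embedding/rerank models
```

After editing the file, the container needs to be recreated (e.g. `docker compose up -d --force-recreate`) so the new variable is picked up.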
syusama commented
> To use embeddings, you need to set the environment variable TASKS=llm,rag, where llm starts the large language model and rag starts embedding, rerank, and other RAG-related models.

So that was it. After adding TASKS=llm,rag to the environment variables, the embedding model starts successfully and the endpoint no longer errors. Thanks a lot!
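For reference, once TASKS=llm,rag is set and the container is restarted, a call to the previously-404ing endpoint can be sketched like this. This is a hedged example: the host port 8053 is taken from the compose `ports` mapping above, and the model name is assumed from EMBEDDING_NAME; adjust both to your setup.

```python
import json
from urllib import request

def embeddings_payload(texts, model="bce-embedding-base_v1"):
    """Build an OpenAI-compatible /v1/embeddings request body."""
    return {"model": model, "input": texts}

def post_embeddings(base_url, texts):
    # POST to /v1/embeddings (API_PREFIX=/v1 in the .env above)
    req = request.Request(
        base_url + "/v1/embeddings",
        data=json.dumps(embeddings_payload(texts)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example (not run here; requires the server to be up):
# result = post_embeddings("http://localhost:8053", ["你好"])
```

A 200 response returns the usual OpenAI-style body with a `data` list of embedding vectors.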