Issue with POSTMAN API Calling

Question

Issue with POSTMAN API Calling

shreyan1999 opened this issue 4 months ago · 6 comments

shreyan1999 commented 4 months ago

Hello,

I have tried to make a simple POST call using the URL "http://127.0.0.1:5272/v1/chat/completions" , and the body as follows,

{
"model": "Phi-3-mini-128k-cuda-int4-onnx",
"messages": [
{
"role": "user",
"content": "Hi"
}
],
"temperature": 0.7,
"top_p": 1,
"top_k": 10,
"max_tokens": 100,
"stream": true
}

When I make the API calling, the response body is being shown as follows,

data: {
"id": "chat.id.16",
"created": 1718722897,
"choices": [
{
"message": {
"role": "assistant",
"content": " Hello",
"tool_calls": []
},
"logprobs": null,
"index": 0,
"finish_reason": null,
"delta": {
"role": "assistant",
"content": " Hello",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

data: {
"id": "chat.id.17",
"created": 1718722897,
"choices": [
{
"message": {
"role": "assistant",
"content": "!",
"tool_calls": []
},
"logprobs": null,
"index": 1,
"finish_reason": null,
"delta": {
"role": "assistant",
"content": "!",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

data: {
"id": "chat.id.18",
"created": 1718722897,
"choices": [
{
"message": {
"role": "assistant",
"content": " How",
"tool_calls": []
},
"logprobs": null,
"index": 2,
"finish_reason": null,
"delta": {
"role": "assistant",
"content": " How",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

data: {
"id": "chat.id.19",
"created": 1718722897,
"choices": [
{
"message": {
"role": "assistant",
"content": " can",
"tool_calls": []
},
"logprobs": null,
"index": 3,
"finish_reason": null,
"delta": {
"role": "assistant",
"content": " can",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

data: {
"id": "chat.id.20",
"created": 1718722897,
"choices": [
{
"message": {
"role": "assistant",
"content": " I",
"tool_calls": []
},
"logprobs": null,
"index": 4,
"finish_reason": null,
"delta": {
"role": "assistant",
"content": " I",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

data: {
"id": "chat.id.21",
"created": 1718722897,
"choices": [
{
"message": {
"role": "assistant",
"content": " assist",
"tool_calls": []
},
"logprobs": null,
"index": 5,
"finish_reason": null,
"delta": {
"role": "assistant",
"content": " assist",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

data: {
"id": "chat.id.22",
"created": 1718722897,
"choices": [
{
"message": {
"role": "assistant",
"content": " you",
"tool_calls": []
},
"logprobs": null,
"index": 6,
"finish_reason": null,
"delta": {
"role": "assistant",
"content": " you",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

data: {
"id": "chat.id.23",
"created": 1718722897,
"choices": [
{
"message": {
"role": "assistant",
"content": " today",
"tool_calls": []
},
"logprobs": null,
"index": 7,
"finish_reason": null,
"delta": {
"role": "assistant",
"content": " today",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

data: {
"id": "chat.id.24",
"created": 1718722897,
"choices": [
{
"message": {
"role": "assistant",
"content": "?",
"tool_calls": []
},
"logprobs": null,
"index": 8,
"finish_reason": null,
"delta": {
"role": "assistant",
"content": "?",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

data: {
"id": "chat.id.25",
"created": 1718722897,
"choices": [
{
"message": {
"role": "assistant",
"content": "",
"tool_calls": []
},
"logprobs": null,
"index": 9,
"finish_reason": "stop",
"delta": {
"role": "assistant",
"content": "",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

data: [DONE
]

The response is split into multiple subclasses for every word. However when I remove the following section from the body " "temperature": 0.7,
"top_p": 1,
"top_k": 10,
"max_tokens": 100,
"stream": true"

The response is coming twice without the words being divided into subclasses.

Tagging @leestott

Answer 1 · 2024-06-19T10:17:16.000Z

Hi replicated this issue

Postman post method http://127.0.0.1:5272/v1/chat/completions

JSON
{
"model": "Phi-3-mini-4k-directml-int4-awq-block-128-onnx",
"messages": [
{
"role": "user",
"content": "Hi"
}
],
"temperature": 0.7,
"top_p": 1,
"top_k": 10,
"max_tokens": 100,
"stream": true
}

The issue is the response is being brought back in sub classes not a single line

Answer 2 · 2024-06-19T10:21:59.000Z

Additional update

If you remove the temp-stream you get this output which is duplicated.

{
"id": "chat.id.14",
"created": 1718722757,
"choices": [
{
"message": {
"role": "assistant",
"content": " Hello! How can I assist you today?",
"tool_calls": []
},
"logprobs": null,
"index": 0,
"finish_reason": "stop",
"delta": {
"role": "assistant",
"content": " Hello! How can I assist you today?",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

Answer 3 · 2024-06-26T07:55:59.000Z

@shreyan1999 @leestott , the local API is uses the OpenAI Chat Completion contract, which supports streaming.

So if using "stream": true, the response is split into chunks and please use the delta field.
If using "stream": false or removing "stream", the response is the whole message and please use the message field.

Answer 4 · 2024-06-29T18:10:43.000Z

{
"id": "chat.id.1",
"created": 1719684353,
"choices": [
{
"message": {
"role": "assistant",
"content": " Hello! How can I assist you today?",
"tool_calls": []
},
"logprobs": null,
"index": 0,
"finish_reason": "stop",
"delta": {
"role": "assistant",
"content": " Hello! How can I assist you today?",
"tool_calls": []
}
}
],
"prompt_filter_results": [],
"usage": null
}

Thank you for the resolution @swatDong .Yes the issue was resolve upon setting the parameter to False. The response is still coming out twice, once under the choices->message-> content section and second time in the delta->content section. As we can have the first occurrence for the content message extraction. What is the second occurrence exactly notify?

Answer 5 · 2024-07-01T02:09:28.000Z

@shreyan1999 - when using stream: false, please just ignore the delta section. The both occurrences are exactly the same.

I'll check internally for the duplicated content issue.

Answer 6 · 2024-07-15T02:18:56.000Z

No activity for 2 weeks, closing this issue.
Feel free to comment or reopen, and we will re-investigate.