I use a knowledge base to parse documents into QA data. Why can't I use it?

Question

Opened this issue 3 days ago · 1 comment

ouguofeng commented 3 days ago

Self Checks

This is only for bug reports; if you would like to ask a question, please head to Discussions.
I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

Dify version

0.13.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

I use qwen-plus as the LLM, text-embedding-v3 as the embedding model, and gte-rerank as the rerank model.

Failing request: /console/api/datasets/indexing-estimate

Parameter: {
    "info_list": {
        "data_source_type": "upload_file",
        "file_info_list": {
            "file_ids": [
                "b5447ef1-f812-44e4-a7e0-eb4ce421c965"
            ]
        }
    },
    "indexing_technique": "high_quality",
    "process_rule": {
        "rules": {},
        "mode": "automatic"
    },
    "doc_form": "qa_model",
    "doc_language": "Chinese",
    "dataset_id": "fb498a02-a7d1-446e-bd5d-c4b041ff5599"
}

Result: {
    "code": "indexing_estimate_error",
    "message": "[tongyi] Error: 3 validation errors for AssistantPromptMessage\ncontent.str\n  Input should be a valid string [type=string_type, input_value=[{'text': 'Q1: 贵公司...2020年06月08日。'}], input_type=list]\n    For further information visit https://errors.pydantic.dev/2.9/v/string_type\ncontent.json-or-python[json=list[PromptMessageContent],python=chain[is-instance[Sequence],function-wrap[sequence_validator()]]].0.type\n  Field required [type=missing, input_value={'text': 'Q1: 贵公司...2020年06月08日。'}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.9/v/missing\ncontent.json-or-python[json=list[PromptMessageContent],python=chain[is-instance[Sequence],function-wrap[sequence_validator()]]].0.data\n  Field required [type=missing, input_value={'text': 'Q1: 贵公司...2020年06月08日。'}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.9/v/missing",
    "status": 500
}

✔️ Expected Behavior

QA parsing should work. I want to use it to parse my document into QA data, but currently I can't.

❌ Actual Behavior

No response

Answer 1 · 2024-12-18T10:18:06.000Z

I found a similar unsolved discussion that might be relevant to your issue: Error: Response output is missing or does not contain embeddings. ^[1]. This discussion might provide some insights or potential solutions related to validation errors and embedding issues.

Regarding your specific issue, the error message shows that validation of the AssistantPromptMessage model failed. The content field must be either a string or a list of PromptMessageContent objects, but it received a list containing a dictionary with only a text key. As a result, the string check fails and the list check reports the type and data fields as missing. Ensure that the content field is formatted as a string or as a list of PromptMessageContent objects, which can include TextPromptMessageContent, ImagePromptMessageContent, or AudioPromptMessageContent ^[2].

Additionally, check that all required fields are provided and correctly formatted in your input data to avoid missing field errors.
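
The shape mismatch described above can be illustrated with a small sketch. It assumes the model response carries content as a list of dictionaries with a text key, as the input_value in the error suggests. flatten_content is a hypothetical helper written for illustration; it is not part of Dify or the Tongyi SDK, and this thread does not establish where such a normalization would belong in the Dify codebase.

```python
from typing import Union


def flatten_content(content: Union[str, list, None]) -> str:
    """Collapse str-or-list message content into one plain string.

    Hypothetical helper: assumes list items are either strings or
    dicts shaped like {"text": "..."}, as seen in the error message.
    """
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    parts = []
    for item in content:
        if isinstance(item, str):
            parts.append(item)
        elif isinstance(item, dict) and isinstance(item.get("text"), str):
            parts.append(item["text"])
        # Items without a usable "text" value are skipped.
    return "".join(parts)


# Placeholder data in the same shape as the failing input_value.
raw = [{"text": "Q1: example question? "}, {"text": "A1: example answer."}]
print(flatten_content(raw))
```

Passing the flattened string as content would satisfy the "valid string" branch of the validator; whether that is the appropriate fix for the Tongyi provider is an assumption, not something confirmed here.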
