I use a knowledge base to parse documents into QA data. Why can't I use it?
Self Checks
- This is only for bug report, if you would like to ask a question, please head to Discussions.
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [FOR CHINESE USERS] Please be sure to submit issues in English, or they will be closed. Thank you! :)
- Please do not modify this template :) and fill in all the required fields.
Dify version
0.13.2
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
I use qwen-plus as the LLM, text-embedding-v3 as the Embedding model, and gte-rerank as the Rerank model.
Failing request: /console/api/datasets/indexing-estimate
Parameter: {
"info_list": {
"data_source_type": "upload_file",
"file_info_list": {
"file_ids": [
"b5447ef1-f812-44e4-a7e0-eb4ce421c965"
]
}
},
"indexing_technique": "high_quality",
"process_rule": {
"rules": {},
"mode": "automatic"
},
"doc_form": "qa_model",
"doc_language": "Chinese",
"dataset_id": "fb498a02-a7d1-446e-bd5d-c4b041ff5599"
}
Result: {
"code": "indexing_estimate_error",
"message": "[tongyi] Error: 3 validation errors for AssistantPromptMessage\ncontent.str\n Input should be a valid string [type=string_type, input_value=[{'text': 'Q1: 贵公司...2020年06月08日。'}], input_type=list]\n For further information visit https://errors.pydantic.dev/2.9/v/string_type\ncontent.json-or-python[json=list[PromptMessageContent],python=chain[is-instance[Sequence],function-wrap[sequence_validator()]]].0.type\n Field required [type=missing, input_value={'text': 'Q1: 贵公司...2020年06月08日。'}, input_type=dict]\n For further information visit https://errors.pydantic.dev/2.9/v/missing\ncontent.json-or-python[json=list[PromptMessageContent],python=chain[is-instance[Sequence],function-wrap[sequence_validator()]]].0.data\n Field required [type=missing, input_value={'text': 'Q1: 贵公司...2020年06月08日。'}, input_type=dict]\n For further information visit https://errors.pydantic.dev/2.9/v/missing",
"status": 500
}
✔️ Expected Behavior
QA parsing should work so that I can use it to parse my document. Currently it fails with the error above.
❌ Actual Behavior
No response
I found a similar unsolved discussion that might be relevant to your issue: Error: Response output is missing or does not contain embeddings. [1]. This discussion might provide some insights or potential solutions related to validation errors and embedding issues.
Regarding your specific issue, the error message indicates validation errors on the AssistantPromptMessage model. The content field is expected to be either a valid string or a list of PromptMessageContent, but it is receiving a list containing a dictionary with only a text key, so the string check fails and the list check reports the type and data fields as missing. Ensure that the content field is correctly formatted as a string or a list of PromptMessageContent objects, which can include TextPromptMessageContent, ImagePromptMessageContent, or AudioPromptMessageContent [2].
Additionally, check that all required fields are provided and correctly formatted in your input data to avoid missing field errors.
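As a rough illustration of the mismatch described above, the sketch below collapses a list of {'text': ...} dictionaries (the shape shown in the input_value of the error) into a single string. It assumes that the provider response has exactly that shape. The function name flatten_text_content is hypothetical and is not part of Dify or the Tongyi SDK; where such a conversion would belong in Dify's code is not established in this thread.

```python
from typing import Any, Sequence, Union


def flatten_text_content(content: Union[str, Sequence[Any]]) -> str:
    """Collapse model output content into a plain string.

    Hypothetical helper for illustration only (not a Dify API). It accepts
    either a string or a list of {'text': ...} dicts, the shape seen in the
    validation error above.
    """
    # A plain string is already in the expected form.
    if isinstance(content, str):
        return content

    parts = []
    for item in content:
        if isinstance(item, dict) and isinstance(item.get("text"), str):
            parts.append(item["text"])
        elif isinstance(item, str):
            parts.append(item)
        # Items without a usable 'text' value (e.g. image parts) are skipped.
    return "".join(parts)


if __name__ == "__main__":
    # Placeholder values, not the actual content of the reported response.
    raw = [{"text": "Q1: example question "}, {"text": "A1: example answer"}]
    print(flatten_text_content(raw))
```

Passing the flattened string as content would satisfy the string branch of the validation, under the assumption that only text parts are present.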