[Feature] Add a [Parse Images] function to the knowledge base, modeled on the existing [Generate Questions], to parse image content and vectorize the parsed text

Question

Closed this issue 2 months ago · 3 comments

MaxKB Version

V2.2

Please describe your needs or suggestions for improvements

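In the knowledge base, a [Parse Images] (【解析图片】) function could be designed along the lines of the existing [Generate Questions] (【生成问题】) function, to parse the content of images and vectorize the parsed text.

Several recent customer POCs have run into this need.
The current workaround is to orchestrate two applications. One is dedicated to image parsing. The main application takes the user's question, runs retrieval, extracts the OSS image links from the retrieved segments, calls the other application to parse those images, and then uses the results to answer.
This makes the experience slow, users dislike it, and it wastes tokens.

On reflection, the product could follow the existing [Generate Questions] and add [Parse Images] to parse image content and vectorize the parsed text:

Every image in a segment already has an OSS link, for example ./oss/file/019946b6-f72d-7a73-ab91-e442cb0b06c8
Following [Generate Questions], [Parse Images] would call a vision model, parse the content of each image, store it in the database, and vectorize it at the same time, so that image content becomes searchable.
To display the parsed content in the front end, follow the design of [Questions] and add an [Image Parsing] panel.
In a segment, users can click an image to view its parsed content.
In [Image Parsing], users can view and edit the parsed content of each image.
Text in [Image Parsing] takes part in retrieval and recall, just like [Questions].
This design lets existing users upgrade smoothly and parse their existing documents that contain images.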
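
The link-extraction step in the workaround above can be sketched as follows. This is a minimal sketch, assuming image links in a segment always follow the shape of the example in this issue (`./oss/file/` plus a 36-character UUID); the function name `extract_oss_links` is hypothetical and not part of MaxKB.

```python
import re

# Assumption: links look like "./oss/file/<UUID>", as in the example in this issue.
OSS_LINK = re.compile(
    r"\./oss/file/[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


def extract_oss_links(segment_text: str) -> list[str]:
    """Return the distinct OSS image links in a segment, in order of first appearance."""
    # dict.fromkeys removes duplicates while keeping insertion order.
    return list(dict.fromkeys(OSS_LINK.findall(segment_text)))
```

Because this runs on every user question, each retrieved image is sent to the vision model again at query time, which is where the latency and token cost described above come from.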
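
The proposed flow (parse each image once at indexing time, store the text, vectorize it) could look roughly like the sketch below. Everything here is an assumption for illustration: the `ImageParse` record, `parse_segment_images`, and the two callables `describe_image` and `embed_text` are hypothetical stand-ins for a vision model call and an embedding model call, not MaxKB APIs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ImageParse:
    """Hypothetical record: the parsed text of one image in one segment."""
    segment_id: str
    image_link: str
    text: str  # user-editable, like an entry under [Questions]
    vector: list[float] = field(default_factory=list)


def parse_segment_images(
    segment_id: str,
    image_links: list[str],
    describe_image: Callable[[str], str],       # hypothetical vision model wrapper
    embed_text: Callable[[str], list[float]],   # hypothetical embedding wrapper
) -> list[ImageParse]:
    """Parse every image of a segment once and attach an embedding to each result."""
    records = []
    for link in image_links:
        text = describe_image(link).strip()
        if not text:
            continue  # the model returned nothing worth indexing
        records.append(ImageParse(segment_id, link, text, embed_text(text)))
    return records
```

The returned records would then be persisted and searched alongside the segment, so the vision model is called once per image at indexing time instead of on every question.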

Please describe the solution you suggest

No response

Additional Information

No response

Answer 1 · 2025-09-14T11:06:27.000Z

Bot detected that the issue body's language is not English and translated it automatically.

MaxKB Version

V2.2

Please describe your needs or suggestions for improvements

In the knowledge base, a [Parse Images] function could be designed along the lines of the existing [Generate Questions] function, to parse the content of images and vectorize the parsed text.

Several recent customer POCs have run into this need.
The current workaround is to orchestrate two applications. One is dedicated to image parsing. The main application takes the user's question, runs retrieval, extracts the OSS image links from the retrieved segments, calls the other application to parse those images, and then uses the results to answer.
This makes the experience slow, users dislike it, and it wastes tokens.

On reflection, the product could follow the existing [Generate Questions] and add [Parse Images] to parse image content and vectorize the parsed text:

Every image in a segment already has an OSS link, for example ./oss/file/019946b6-f72d-7a73-ab91-e442cb0b06c8
Following [Generate Questions], [Parse Images] would call a vision model, parse the content of each image, store it in the database, and vectorize it at the same time, so that image content becomes searchable.
To display the parsed content in the front end, follow the design of [Questions] and add an [Image Parsing] panel.
In a segment, users can click an image to view its parsed content.
In [Image Parsing], users can view and edit the parsed content of each image.
Text in [Image Parsing] takes part in retrieval and recall, just like [Questions].
This design lets existing users upgrade smoothly and parse their existing documents that contain images.

Please describe the solution you suggest

No response

Additional Information

No response

Answer 2 · 2025-09-15T01:39:53.000Z

This is an internal requirement. Please submit it to the person in charge of your region.

Answer 3 · 2025-09-15T01:40:02.000Z

Bot detected that the issue body's language is not English and translated it automatically.

Please submit internal requirements to the person in charge of your region.