microsoft/kernel-memory

[Question] does content.url in filename for websites make sense? (I want attribution per paragraph via separate prompt)

chaelli opened this issue · 4 comments

Context / Scenario

I changed the prompt so that the LLM includes the source for each paragraph of the answer, which lets me align the response more closely with the facts for my users. When I do that, I can only tell it to reference the file name, because that is what the LLM gets in the facts part of the prompt. For websites this is always "content.url", because it is set that way in:

fileName: "content.url",

Question

I wonder if it would make more sense to put the URL there instead of a static string, or at least to include the URL in the facts where it exists.

dluc commented

You should be able to swap content.url with the URL upon receiving the response; there is a property with the URL.
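
A minimal sketch of that post-processing step, written in Python for brevity. The response shape used here (an answer string plus a list of sources, each with a `source_name` and a `source_url`) is an assumption for illustration and not Kernel Memory's actual response schema; `swap_placeholder` is a hypothetical helper.

```python
PLACEHOLDER = "content.url"  # static file name used for web pages


def swap_placeholder(answer_text, sources):
    """Replace the static file name with the real URL when it is unambiguous.

    `sources` is an assumed list of dicts with "source_name" and "source_url".
    """
    # Collect the distinct URLs of all sources stored under the placeholder name.
    urls = {
        s["source_url"]
        for s in sources
        if s.get("source_name") == PLACEHOLDER and s.get("source_url")
    }
    # With zero or several candidate URLs the placeholder cannot be resolved.
    if len(urls) != 1:
        return answer_text
    return answer_text.replace(PLACEHOLDER, next(iter(urls)))
```

When more than one distinct URL is present, the text is returned unchanged, because the placeholder alone does not say which page a given sentence came from.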

This only works if there is just one relevant source. If there are multiple sources, they are all called content.url, so I cannot tell which part of the answer is based on which page, and I cannot align separate sources to separate paragraphs.
FYI, until I started using Kernel Memory, I just used a prompt like this:

Add a source reference to the end of each sentence. e.g. Apple is a fruit ([Reference page title](Reference page url)) (markdown link formatting). ...

@dluc Do you have any preference between the options:

  • replacing "content.url" with the real URL value during indexing?
  • adding the URL as an additional value in the prompt?

Or none of them?

dluc commented

I would try the approach with the prompt; it should be easier. Changing the indexing pipeline might have an unexpected impact.
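
A rough sketch of the prompt-side approach, in Python for brevity: each fact is labelled with its URL when one exists, so the model can cite a distinct source per sentence. The chunk fields (`file_name`, `source_url`, `text`), the `==== [Source: ...]` layout, and both function names are assumptions made for illustration, not Kernel Memory's actual facts format or API.

```python
def render_facts(chunks):
    """Render retrieved chunks as a facts block, one labelled entry per chunk."""
    blocks = []
    for chunk in chunks:
        # Prefer the real URL; fall back to the stored file name
        # (e.g. "content.url") when no URL is known for the chunk.
        label = chunk.get("source_url") or chunk["file_name"]
        blocks.append(f"==== [Source: {label}]\n{chunk['text'].strip()}")
    return "\n\n".join(blocks)


def build_prompt(chunks, question):
    """Assemble a prompt that asks for a per-sentence source reference."""
    return (
        "Facts:\n"
        f"{render_facts(chunks)}\n\n"
        "Add a source reference to the end of each sentence, "
        "using the Source value of the fact it is based on.\n"
        f"Question: {question}\n"
        "Answer: "
    )
```

Because every fact carries its own label, two web pages no longer collapse into the same "content.url" name inside the prompt, and the indexing pipeline stays untouched.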