squidfunk/mkdocs-material

Cannot search headers containing colon (`:`)

Closed this issue · 6 comments

Context

No response

Bug description

When a symbol is defined as a separator, we expect that symbol to be searchable, i.e. the search string should be tokenized exactly the same way as the documentation contents. Based on observations, this does not seem to be the case with the built-in plugin.
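
For illustration, this is the expectation: with a custom separator, lunr splits document content on all configured characters, so queries should be split the same way. A sketch, using the separator regex from the reproduction below:

```ts
import lunr from "lunr"

/* Separator covering whitespace, dash and colon, mirroring the
   configuration in the reproduction below */
lunr.tokenizer.separator = /[\s\-:]+/

/* Document content is split on all three characters... */
const tokens = lunr.tokenizer("colon:separated").map(String)
console.log(tokens) // ["colon", "separated"]

/* ...so one would expect the query "colon:separated" to match */
```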

Related links

Reproduction

Created by following

  1. https://squidfunk.github.io/mkdocs-material/bug-report/reproduction/
  2. https://squidfunk.github.io/mkdocs-material/setup/setting-up-site-search/#built-in-search-plugin
  3. Configure `separator` to include spaces, dash and colon (see the configuration sketch after this list)
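
For reference, the configuration from step 3 looks roughly like this in `mkdocs.yml` (the exact regex is an assumption):

```yaml
plugins:
  - search:
      # Treat whitespace, dash and colon as word separators
      separator: '[\s\-:]+'
```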

example.zip (browse same sources on GitHub)

Steps to reproduce

  1. Remove `- info` from `plugins:`
  2. Run `mkdocs serve`
  3. Search for `dash-separated` and observe a match:
    (screenshot)
  4. Search for `colon:separated` and be baffled:
    (screenshot)
  5. For control, search for `separated` and observe that tokenization works:
    (screenshot)
  6. Try `"colon:separated"` 🤞 and note that it works, since quoting is the usual way to request an exact match:
    (screenshot)

Browser

No response

Before submitting

Thanks for reporting. This is a feature and not a bug 😉: `:` is considered a control character that allows you to specify which field to search. For example, `title:foo` will only search the titles of documents for `foo`. What we might be able to do is to allow `:` as long as the word before the colon is not the name of a field, but I'm not sure this is a good idea, because then the behaviour for `title`, `text` and `tags` would differ from that of all other words.
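
To see what happens to such a query, one can run it through lunr's lexer directly. A sketch: `QueryLexer` is available on the lunr namespace at runtime, though it may not be covered by lunr's type declarations:

```ts
import lunr from "lunr"

/* "colon:separated" is lexed as a field selector plus a term,
   not as a single searchable string */
const lexer = new lunr.QueryLexer("colon:separated")
lexer.run()

for (const { type, str } of lexer.lexemes)
  console.log(type, str)

/* FIELD colon
   TERM separated */
```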

Normally, this isn't a problem, because people don't search for `:`. I'll check whether we can somehow limit this behaviour to active fields, but if that doesn't prove viable, you can still fork the theme and adjust the query lexing stage:

```ts
/* Split query terms with tokenizer */
return transform(query, part => {
  const terms: string[] = []

  /* Initialize lexer and analyze part */
  const lexer = new lunr.QueryLexer(part)
  lexer.run()

  /* Extract and tokenize term from lexeme */
  for (const { type, str: term, start, end } of lexer.lexemes)
    if (type === "TERM")
      split(term, lunr.tokenizer.separator, (...range) => {
        terms.push([
          part.slice(0, start),
          term.slice(...range),
          part.slice(end)
        ].join(""))
      })

  /* Return terms */
  return terms
})
```
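
Note that `transform` and `split` are internal helpers of the theme; the sketch below shows hypothetical implementations with the shapes the snippet above assumes, not the theme's actual code:

```ts
/* Hypothetical: apply fn to every part of the query and rejoin --
   shape assumed from the snippet above */
function transform(query: string, fn: (part: string) => string[]): string {
  return query.split(/\s+/).flatMap(fn).join(" ")
}

/* Hypothetical: invoke fn with the [start, end) range of every
   separator-delimited chunk of value */
function split(
  value: string, separator: RegExp,
  fn: (...range: [number, number]) => void
): void {
  const at = new RegExp(separator.source, "g")
  let last = 0
  let match: RegExpExecArray | null
  while ((match = at.exec(value)) !== null) {
    if (match.index > last)
      fn(last, match.index)
    if (match[0].length === 0)
      at.lastIndex++ /* guard against zero-length matches */
    last = at.lastIndex
  }
  if (last < value.length)
    fn(last, value.length)
}
```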

Fixed in a6436bd. When the query lexer returns an unknown field, we transform the query and replace the `:` with whitespace, so the search pipeline doesn't treat the preceding word as a field name. Note that the query lexing approach is very new and needs to be better incorporated into the search worker. I'll tackle this the next time I work on search, which will be very soon, given how many open change requests concerning search there currently are.
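
Conceptually, the transformation works along these lines. A simplified sketch of the idea, not the actual code from the commit; the field names are Material's defaults:

```ts
import lunr from "lunr"

/* Fields that are actually indexed (Material's defaults) */
const fields = new Set(["title", "text", "tags"])

/* Replace the ":" after unknown field names with whitespace, so the
   search pipeline no longer treats the preceding word as a field */
function defuseUnknownFields(query: string): string {
  const lexer = new lunr.QueryLexer(query)
  lexer.run()

  let result = query
  for (const { type, str, end } of lexer.lexemes)
    if (type === "FIELD" && !fields.has(str))
      result = `${result.slice(0, end)} ${result.slice(end + 1)}`
  return result
}

console.log(defuseUnknownFields("colon:separated")) // "colon separated"
console.log(defuseUnknownFields("title:foo"))       // "title:foo" (kept)
```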

For the time being, this fix should make the search in your docs functional again 😊

I closed this too early. We need to keep this open until the fix is released.

Thanks for working on this. ❤️

Released as part of 9.0.7

I can confirm that 9.0.7 fixed this problem for us in production as well.

Thank you very much for working on this, and getting it fixed so quickly! ❤️