lnx-search/lnx

Why is every doc field wrapped in an array in search results?


For example:

{
  "status": 200,
  "data": {
    "hits": [
      {
        "doc": {
          "author": [
            "248b2e6a-7c36-4da3-bcc4-55a979eb57dc"
          ],
          "id": [
            18
          ],
          "title": [
            "title 01"
          ],
          "uuid": [
            "06dbf5c7-d313-413d-8f65-49aed93e4031"
          ]
        },
        "document_id": "1628525110829290421",
        "score": 1.542423
      },
      {
        "doc": {
          "author": [
            "248b2e6a-7c36-4da3-bcc4-55a979eb57dc"
          ],
          "id": [
            19
          ],
          "title": [
            "title 02"
          ],
          "uuid": [
            "8da05387-8727-4a27-baa7-265af7558c0c"
          ]
        },
        "document_id": "1493516234521670736",
        "score": 1.542423
      },
      {
        "doc": {
          "author": [
            "248b2e6a-7c36-4da3-bcc4-55a979eb57dc"
          ],
          "id": [
            20
          ],
          "title": [
            "title 03"
          ],
          "uuid": [
            "3bf64ee1-f2ac-46ce-8e45-0d25956b195c"
          ]
        },
        "document_id": "9603160257558085701",
        "score": 1.542423
      }
    ],
    "count": 3,
    "time_taken": 0.000578893
  }
}

I think it would make much more sense to show the doc as it has been posted.

From an implementation side, everything in tantivy is multi-value. We do very little post-processing of the returned documents, so we just end up returning each field's values as an array.

In theory we could return a field as a single value whenever tantivy gives us only one value for it, but this could lead to unexpected behaviour should some documents have single values and others have multiple values.
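To make that concern concrete, here is a minimal Python sketch. It is not lnx code: `unwrap_by_length` is a hypothetical helper and the `tags` field is made up for illustration. It unwraps a field only when its array holds exactly one value, so the same field comes back as a string for one document and as a list for another:

```python
def unwrap_by_length(doc):
    """Hypothetical heuristic: unwrap a field only when it holds exactly one value."""
    return {
        field: values[0] if len(values) == 1 else values
        for field, values in doc.items()
    }


# "tags" is an invented field: one value in the first doc, two in the second.
one_tag = unwrap_by_length({"title": ["title 01"], "tags": ["a"]})
two_tags = unwrap_by_length({"title": ["title 02"], "tags": ["a", "b"]})

print(type(one_tag["tags"]).__name__)   # a plain string
print(type(two_tags["tags"]).__name__)  # a list
```

A client reading `tags` would have to handle both shapes, which is the unexpected behaviour described above.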

I agree that this should probably be changed so that we return values matching the schema, i.e. not wrapped in an array. While we're in the process of that, we could also explicitly add array types to the schema.
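As a rough illustration of what schema-driven output could look like, here is a small Python sketch. It is an assumption-laden example, not how lnx implements it: `flatten_doc` and the `multi_value_fields` set are hypothetical stand-ins for whatever the schema ends up declaring as array types.

```python
def flatten_doc(doc, multi_value_fields):
    """Return declared array fields as lists and every other field as a scalar."""
    flattened = {}
    for field, values in doc.items():
        if field in multi_value_fields:
            # Declared as an array in the (hypothetical) schema: always a list.
            flattened[field] = list(values)
        else:
            # Assumption: a single-value field stores at most one value.
            flattened[field] = values[0] if values else None
    return flattened


# First hit from the example response above; no field is declared as an array.
hit_doc = {
    "author": ["248b2e6a-7c36-4da3-bcc4-55a979eb57dc"],
    "id": [18],
    "title": ["title 01"],
    "uuid": ["06dbf5c7-d313-413d-8f65-49aed93e4031"],
}
flat = flatten_doc(hit_doc, multi_value_fields=set())
print(flat["id"], flat["title"])
```

Because the shape depends only on the schema and never on how many values a particular document happens to hold, every hit has the same structure.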

This is a relatively easy thing for me to implement, and it lines up nicely with some other schema changes, so I should be able to do this in the next day or so and get it merged into master.

This is now available as a stabilised version 0.9.0-beta