toluaina/pgsync

Allow for top-level Elasticsearch `mapping` to be defined

jvanderen1 opened this issue · 5 comments

It would be nice to define an explicit Elasticsearch mapping (minus any `_meta` fields) at the top of the schema. Something akin to what is in the Elasticsearch documentation:

PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "age":    { "type": "integer" },  
      "email":  { "type": "keyword"  }, 
      "name":   { "type": "text"  }     
    }
  }
}

I believe you can do this by setting any mapping you want inside transform.mapping: {} in the schema.json file. We use this to create abstract mappings that apply to some downstream transforms - working well so far.
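For readers unfamiliar with that option, here is a rough sketch of what a node with `transform.mapping` might look like, built in Python and dumped as JSON. The database, index, table, and column names are assumptions chosen to mirror the Elasticsearch example above; they are not taken from anyone's real schema.

```python
import json

# Hypothetical node: table and column names are assumptions for illustration.
node = {
    "table": "user",
    "columns": ["age", "email", "name"],
    "transform": {
        # Per-column Elasticsearch field definitions, keyed by column name.
        "mapping": {
            "age": {"type": "integer"},
            "email": {"type": "keyword"},
            "name": {"type": "text"},
        }
    },
}

# Hypothetical database and index names wrapping the node.
schema = [{"database": "mydb", "index": "my-index-000001", "nodes": node}]

# Text that could be saved as schema.json.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Note that every key under `mapping` here refers to a column, which is exactly the limitation raised in the next comment.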

@nowfred The problem is that we use custom plugins to shape the schema into a custom format before indexing. Our mappings don't necessarily align with PGSync's schema output.

We need a way to define custom Elasticsearch mappings without referencing columns.

I have added support for this in the master branch. Please give it a try. Here is a schema example

@toluaina That's exciting! I will give it a shot 👍🏻

@toluaina Unfortunately, this does not seem to work for me. I added the following field:

[
  {
    ...
    "mappings": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
]

I receive the following error:

elasticsearch.BadRequestError: BadRequestError(400, 'mapper_parsing_exception', 'Root mapping definition has unsupported parameters:  [index : {properties={collection={type=text, fields={raw={ignore_above=256, type=keyword}}}}}]')
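One reading of this error is that the body Elasticsearch received had an extra `index` key wrapped around `properties`, since `index` is not a parameter Elasticsearch accepts at the root of a mapping. The sketch below only illustrates that shape difference. The helper name and the set of accepted keys are assumptions (an illustrative, non-exhaustive subset), and it says nothing about where the wrapping happens.

```python
# Illustrative, non-exhaustive subset of keys allowed at the root of a mapping.
KNOWN_ROOT_KEYS = {
    "properties",
    "dynamic",
    "dynamic_templates",
    "runtime",
    "_source",
    "_routing",
    "_meta",
    "date_detection",
    "numeric_detection",
}


def unexpected_root_keys(mappings: dict) -> list:
    """Return the top-level keys that are not in the known subset (hypothetical helper)."""
    return sorted(key for key in mappings if key not in KNOWN_ROOT_KEYS)


# The mapping as written in the schema above.
intended = {
    "properties": {
        "name": {
            "type": "text",
            "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
        }
    }
}

# The shape the error text suggests was sent: the same body nested under "index".
wrapped = {"index": intended}

print(unexpected_root_keys(intended))
print(unexpected_root_keys(wrapped))
```

If the second call reports `index`, the mapping body is nested one level too deep by the time it reaches Elasticsearch.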