jprante/elasticsearch-plugin-bundle

How to integrate the plugin?

Closed this issue · 4 comments

I'm trying to set up a new search server that can index German documents.
While looking for a solution I found your plugin bundle, which seems to cover my needs.
Unfortunately, I'm not able to integrate the plugin properly.
Here is an example of what I have tried:
PUT http://huclmaid01:9200/movies
{
   "settings": {
      "index": {
         "analysis": {
            "filter": {
               "umlaut": {
                  "type": "german_normalize"
               }
            },
            "tokenizer": {
               "umlaut": {
                  "type": "standard",
                  "filter": "umlaut"
               }
            }
         }
      }
   }
}

The tokens still contain the umlauts:
POST http://huclmaid01:9200/movies/_analyze?tokenizer=umlaut
{
"text": "Die Jahresfeier der Rechtsanwaltskanzleien auf dem Rhein in der Nähe von Köln hat viel Ökosteuer gekostet"
}

What am I doing wrong?

Do you have documents in index movies, and a mapping for index movies, with a field configured with tokenizer umlaut?

I used the first example only to inspect the generated tokens. I have also tried mapping the analyzer to a field, but that did not give the expected result either.

This is my test mapping (GET http://localhost:9200/movies/movie/_mapping):
{
   "movies": {
      "mappings": {
         "movie": {
            "properties": {
               "message": {
                  "type": "string",
                  "analyzer": "deutsch"
               }
            }
         }
      }
   }
}

And these are my settings (GET http://localhost:9200/movies/_settings):
{
   "movies": {
      "settings": {
         "index": {
            "creation_date": "1433778178966",
            "uuid": "RHlpyXunSOucBIg2vuLaJg",
            "analysis": {
               "analyzer": {
                  "deutsch": {
                     "tokenizer": "umlaut"
                  }
               },
               "filter": {
                  "umlaut": {
                     "type": "german_normalize"
                  }
               },
               "tokenizer": {
                  "umlaut": {
                     "type": "standard",
                     "filter": "umlaut"
                  }
               }
            },
            "number_of_replicas": "1",
            "number_of_shards": "5",
            "version": {
               "created": "1050299"
            }
         }
      }
   }
}

The plugin seems to be installed properly (from GET http://localhost:9200/_nodes):
"plugins": [
   {
      "name": "plugin-bundle-1.5.2.0-e6ec36a",
      "version": "1.5.2.0",
      "description": "A collection of useful plugins",
      "jvm": true,
      "site": false
   }
]

And my only document (GET http://localhost:9200/movies/movie/1):
{
   "_index": "movies",
   "_type": "movie",
   "_id": "1",
   "_version": 1,
   "found": true,
   "_source": {
      "message": "Ein schöner Tag in Köln im Café an der Straßenecke"
   }
}

I would expect the document to be found when I query for "koln", but it is only found with the search term "köln".
Have I missed something?

Try this

DELETE /movies

PUT /movies
{
      "settings": {
         "index": {
            "analysis": {
               "analyzer": {
                  "deutsch": {
                      "type" : "custom",
                      "tokenizer" : "standard",
                      "filter": [ 
                          "lowercase",
                          "german_normalize" 
                          ]
                  }
               }
            },
            "number_of_replicas": "0",
            "number_of_shards": "1"
         }
      }
}

GET /movies/_settings

POST /movies/movies/_mapping
{
   "properties": {
      "message": {
         "type": "string",
         "analyzer": "deutsch"
      }
   }
}

GET /movies/_mapping

PUT /movies/movies/1
{
    "message" : "Ein schöner Tag in Köln im Café an der Straßenecke"
}

POST /movies/movies/_search
{
    "query": {
        "match": {
           "message": "koln"
        }
    }
}


POST /movies/_analyze?analyzer=deutsch
{
    "text" : "Die Jahresfeier der Rechtsanwaltskanzleien auf dem Rhein in der Nähe von Köln hat viel Ökosteuer gekostet"
}
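
For anyone following along without a running cluster, here is a rough Python sketch of what the "deutsch" analyzer above does to the text, and why a query for "koln" can then match. It is only an approximation under stated assumptions: a `\w+` regex stands in for the standard tokenizer, and the fold table covers just ä, ö, ü and ß, while the real german_normalize filter also handles spellings such as ae/oe/ue. The name `analyze_deutsch` is made up for this illustration and is not part of the plugin or of Elasticsearch.

```python
import re
import unicodedata

# Simplified stand-in for german_normalize (assumption: only umlaut and
# sharp-s folding is modelled; the real filter does more than this).
_GERMAN_FOLD = str.maketrans({
    "\u00e4": "a",   # ä
    "\u00f6": "o",   # ö
    "\u00fc": "u",   # ü
    "\u00df": "ss",  # ß
})


def analyze_deutsch(text):
    """Approximate the analyzer chain: tokenize, lowercase, fold umlauts."""
    text = unicodedata.normalize("NFC", text)
    tokens = re.findall(r"\w+", text)  # rough stand-in for the standard tokenizer
    return [t.lower().translate(_GERMAN_FOLD) for t in tokens]


doc_tokens = analyze_deutsch("Ein schöner Tag in Köln im Café an der Straßenecke")
query_tokens = analyze_deutsch("koln")

# A match query hits when a query token equals one of the indexed tokens.
print(doc_tokens)
print(any(q in doc_tokens for q in query_tokens))
```

Because the same analyzer runs at index time and at query time, both "koln" and "köln" end up as the same token, which is what makes the match query above succeed.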

As soon as you stop screwing it up, it actually works ;-)
I somehow misinterpreted the examples.
Thank you very much