How to integrate the plugin?
I'm trying to set up a new search server that can index German documents.
While looking for a solution I came across your plugin bundle, which seems to cover my needs.
Unfortunately I'm not able to integrate the plugin properly.
Here is an example of what I tried:
PUT http://huclmaid01:9200/movies
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "umlaut": {
            "type": "german_normalize"
          }
        },
        "tokenizer": {
          "umlaut": {
            "type": "standard",
            "filter": "umlaut"
          }
        }
      }
    }
  }
}
The tokens still contain the umlauts:
POST http://huclmaid01:9200/movies/_analyze?tokenizer=umlaut
{
  "text": "Die Jahresfeier der Rechtsanwaltskanzleien auf dem Rhein in der Nähe von Köln hat viel Ökosteuer gekostet"
}
What am I doing wrong?
Do you have documents in index movies, and a mapping for index movies, with a field configured with tokenizer umlaut?
I used the first example only to inspect the generated tokens. I have also tried mapping the analyzer to a field, but without the expected result.
This is my test mapping (GET http://localhost:9200/movies/movie/_mapping):
{
  "movies": {
    "mappings": {
      "movie": {
        "properties": {
          "message": {
            "type": "string",
            "analyzer": "deutsch"
          }
        }
      }
    }
  }
}
And these are my settings (GET http://localhost:9200/movies/_settings):
{
  "movies": {
    "settings": {
      "index": {
        "creation_date": "1433778178966",
        "uuid": "RHlpyXunSOucBIg2vuLaJg",
        "analysis": {
          "analyzer": {
            "deutsch": {
              "tokenizer": "umlaut"
            }
          },
          "filter": {
            "umlaut": {
              "type": "german_normalize"
            }
          },
          "tokenizer": {
            "umlaut": {
              "type": "standard",
              "filter": "umlaut"
            }
          }
        },
        "number_of_replicas": "1",
        "number_of_shards": "5",
        "version": {
          "created": "1050299"
        }
      }
    }
  }
}
The plugin seems to be installed properly (from GET http://localhost:9200/_nodes):
"plugins": [
  {
    "name": "plugin-bundle-1.5.2.0-e6ec36a",
    "version": "1.5.2.0",
    "description": "A collection of useful plugins",
    "jvm": true,
    "site": false
  }
]
And my only document (GET http://localhost:9200/movies/movie/1):
{
  "_index": "movies",
  "_type": "movie",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "message": "Ein schöner Tag in Köln im Café an der Straßenecke"
  }
}
I would expect the document to be found when I query for "koln", but it is only found with the search term "köln".
Maybe I have missed something?
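For reference, the folding expected here can be sketched in a few lines of Python. This is only a rough approximation of what a German normalization filter does to single characters, not the plugin's actual implementation; the names `GERMAN_FOLD`, `fold_german` and `tokens_match` are assumptions made for illustration. The sketch lowercases first because the mapping table only lists lowercase characters.

```python
# Simplified, illustrative folding of German characters.
# Assumption: only single-character replacements are modelled here;
# a real normalization filter may apply further rules.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def fold_german(token: str) -> str:
    """Lowercase a token, then fold umlauts and sharp s."""
    return token.lower().translate(GERMAN_FOLD)


def tokens_match(indexed: str, query: str) -> bool:
    """Two terms match when their folded forms are identical."""
    return fold_german(indexed) == fold_german(query)


if __name__ == "__main__":
    for word in ["Köln", "schöner", "Straßenecke"]:
        print(word, "->", fold_german(word))
    print(tokens_match("Köln", "koln"))
```

With folding applied at both index and query time, "Köln" and "koln" reduce to the same term, which is the behaviour the search above is expected to show. Accented characters outside this table, such as the "é" in "Café", are left untouched by this sketch.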
Try this:
DELETE /movies
PUT /movies
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "deutsch": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "german_normalize"
            ]
          }
        }
      },
      "number_of_replicas": "0",
      "number_of_shards": "1"
    }
  }
}
GET /movies/_settings
POST /movies/movies/_mapping
{
  "properties": {
    "message": {
      "type": "string",
      "analyzer": "deutsch"
    }
  }
}
GET /movies/_mapping
PUT /movies/movies/1
{
  "message": "Ein schöner Tag in Köln im Café an der Straßenecke"
}
POST /movies/movies/_search
{
  "query": {
    "match": {
      "message": "koln"
    }
  }
}
POST /movies/_analyze?analyzer=deutsch
{
  "text": "Die Jahresfeier der Rechtsanwaltskanzleien auf dem Rhein in der Nähe von Köln hat viel Ökosteuer gekostet"
}
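The main difference from the original settings is where the filter is declared: token filters are listed in the analyzer's "filter" array, whereas a tokenizer definition only splits text and, as far as I know, has no "filter" option of its own. That would also explain why calling _analyze with only tokenizer=umlaut returned tokens with umlauts intact. The following Python sketch checks a settings body for that mistake; `find_misplaced_filters` is a hypothetical helper written for this thread, not part of Elasticsearch or the plugin.

```python
from typing import Any, Dict, List


def find_misplaced_filters(body: Dict[str, Any]) -> List[str]:
    """Return the names of tokenizers that carry a 'filter' key.

    Hypothetical helper: it only inspects the dict, it does not
    talk to Elasticsearch.
    """
    index = body.get("settings", {}).get("index", {})
    tokenizers = index.get("analysis", {}).get("tokenizer", {})
    return [name for name, cfg in tokenizers.items() if "filter" in cfg]


# Settings from the question: the filter is attached to the tokenizer.
original = {
    "settings": {"index": {"analysis": {
        "filter": {"umlaut": {"type": "german_normalize"}},
        "tokenizer": {"umlaut": {"type": "standard", "filter": "umlaut"}},
    }}}
}

# Settings from the answer: the filters are attached to the analyzer.
suggested = {
    "settings": {"index": {"analysis": {
        "analyzer": {"deutsch": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "german_normalize"],
        }},
    }}}
}

print(find_misplaced_filters(original))   # ['umlaut']
print(find_misplaced_filters(suggested))  # []
```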
As soon as you stop screwing it up, it actually works ;-)
I somehow misinterpreted the examples.
Thank you very much.