YannBrrd/elasticsearch-entity-resolution

Missing fields issue

Closed this issue · 5 comments

Hello, Yann. I have run into a problem. Could you please help me?

Here (https://github.com/YannBrrd/elasticsearch-entity-resolution/wiki/configure) you wrote that a missing field is ignored (i.e. it gets a 0.5 score, which does not affect the final result). But I am seeing different behavior.

My mapping:

PUT /users/user/_mapping
{
    "user": {
        "_timestamp": {
            "enabled": false
        },
        "properties": {
            "gender": {
                "type": "string",
                "index": "not_analyzed"
            },
            "name": {
                "type": "string",
                "index": "not_analyzed"
            }
        }
    }
}

Adding some test users:

POST /users/user/1
{
    "name": "dale",
    "gender": "m"
}

POST /users/user/2
{
    "name": "david"
}

POST /users/user/3
{
    "name": "dale"
}

My request:

GET /users/user/_search
{
    "size": 3,
    "query": {
        "function_score": {
            "query": {
                "match_all": {}
            },
            "boost_mode" : "replace",
            "max_boost": 1.0,
            "script_score": {
                "script": "entity-resolution",
                "lang": "native",
                "params": {
                    "entity": {
                        "fields": [
                            {
                                "field": "name",
                                "value": "dale",
                                "cleaners": [
                                    {
                                        "name": "no.priv.garshol.duke.cleaners.LowerCaseNormalizeCleaner"
                                    }
                                ],
                                "high": 0.9,
                                "low": 0.1,
                                "comparator": {
                                    "name": "no.priv.garshol.duke.comparators.ExactComparator"
                                }
                            },
                            {
                                "field": "gender",
                                "value": "m",
                                "cleaners": [
                                    {
                                        "name": "no.priv.garshol.duke.cleaners.TrimCleaner"
                                    },
                                    {
                                        "name": "no.priv.garshol.duke.cleaners.LowerCaseNormalizeCleaner"
                                    }
                                ],
                                "high": 0.9,
                                "low": 0.0,
                                "comparator": {
                                    "name": "no.priv.garshol.duke.comparators.ExactComparator"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}
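For anyone reproducing this from a script, the request body above can be assembled programmatically. This is only a sketch: the helper names `field_spec` and `build_query` are made up for illustration and are not part of the plugin; the keys and class names are copied from the request above.

```python
import json

EXACT = "no.priv.garshol.duke.comparators.ExactComparator"
LOWER = "no.priv.garshol.duke.cleaners.LowerCaseNormalizeCleaner"
TRIM = "no.priv.garshol.duke.cleaners.TrimCleaner"


def field_spec(field, value, cleaners, high, low, comparator=EXACT):
    """Build one entry of params.entity.fields (hypothetical helper)."""
    return {
        "field": field,
        "value": value,
        "cleaners": [{"name": name} for name in cleaners],
        "high": high,
        "low": low,
        "comparator": {"name": comparator},
    }


def build_query(fields, size=3):
    """Wrap the field specs in the function_score request shown above."""
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "boost_mode": "replace",
                "max_boost": 1.0,
                "script_score": {
                    "script": "entity-resolution",
                    "lang": "native",
                    "params": {"entity": {"fields": fields}},
                },
            }
        },
    }


body = build_query([
    field_spec("name", "dale", [LOWER], high=0.9, low=0.1),
    field_spec("gender", "m", [TRIM, LOWER], high=0.9, low=0.0),
])
print(json.dumps(body, indent=4))
```

The printed JSON can be pasted as the body of `GET /users/user/_search`.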

Response:

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0.9878049,
      "hits": [
         {
            "_index": "users",
            "_type": "user",
            "_id": "1",
            "_score": 0.9878049,
            "_source": {
               "name": "dale",
               "gender": "m"
            }
         },
         {
            "_index": "users",
            "_type": "user",
            "_id": "2",
            "_score": 0,
            "_source": {
               "name": "david"
            }
         },
         {
            "_index": "users",
            "_type": "user",
            "_id": "3",
            "_score": 0,
            "_source": {
               "name": "dale"
            }
         }
      ]
   }
}

As you can see, the third user (dale) has a score of 0, but I expected it to have a score of 0.9. I need such users to rank above users with the wrong name.
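To make the expectation concrete, here is a minimal sketch of the arithmetic, assuming the plugin combines per-field probabilities with the naive Bayes formula that Duke uses, starting from a prior of 0.5. The function names `combine` and `score` are illustrative and are not taken from the plugin's code.

```python
def combine(p1: float, p2: float) -> float:
    """Naive Bayes combination of two match probabilities."""
    both = p1 * p2
    return both / (both + (1.0 - p1) * (1.0 - p2))


def score(field_probs, prior=0.5):
    """Fold the per-field probabilities into one score (assumed behavior)."""
    total = prior
    for p in field_probs:
        total = combine(total, p)
    return total


MISSING = 0.5  # neutral value the wiki describes for a missing field

print(score([0.9, 0.9]))      # user 1: name and gender both match
print(score([0.9, MISSING]))  # user 3: name matches, gender missing
print(score([0.1, MISSING]))  # user 2: name differs, gender missing
print(score([0.9, 0.0]))      # name matches, gender scored at its low of 0.0
```

Under these assumptions user 1 comes out at about 0.9878, which matches the response above, and user 3 should come out at about 0.9, because combining with 0.5 leaves a probability unchanged. The last line gives exactly 0, so the observed score would be consistent with the missing gender being treated as a mismatch at its `low` of 0.0, though that is only a guess.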

Cool, was wondering when I'd find time for that.

Can you make a PR?

Cheers,
Yann

On Fri, 23 Jan 2015 at 12:52, Pavel Sviridov notifications@github.com
wrote:

I found the problem. My fix is here. Gfif@1cc9bc1
svipy9@1cc9bc1



I found the problem. My fix is here. svipy9@8be620a

Yes, sure.

I will add your test to the build as a regression test.

On Fri, 23 Jan 2015 at 12:56, Pavel Sviridov notifications@github.com
wrote:

Yes, sure.



Fixed in the latest versions.