removeLowRankingParses is eliminating the results I want
Closed this issue · 14 comments
Hello Two Fishes team,
I am currently trying to use Two Fishes primarily as a "splitting" geocoder. However, in some of my initial tests, I am noticing an issue that I am hoping to be able to resolve. There are certain queries which I would have expected to work correctly, which are not returning any interpretations.
As an example, I first tried searching for "astoria"
http://demo.twofishes.net/?query=astoria&debug=1
The result is what I expected, a result for Astoria, NY
Next, i tried searching for "mexican astoria"
http://demo.twofishes.net/?query=mexican%20astoria&debug=1
This returns zero results, and when I look at the debug output, I see that my result has been eliminated in the "removeLowRankingParses" stage. I also tried biasing the geocoder with a lat/lon, country, woeHints, just about every param there is, and nothing seemed to fix it.
Finally, I tried "mexican astoria ny"
http://demo.twofishes.net/?query=mexican%20astoria%20ny&debug=1
This returns what I expect as well.
However, I am concerned that the "mexican astoria" case will be a real world scenario for my users. I decided to take a look at the code around the "removeLowRankingParses" step, and it does not appear clear to me what exactly is happening in this stage. As a test, I removed this stage of the pruning and rebuilt my local server, and as expected, I got the result I was looking. I also tried several other examples that had failed me at one point or another, and they all seemed to be working as expected. However, I don't like the idea of eliminating code without truly understanding what it does. So please, if someone from the team could explain to me what exactly is happening in this step, why it is necessary, and why a query I would have expected to work is failing, I would greatly appreciate it. Thank you.
-Adam
[Cc Jon Parise from Pinterest for additional perspective on query splitting]
Hi Adam,
Our query splitting could use a lot of work. What-where parsing isn't
really what the geocoder has been optimised for, and neither has any of our
testing focused on evaluating how well it does this. You're right that
removeLowRankingParses is where the results are being filtered.
Specifically, this code here will throw away what-where parses unless the
'where' is a place with a population > 50000 (Astoria doesn't satisfy this
condition) or matches more than 1 token in the query (which "Astoria NY"
does but "Astoria" doesn't):
ResponseProcessor.scala:553
goodParses = goodParses.filter(p => {
val parseLength = p.tokenLength
parseLength == parseParams.tokens.size || parseLength != 1 ||
p.headOption.exists(m => {
m.fmatch.scoringFeatures.population > 50000 || p.length > 1
})
})
logger.ifDebug("have %d parses after removeLowRankingParses",
goodParses.size)
While I don't have the exact historic context around the picking of these
values (and David Blackman is currently on vacation), my first guess is
that this has to do with suppressing the canonical bad parse: "hair salon"
-> "hair" near "Salon, Finland." But I notice now we anyway do not succeed
in doing so :)
I plan to look into improving query splitting some time in the near future.
Until then, feel free to take a stab at any changes you think would be
meaningful and send us a pull request.
Jon: Do you have anything to add from your experience building one-box
search for Pinterest?
Rahul.
On Mon, Oct 13, 2014 at 9:29 AM, metsfan notifications@github.com wrote:
Hello Two Fishes team,
I am currently trying to use Two Fishes primarily as a "splitting"
geocoder. However, in some of my initial tests, I am noticing an issue that
I am hoping to be able to resolve. There are certain queries which I would
have expected to work correctly, which are not returning any
interpretations.As an example, I first tried searching for "astoria"
http://demo.twofishes.net/?query=astoria&debug=1The result is what I expected, a result for Astoria, NY
Next, i tried searching for "mexican astoria"
http://demo.twofishes.net/?query=mexican%20astoria&debug=1This returns zero results, and when I look at the debug output, I see that
my result has been eliminated in the "removeLowRankingParses" stage. I also
tried biasing the geocoder with a lat/lon, country, woeHints, just about
every param there is, and nothing seemed to fix it.Finally, I tried "mexican astoria ny"
http://demo.twofishes.net/?query=mexican%20astoria%20ny&debug=1This returns what I expect as well.
However, I am concerned that the "mexican astoria" case will be a real
world scenario for my users. I decided to take a look at the code around
the "removeLowRankingParses" step, and it does not appear clear to me what
exactly is happening in this stage. As a test, I removed this stage of the
pruning and rebuilt my local server, and as expected, I got the result I
was looking. I also tried several other examples that had failed me at one
point or another, and they all seemed to be working as expected. However, I
don't like the idea of eliminating code without truly understanding what it
does. So please, if someone from the team could explain to me what exactly
is happening in this step, why it is necessary, and why a query I would
have expected to work is failing, I would greatly appreciate it. Thank you.-Adam
—
Reply to this email directly or view it on GitHub
#49.
We're still using an older twofishes build in production that doesn't include this "low ranking" filtering pass, so resolving this issue is relevant to us, too.
One of the things that we do to improve the user experience is return a blended list of "global" and "local" (geocoded) results. It's still a fairly naive approach, but it does a good job of improving the chance that we're returning an intended result to the user. There are more details in the engineering blog post I wrote:
http://engineering.pinterest.com/post/88679054329/introducing-a-faster-place-search
@metsfan, I'm happy to work with you and @rahulpratapm to chart the best path forward.
Thanks for the answers, guys. I am going to mull it over and get back to you. I'm not overly concerned about the "hair salon" case, as I have a separate index of tags and categories which I am hitting prior to hitting two fishes. If I get a hit in the tag/category index, then I simply end my parsing there, so in the example you provided, "hair salon" would be a hit on the tag index, and we exit right there. If I decide to make any changes to the code, I will definitely let you know. Thanks to both of you.
Interesting. To this end, it seems the autocomplete bias code I wrote a few
months back to balance local, in-country and global results (I'm
oversimplifying) seems like it might be reusable. Let me think a bit more
about this. Thanks.
Rahul.
On Oct 13, 2014 2:22 PM, "Jon Parise" notifications@github.com wrote:
We're still using an older twofishes build in production that doesn't
include this "low ranking" filtering pass, so resolving this issue is
relevant to us, too.One of the things that we do to improve the user experience is return a
blended list of "global" and "local" (geocoded) results. It's still a
fairly naive approach, but it does a good job of improving the chance that
we're returning an intended result to the user. There are more details in
the engineering blog post I wrote:http://engineering.pinterest.com/post/88679054329/introducing-a-faster-place-search
@metsfan https://github.com/metsfan, I'm happy to work with you and
@rahulpratapm https://github.com/rahulpratapm to chart the best path
forward.—
Reply to this email directly or view it on GitHub
#49 (comment).
Hello Rahul,
I have a proposed solution, I would like to know what you think. As I see it, rather than removing this code entirely, it would be ideal if it could be controlled by parameters. I would suggest adding the following parameters:
remove_low_ranking_parses Boolean (default true) - If set to true, removes low ranking parses, otherwise by passes this step
min_population Integer (default 50000) - The minimum population to be considered a "low ranking parse". Defaults to 50000, which is currently hardcoded.
If you are agreeable to this, I will implement this change and create a pull request.
Looking forward to hearing your thoughts. Thank you.
Adam Eskreis
Director of Engineering, Citymaps
Sounds good. Thanks.
Rahul.
On Mon, Oct 20, 2014 at 3:47 PM, metsfan notifications@github.com wrote:
Hello Rahul,
I have a proposed solution, I would like to know what you think. As I see
it, rather than removing this code entirely, it would be ideal if it could
be controlled by parameters. I would suggest adding the following
parameters:remove_low_ranking_parses: Boolean (default true) - If set to true,
removes low ranking parses, otherwise by passes this step
min_population: Integer (default 50000) - The minimum population to be
considered a "low ranking parse". Defaults to 50000, which is currently
hardcoded.If you are agreeable to this, I will implement this change and create a
pull request.Looking forward to hearing your thoughts. Thank you.
Adam Eskreis
Director of Engineering, Citymaps—
Reply to this email directly or view it on GitHub
#49 (comment).
Hey Rahul,
Sorry to put this here but I don't know how else to contact you. If you want we can correspond further by email (you can reach me by email at adam@citymaps.com). I added new members to the GeocodeRequest and CommonGeocodeRequestParams structures in geocode.thrift file, but the default parameters are being ignored. I tested changing the default parameters of the members that were already there with the same result. The only type which seem to work for default values are strings, and I'd rather not make these values strings. I was forced to implement the default parameters in GeocodeServer.parseGeocodeRequest, but I would like to fix this before submitting a pull request. I can post more information if it would be helpful. Thanks.
P.S. I committed my code to my repository, you can view the code here before I make a pull request
https://github.com/metsfan/twofishes/compare/low_ranking_parses?expand=1
Hi Adam,
Which branch are you on? I could look at your local copy of the code change
if you'd like.
Rahul.
On Mon, Oct 20, 2014 at 6:56 PM, metsfan notifications@github.com wrote:
Hey Rahul,
Sorry to put this here but I don't know how else to contact you. If you
want we can correspond further by email (you can reach me by email at
adam@citymaps.com). I added new parameters to the GeocodeRequest and
CommonGeocodeRequestParams structures in geocode.thrift file, but the
default parameters are being ignored. I tested changing the default
parameters of the parameters that were already there with the same result.
The only type which seem to work for default values are strings, and I'd
rather not make these values strings. I was forced to implement the default
parameters in GeocodeServer.parseGeocodeRequest, but I would like to fix
this before submitting a pull request. I can post more information if it
would be helpful. Thanks.—
Reply to this email directly or view it on GitHub
#49 (comment).
In my previous post, I posted a link to my changes. I made a branch off master on my forked copy called "low_ranking_parses", and pushed it into there.
Hi Adam,
I don't actually see a branch by that name. Are you able to see your
changes on github (have you pushed that branch)? If so, could you give me a
link to github directly (I don't see the link you refer to in any of your
previous emails)?
Rahul.
On Tue, Oct 21, 2014 at 5:25 PM, metsfan notifications@github.com wrote:
In my previous post, I posted a link to my changes. I made a branch off
master on my local copy called "low_ranking_parses", and pushed it into
there.—
Reply to this email directly or view it on GitHub
#49 (comment).
Try this link
The branch
https://github.com/metsfan/twofishes/tree/low_ranking_parses
My commit
metsfan@6c15281
On Tue, Oct 21, 2014 at 5:40 PM, Rahul Pratap Maddimsetty <
notifications@github.com> wrote:
Hi Adam,
I don't actually see a branch by that name. Are you able to see your
changes on github (have you pushed that branch)? If so, could you give me
a
link to github directly (I don't see the link you refer to in any of your
previous emails)?Rahul.
On Tue, Oct 21, 2014 at 5:25 PM, metsfan notifications@github.com
wrote:In my previous post, I posted a link to my changes. I made a branch off
master on my local copy called "low_ranking_parses", and pushed it into
there.—
Reply to this email directly or view it on GitHub
#49 (comment).—
Reply to this email directly or view it on GitHub
#49 (comment).
Thanks Adam. I'll take a look tomorrow.
Rahul.
On Tue, Oct 21, 2014 at 5:44 PM, metsfan notifications@github.com wrote:
Try this link
The branch
https://github.com/metsfan/twofishes/tree/low_ranking_parsesMy commit
On Tue, Oct 21, 2014 at 5:40 PM, Rahul Pratap Maddimsetty <
notifications@github.com> wrote:Hi Adam,
I don't actually see a branch by that name. Are you able to see your
changes on github (have you pushed that branch)? If so, could you give
me
a
link to github directly (I don't see the link you refer to in any of
your
previous emails)?Rahul.
On Tue, Oct 21, 2014 at 5:25 PM, metsfan notifications@github.com
wrote:In my previous post, I posted a link to my changes. I made a branch
off
master on my local copy called "low_ranking_parses", and pushed it
into
there.—
Reply to this email directly or view it on GitHub
<
https://github.com/foursquare/twofishes/issues/49#issuecomment-60001435>.—
Reply to this email directly or view it on GitHub
#49 (comment).—
Reply to this email directly or view it on GitHub
#49 (comment).
Hi Adam,
I just looked at the change. While I can't say exactly why it isn't working
yet (I'm not sure how you're testing it), I don't think putting this on the
request is the right way to go. When you first described the change, I
thought you were going to add a couple of GeocodeServerConfig params with
default values. Could you do that instead?
Rahul.
On Tue, Oct 21, 2014 at 6:26 PM, Rahul Maddimsetty rahul@foursquare.com
wrote:
Thanks Adam. I'll take a look tomorrow.
Rahul.
On Tue, Oct 21, 2014 at 5:44 PM, metsfan notifications@github.com wrote:
Try this link
The branch
https://github.com/metsfan/twofishes/tree/low_ranking_parsesMy commit
On Tue, Oct 21, 2014 at 5:40 PM, Rahul Pratap Maddimsetty <
notifications@github.com> wrote:Hi Adam,
I don't actually see a branch by that name. Are you able to see your
changes on github (have you pushed that branch)? If so, could you give
me
a
link to github directly (I don't see the link you refer to in any of
your
previous emails)?Rahul.
On Tue, Oct 21, 2014 at 5:25 PM, metsfan notifications@github.com
wrote:In my previous post, I posted a link to my changes. I made a branch
off
master on my local copy called "low_ranking_parses", and pushed it
into
there.—
Reply to this email directly or view it on GitHub
<
https://github.com/foursquare/twofishes/issues/49#issuecomment-60001435>.—
Reply to this email directly or view it on GitHub
<
https://github.com/foursquare/twofishes/issues/49#issuecomment-60003669>.—
Reply to this email directly or view it on GitHub
#49 (comment)
.