foursquare/twofishes

Documentation for data/custom/*.txt files

bfontaine opened this issue · 2 comments

Hello,

Would you mind adding something in the README or in the wiki about the files in data/custom/?

For example, can we use regexes in rewrites.txt? How does aliases.txt work? How do we rebuild the index after modifying these files? Do we have to rebuild the entire index?

Hi,

Here is a list of all files in data/custom/, with a short description and examples.

  • aliases.txt adds each alias to the index (and marks it as an English ALT_NAME).
  • boosts.txt boosts given ids as expected.
  • deletes.txt: for each name matching a given regex (with \b appended), indexes both the original name and the name with the matched text removed.

For example, with Township of in deletes.txt, processing Township of Brick indexes both Township of Brick and Brick.
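The deletes.txt behaviour described above can be sketched in a few lines of Python. This is only an illustration of the rule as I described it, not the code from GeonamesParser.scala; the function name `expand_deletes` and the whitespace clean-up are my own assumptions.

```python
import re

def expand_deletes(name, delete_patterns):
    """Return the original name plus each variant with a matching pattern removed.

    Hypothetical helper: it mirrors the described rule (regex with \\b appended),
    not the actual twofishes implementation.
    """
    variants = [name]
    for pattern in delete_patterns:
        regex = re.compile(pattern + r"\b")  # \b is appended to every entry
        if regex.search(name):
            # Remove the matched text, then collapse leftover whitespace (assumption).
            stripped = " ".join(regex.sub("", name).split())
            if stripped and stripped not in variants:
                variants.append(stripped)
    return variants

print(expand_deletes("Township of Brick", ["Township of"]))
# ['Township of Brick', 'Brick']
```

A name that does not match any entry comes back unchanged, so only the original name would be indexed.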

  • ignores.txt ignores given geonames ids.
  • moves.txt sets new lat and lng for given geonames ids.
  • name-deletes.txt removes given names from the index.
  • names.txt marks given names as preferred (and indexes them if they are not already present).
  • rewrites.txt: for each name matching a given regex, indexes the original name plus one variant per word in the comma-separated list, with the matched text replaced by that word.

For example, with Mount\b|Mt,Mountain,Mtn in rewrites.txt, processing Mount Laurel indexes Mount Laurel, Mt Laurel, Mtn Laurel and Mountain Laurel.
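Here is a rough Python sketch of the rewrites.txt rule, to show how a line such as Mount\b|Mt,Mountain,Mtn expands a name. It is not taken from GeonamesParser.scala: the helper names are hypothetical, and splitting on the last | to separate the regex from the replacement list is my assumption about the line format.

```python
import re

def parse_rewrite_line(line):
    """Split a line into (compiled regex, replacements).

    Assumption: the last '|' separates the regex from the comma-separated list.
    """
    pattern, _, replacements = line.rpartition("|")
    return re.compile(pattern), [r.strip() for r in replacements.split(",")]

def expand_rewrites(name, line):
    """Return the original name plus one variant per replacement word."""
    regex, replacements = parse_rewrite_line(line)
    variants = [name]
    if regex.search(name):
        for replacement in replacements:
            # A function replacement keeps the word literal (no backslash escapes).
            candidate = regex.sub(lambda _match: replacement, name)
            if candidate not in variants:
                variants.append(candidate)
    return variants

print(expand_rewrites("Mount Laurel", r"Mount\b|Mt,Mountain,Mtn"))
# ['Mount Laurel', 'Mt Laurel', 'Mountain Laurel', 'Mtn Laurel']
```

Because of the \b in the pattern, a name like Mountain View does not match Mount\b and would be indexed only under its original form.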

For more details see GeonamesParser.scala and IndexerTest.scala.

I guess you have to rebuild the entire index after changing these files.

Hope it's useful.

Thank you!