ramarty/Unique-Location-Extractor

Minor Things

  • Make sure the tweet-cleaning steps that currently run before the algorithm are also applied inside the algorithm itself.

  • ku is a general landmark: it has 3 locations, 2 of which are clustered.

  • Example: "accodent near githurai exit causing a long tail of traffic" (1) missed the correct landmark, githurai, and chose "tail" instead. We should probably allow fuzzy matching of event words (here "accodent" for "accident"). But if githurai is a general landmark AND fits a pattern with a high-tier preposition (we know it's the accident word), we should keep it, and any landmark chosen must be close to it. Could create an indicator, eventword_prepteir1_landmark_dist: if a word in that pattern exists, we measure its distance to the chosen landmark.

  • If a general landmark fits a preferred pattern BUT we remove it and accept an ambiguous pattern instead, should we check whether the landmark from the ambiguous pattern is near any location of the preferred general landmark? Or just remove the ambiguous one: the general landmark is probably correct and the ambiguous-pattern landmark is spurious.

  • For "always keep" landmarks, maybe use the first 3 preposition tiers? We could also specify which kind of pattern to keep, for example just [prep] [landmark], or [crash word] [prep] [landmark].
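The githurai example suggests two pieces: fuzzy matching of event words and a distance indicator for the [event word] [tier-1 prep] [landmark] pattern. Below is a minimal sketch, assuming the indicator is a geographic (haversine) distance between the pattern's landmark and the chosen landmark. The word lists, tier-1 prepositions, coordinates, and the 0.8 similarity cutoff are all hypothetical placeholders, not the package's real dictionaries; Python's `difflib` stands in for whatever fuzzy matcher the project actually uses.

```python
import difflib
import math

# Hypothetical vocabularies, for illustration only.
EVENT_WORDS = ["accident", "crash", "collision"]
TIER1_PREPS = ["near", "at"]
# Made-up (lat, lon) pairs, not real gazetteer entries.
# "tail" stands in for the spurious landmark from the example.
LANDMARKS = {
    "githurai": (-1.20, 36.91),
    "tail": (-1.30, 36.80),
}


def fuzzy_event_word(token, cutoff=0.8):
    """Return the event word `token` most likely is (e.g. a typo), or None."""
    matches = difflib.get_close_matches(token, EVENT_WORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def eventword_prepteir1_landmark_dist(tokens, chosen):
    """Distance (km) from the chosen landmark to the landmark found in an
    [event word] [tier-1 prep] [landmark] pattern; None if no such pattern."""
    for i in range(len(tokens) - 2):
        if (fuzzy_event_word(tokens[i])
                and tokens[i + 1] in TIER1_PREPS
                and tokens[i + 2] in LANDMARKS):
            return haversine_km(LANDMARKS[tokens[i + 2]], LANDMARKS[chosen])
    return None
```

With the example tweet, "accodent" fuzzy-matches "accident", so the pattern fires on githurai; choosing "tail" then yields a large distance that could flag the choice as suspect, while choosing githurai yields 0.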
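For the question about a general landmark versus an ambiguous-pattern landmark, one option is to use the ambiguous landmark only to pick among the general landmark's locations, and otherwise treat it as spurious. A minimal sketch, assuming locations are (lat, lon) pairs; the function name, the 1 km threshold, and the coordinates in the usage note are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_general_location(general_locs, ambiguous_loc, max_km=1.0):
    """Return the general-landmark location nearest the ambiguous-pattern
    landmark if it lies within max_km (hypothetical threshold).

    Returns None otherwise, meaning: drop the ambiguous landmark as
    spurious and keep the general landmark as-is."""
    best = min(general_locs, key=lambda g: haversine_km(g, ambiguous_loc))
    return best if haversine_km(best, ambiguous_loc) <= max_km else None
```

For a general landmark with 3 made-up locations, two of them clustered (like ku), an ambiguous landmark about 0.1 km from the third location selects that location, while one tens of km away returns None.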
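The "always keep" idea could be expressed as a small configurable rule: a maximum preposition tier plus a set of allowed pattern types. A minimal sketch, assuming whitespace tokens and a landmark index; the tier table, crash words, and the pattern names "prep" and "crash_prep" are hypothetical and not taken from the package.

```python
from dataclasses import dataclass

# Hypothetical preposition tiers and crash words, for illustration only.
PREP_TIERS = {"at": 1, "near": 1, "opposite": 2, "along": 3, "towards": 4}
CRASH_WORDS = {"accident", "crash"}


@dataclass
class AlwaysKeepRule:
    max_tier: int = 3  # use the first 3 preposition tiers
    # "prep":       [prep] [landmark]
    # "crash_prep": [crash word] [prep] [landmark]
    patterns: frozenset = frozenset({"prep", "crash_prep"})


def always_keep(tokens, landmark_idx, rule):
    """True if the landmark at tokens[landmark_idx] matches an allowed pattern."""
    if landmark_idx < 1:
        return False
    tier = PREP_TIERS.get(tokens[landmark_idx - 1])
    if tier is None or tier > rule.max_tier:
        return False
    preceded_by_crash = landmark_idx >= 2 and tokens[landmark_idx - 2] in CRASH_WORDS
    if "crash_prep" in rule.patterns and preceded_by_crash:
        return True
    # A bare [prep] [landmark] match is enough only if "prep" is allowed.
    return "prep" in rule.patterns
```

Restricting `patterns` to `{"crash_prep"}` keeps only landmarks introduced by a crash word, while the default also accepts a bare tier-1 to tier-3 preposition.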