Minor Things
-
Make sure the tweet-cleaning steps that are applied before running the algorithm are also applied inside the algorithm itself.
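One way to keep the two in sync is to have a single cleaning function that the algorithm calls on entry, so callers cannot skip it. A minimal sketch; `clean_tweet`, `tokens_for_algorithm`, and the three cleaning steps shown are placeholders, since the actual steps are not listed in this issue:

```python
import re

def clean_tweet(text):
    """Placeholder cleaning steps (assumed, not the project's real list)."""
    text = text.lower()                        # normalize case
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"\s+", " ", text)           # collapse whitespace
    return text.strip()

def tokens_for_algorithm(raw_tweet):
    """Hypothetical algorithm entry point: always cleans before tokenizing."""
    return clean_tweet(raw_tweet).split()

print(tokens_for_algorithm("Accident  near Githurai https://t.co/x "))
```

Because the function is idempotent, it does no harm if a tweet was already cleaned upstream.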
-
"ku" is a general landmark with 3 locations, 2 of which are clustered.
-
Example: "accodent near githurai exit causing a long tail of traffic". (1) The algorithm misses the correct landmark, "githurai", and chooses "tail" instead. We should probably allow fuzzy matching of event words, so that "accodent" is recognized as "accident". But if githurai is a general landmark AND fits a pattern with a high-tier preposition (where we know the preceding word is the accident word), we should keep it, and any landmark that is chosen must be close to it. We could create an indicator, eventword_prepteir1_landmark_dist: if a landmark exists in that pattern, measure its distance to the chosen landmark?
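A rough sketch of the fuzzy event-word match, using `difflib.get_close_matches` from the Python standard library. The `EVENT_WORDS` list, the `find_event_words` helper, and the 0.8 cutoff are assumptions for illustration, not the project's actual values:

```python
from difflib import get_close_matches

# Hypothetical event-word list; the real list is not given in this issue.
EVENT_WORDS = ["accident", "crash", "collision", "overturned"]

def find_event_words(tokens, cutoff=0.8):
    """Return (index, token, matched event word) for each token that
    fuzzily matches an event word at or above the similarity cutoff."""
    hits = []
    for i, tok in enumerate(tokens):
        match = get_close_matches(tok.lower(), EVENT_WORDS, n=1, cutoff=cutoff)
        if match:
            hits.append((i, tok, match[0]))
    return hits

tokens = "accodent near githurai exit causing a long tail of traffic".split()
print(find_event_words(tokens))  # [(0, 'accodent', 'accident')]
```

With the misspelled event word recovered at index 0, the "[event word] [prep] [landmark]" pattern can match "accodent near githurai". The cutoff would need tuning so that ordinary words are not pulled in as event words.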
-
If a general landmark fits a preferred pattern BUT we remove it and accept an ambiguous pattern instead, check whether the landmark from the ambiguous pattern is near any location of the preferred general landmark? Or just remove the ambiguous one: the general landmark is probably correct and the ambiguous-pattern landmark is spurious.
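The proximity check could look something like this, using the haversine great-circle distance. The helper names, the 1 km threshold, and all coordinates below are made up for illustration; they are not real landmark locations:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def min_dist_km(point, locations):
    """Distance from `point` to the closest of a landmark's locations."""
    return min(haversine_km(point[0], point[1], lat, lon) for lat, lon in locations)

def near_general_landmark(ambiguous_loc, general_locs, max_km=1.0):
    """True if the ambiguous-pattern landmark lies within max_km of ANY
    location of the general landmark (threshold is an assumption)."""
    return min_dist_km(ambiguous_loc, general_locs) <= max_km

# Made-up coordinates: a general landmark with 3 locations, 2 of them clustered.
general_locs = [(-1.20, 36.91), (-1.21, 36.92), (-1.30, 36.80)]
print(near_general_landmark((-1.205, 36.915), general_locs))
print(near_general_landmark((-1.45, 36.60), general_locs))
```

The value returned by `min_dist_km` could also serve as the distance indicator itself, rather than only feeding a keep/remove decision.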
-
For "always keep" landmarks, maybe use the first 3 preposition tiers? We could also specify which kind of pattern to keep: for example, just the preposition, or [crash word] [prep] [landmark].
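A sketch of how the "always keep" rule might be parameterized by tier and pattern type. The tier assignments in `PREP_TIERS`, the `CRASH_WORDS` set, and the `keep_landmark` helper are all hypothetical, and it only handles single-token landmarks:

```python
# Hypothetical tiers; the real tier assignments are not given in this issue.
PREP_TIERS = {"at": 1, "near": 1, "after": 2, "before": 2, "towards": 3, "along": 4}
CRASH_WORDS = {"accident", "crash", "collision"}

def keep_landmark(tokens, landmark, max_tier=3, pattern="prep"):
    """True if `landmark` directly follows a preposition of tier <= max_tier.
    With pattern="crash_prep", the preposition must also directly follow
    a crash word, i.e. [crash word] [prep] [landmark]."""
    if pattern not in ("prep", "crash_prep"):
        raise ValueError("pattern must be 'prep' or 'crash_prep'")
    for i, tok in enumerate(tokens):
        if tok != landmark or i < 1:
            continue
        tier = PREP_TIERS.get(tokens[i - 1])
        if tier is None or tier > max_tier:
            continue  # no preposition, or tier too low-priority
        if pattern == "prep":
            return True
        if i >= 2 and tokens[i - 2] in CRASH_WORDS:
            return True
    return False

tokens = "accident near githurai exit".split()
print(keep_landmark(tokens, "githurai"))                        # True
print(keep_landmark(tokens, "githurai", pattern="crash_prep"))  # True
```

Keeping `max_tier` and `pattern` as arguments makes it easy to compare the looser and stricter variants on the same tweets.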