
Rate versus count for trending rate

sjacks26 opened this issue · 0 comments

Right now, FlockWatch compares raw frequency counts from two time windows (t1 and t2) to identify trending terms. If t2 contains many more messages than t1, FlockWatch will flag a lot of terms as trending simply because more messages mean more opportunities for any term to appear, not because those terms make up a larger share of the conversation.
Maybe FlockWatch should use frequency rates (a term's count divided by the number of messages in its time window) rather than raw frequency counts?
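
A rough sketch of the difference, not FlockWatch's actual code: the function names, the whitespace tokenization, and both thresholds (`min_increase`, `min_rate_increase`) are assumptions made up for illustration. It only shows how a pure volume change can trip a count-based check while leaving a rate-based check untouched.

```python
from collections import Counter


def term_counts(messages):
    """Count occurrences of each whitespace-separated, lowercased token."""
    counts = Counter()
    for message in messages:
        counts.update(message.lower().split())
    return counts


def term_rates(messages):
    """Occurrences of each term per message in the window."""
    n = len(messages)
    if n == 0:
        return {}
    return {term: count / n for term, count in term_counts(messages).items()}


def trending_by_count(t1, t2, min_increase=2):
    """Hypothetical count-based check: raw count grew by at least min_increase."""
    c1, c2 = term_counts(t1), term_counts(t2)
    return {term for term, c in c2.items() if c - c1.get(term, 0) >= min_increase}


def trending_by_rate(t1, t2, min_rate_increase=0.25):
    """Hypothetical rate-based check: per-message rate grew by at least min_rate_increase."""
    r1, r2 = term_rates(t1), term_rates(t2)
    return {term for term, r in r2.items() if r - r1.get(term, 0.0) >= min_rate_increase}


# Same mix of messages in both windows, but t2 has three times the volume.
t1 = ["storm warning", "go team"]
t2 = t1 * 3

print(trending_by_count(t1, t2))  # every term is flagged
print(trending_by_rate(t1, t2))   # nothing is flagged
```

With these toy windows, the count-based check flags all four terms even though the content mix is identical, while the rate-based check flags none. A real implementation would still need to decide how to handle very small windows, where rates are noisy.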