Normalize keyword representation
nderjung opened this issue · 3 comments
Hey, the graphs look great, but one suggestion to improve readability and interpretation is to normalize the population graphs by percentage rather than raw count. This would show how strongly a keyword is represented relative to community size, since the graphs currently over-emphasize population size. Basically, divide the count of a single term by the total number of terms: the bars then become comparable with each other, and we can see whether one community, in general, uses a word more than another. This can make the graphs easier to digest since not all communities are equal in size on reddit.
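A minimal sketch of the normalization described above, assuming the per-community term counts are available as a plain mapping. The function name `keyword_share` and the example counts are hypothetical and not taken from the project:

```python
from collections import Counter


def keyword_share(term_counts):
    """Return each term's share of all counted terms, as a percentage."""
    total = sum(term_counts.values())
    if total == 0:
        # No terms counted at all: every share is zero.
        return {term: 0.0 for term in term_counts}
    return {term: 100.0 * n / total for term, n in term_counts.items()}


# Hypothetical counts for two communities of very different size.
big = Counter({"linux": 900, "windows": 100})
small = Counter({"linux": 9, "windows": 1})

big_share = keyword_share(big)
small_share = keyword_share(small)
```

Although the raw counts differ by a factor of 100, both communities end up with the same shares, which is the comparison a percentage axis would make visible.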
Another graph could show positive words "versus" negative words, to demonstrate which communities use each kind more than the others.
Thanks!
Hi, and thanks for the idea.
However, this was a one-off project I did around 7 years ago and have not really touched since. So my motivation to update it is not very high.
But in case I (or somebody else) would like to redo such an evaluation in the future, it would make sense to consider your suggestion.
> since not all communities are equal in size on reddit
Ah, looking at the graphs, I now remember: the word counts are already normalized by the total number of comments per subreddit (see the x-axis label on the graphs).
If this were not the case, the large subreddits would totally dominate the graphs, which they don't. In the "hardware" graph, for example, quite small subreddits are in the lead.
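For illustration, a rough sketch of normalizing by comment count, showing how a small subreddit can lead once size is factored out. The subreddit names, the numbers, and the function `mentions_per_comment` are made up for this example and are not the project's actual code or data:

```python
def mentions_per_comment(keyword_mentions, total_comments):
    """Keyword mentions divided by the subreddit's total number of comments."""
    if total_comments <= 0:
        raise ValueError("total_comments must be positive")
    return keyword_mentions / total_comments


# Hypothetical data: (keyword mentions, total comments) per subreddit.
subreddits = {
    "r/large_example": (5000, 1_000_000),
    "r/small_example": (300, 10_000),
}

rates = {
    name: mentions_per_comment(mentions, comments)
    for name, (mentions, comments) in subreddits.items()
}

# The subreddit with the highest normalized rate, regardless of raw size.
leader = max(rates, key=rates.get)
```

Here the large subreddit has far more raw mentions, but the small one has the higher rate per comment and therefore leads.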
Thanks for the insight :) Marking the issue as resolved!