/u/patrickpeng168 has taken the effort to organize a large dataset for tag recruitment. I've decided to see how independent the tags are from one another.
The dataset was taken from this reddit post, and entries are from 2/23/2020 11:07:01 to 3/2/2020 20:50:41.
A quick lesson on stats: Two events are independent if they don't influence each other's probability, like two separate coin flips. In contrast, two events are dependent if one skews the probability of the other, such as whether it rains on a given day and whether that day is cloudy or sunny. In our case, we want to find out whether tags are independent, which then lets us figure out whether the tag system is skewed against certain combinations.
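As a small illustration of the definition above, the sketch below enumerates two fair coin flips and computes P(A)P(B) - P(A & B), the same difference used later in this write-up. The event names are made up for the example. Two separate flips give exactly 0, while two overlapping events ("first flip is heads" and "at least one heads") give a nonzero value.

```python
from fractions import Fraction
from itertools import product

# All four equally likely outcomes of two fair coin flips.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Exact probability of an event over the equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_heads(o):
    return o[0] == "H"

def second_heads(o):
    return o[1] == "H"

def any_heads(o):
    return "H" in o

# Independent pair: the first flip says nothing about the second.
independent_delta = (
    prob(first_heads) * prob(second_heads)
    - prob(lambda o: first_heads(o) and second_heads(o))
)

# Dependent pair: "first flip is heads" forces "at least one heads".
dependent_delta = (
    prob(first_heads) * prob(any_heads)
    - prob(lambda o: first_heads(o) and any_heads(o))
)

print(independent_delta, dependent_delta)
```

A negative value here means the two events occur together more often than independence would predict.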
The indpeendence_delta_data files are the output of this work. Values are calculated as P(A)P(B) - P(A & B). The closer a cell is to 0, the more independent the two tags are. The higher (more positive) a cell is, the lower P(A & B) is relative to P(A)P(B), which means the two tags show up together less often than they would if they were independent. This suggests that the two tags are explicitly weighted to appear together less often. Likewise, if a value is very negative, this suggests that the two tags are weighted to show up together more often.
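The calculation described above can be sketched as follows. This is a minimal illustration, not the script that produced the output files: it assumes each recruitment roll is available as a collection of tag names, and the function name `independence_delta` and the toy rolls are hypothetical.

```python
from collections import Counter
from itertools import combinations

def independence_delta(rolls):
    """Return {(tag_a, tag_b): P(A)P(B) - P(A & B)} for every unordered tag pair.

    `rolls` is a list of iterables of tag names, one per recruitment roll
    (an assumed input format, not the dataset's actual layout).
    """
    n = len(rolls)
    single = Counter()  # number of rolls containing each tag
    pair = Counter()    # number of rolls containing each pair of tags
    for roll in rolls:
        tags = sorted(set(roll))
        single.update(tags)
        pair.update(combinations(tags, 2))

    deltas = {}
    for a, b in combinations(sorted(single), 2):
        p_a = single[a] / n
        p_b = single[b] / n
        p_ab = pair[(a, b)] / n
        deltas[(a, b)] = p_a * p_b - p_ab
    return deltas

# Toy data for illustration only; these are not entries from the dataset.
toy_rolls = [
    {"Guard", "Melee"},
    {"Guard", "Melee"},
    {"Sniper", "Ranged"},
    {"Sniper", "Melee"},
]
toy_deltas = independence_delta(toy_rolls)
print(toy_deltas[("Guard", "Melee")])
```

Only off-diagonal pairs are produced, since a tag cannot appear twice in one roll.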
For most tag combinations, the absolute difference is anywhere from 0.00003 to 0.002, which implies that the tags are very likely independent and that the differences can be attributed to noise in data collection. However, a few notable cells are worth discussing.
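One rough way to judge whether a difference is small enough to count as noise is to compare it with the binomial standard error of the observed joint rate. The sketch below is a back-of-the-envelope check under stated assumptions: rolls are treated as independent samples, P(A) and P(B) are treated as exactly known (which understates the real uncertainty), and the function names, the threshold `z`, and the numbers in the usage lines are all hypothetical rather than taken from the dataset.

```python
import math

def joint_rate_standard_error(p_a, p_b, n_rolls):
    """Standard error of the observed P(A & B) if A and B were independent."""
    expected = p_a * p_b
    return math.sqrt(expected * (1 - expected) / n_rolls)

def looks_like_noise(delta, p_a, p_b, n_rolls, z=2.0):
    """True if |delta| is within z standard errors of zero."""
    return abs(delta) <= z * joint_rate_standard_error(p_a, p_b, n_rolls)

# Hypothetical numbers, chosen only to show the shape of the check.
print(looks_like_noise(0.002, p_a=0.5, p_b=0.5, n_rolls=100))
print(looks_like_noise(0.2, p_a=0.5, p_b=0.5, n_rolls=100))
```

A difference that fails this check is not proof of weighting, only a signal that sampling noise alone is an unlikely explanation.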
Given the data, we can conclude that there is a bias against tag combinations that guarantee a 4 star or higher operator. For example, Supporter/DPS (Istina) has a difference of +0.05 between the rate expected under independence and the observed rate. The same occurs with Guard/AoE (Estelle/Specter), Melee/Healing (Gummy/Nearl), and Survival/Ranged (Jessica), to name a few.
We also see that some tags are slightly more likely to appear together: notably, Defense/Defender has a difference of -0.03, and Melee/Guard has a difference of -0.01.
For posterity, we should mention that the diagonal cells (e.g. AoE/AoE, DPS/DPS, etc.) have nonsensical values, as they're artifacts of the calculations. These values can be ignored because it's impossible for the same tag to show up twice in a single roll.
We also decline to draw conclusions about tag independence for the rarer tags, such as Nuker, Top, and Summon, because the sample size is too small.
Finally, we discuss the primary source of bias: self-selection bias. We would expect many users to post "interesting" results rather than all results, which means that people are more likely to post guaranteed 4* or 5* tag combinations than tag combinations that don't guarantee anything. However, we see that the values for these rarer combinations are already positive. This implies that even with self-selection bias inflating how often they are reported, these tag combinations still show up together less often than independence would predict. If anything, the tag combinations are likely even rarer than reported.