saladtheory/saladtheory.github.io

Suggestions for Ingredient Entropy


I first thought about opening a PR, but figured it could be discussed first. In particular, I wonder whether we want to simply replace the old entropy equation or add a new section comparing different entropy metrics (assuming this idea is even accepted).

I think we can look at Shannon Entropy for inspiration here. In particular, it's widely used to analyze the entropy of word salad in large language model training.

Another bonus of Shannon entropy with log base 2 is that it represents the number of bits needed, on average, to encode which ingredients you ate in which order, assuming you already know the ingredient mixture ratio.

For ingredients, we would use each ingredient's proportion of the mixture as its probability, so the total entropy of a mixture with counts n1, n2, ... and N total elements is - [ n1 * log2(n1/N) + n2 * log2(n2/N) + ... ]. Compared to the previous formula, which uses only the total number of types and the total number of elements, this new formula captures some intuitive cases of low-entropy mixtures.
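To make the comparison concrete, here is a minimal Python sketch of both formulas (the function names are mine, purely illustrative); it reproduces the numbers in the examples below.

```python
import math

def old_entropy(counts):
    """Old formula: log2(total elements) * number of ingredient types."""
    total = sum(counts)
    return math.log2(total) * len(counts)

def new_entropy(counts):
    """Proposed formula: -sum over ingredients of n_i * log2(n_i / N)."""
    total = sum(counts)
    return -sum(n * math.log2(n / total) for n in counts)

# Example 1: 1 peanut, 1 cashew, 98 almonds
print(old_entropy([1, 1, 98]))    # ~19.93
print(new_entropy([1, 1, 98]))    # ~16.14

# Example 2: 33 peanuts, 33 cashews, 34 almonds
print(old_entropy([33, 33, 34]))  # ~19.93
print(new_entropy([33, 33, 34]))  # ~158.48
```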

Example 1

1 peanut, 1 cashew, and 98 almonds in a bowl

Old:
log2(100) * 3 = 19.93
New:
- [ log2(1/100) * 1 + log2(1/100) * 1 + log2(98/100) * 98 ] = 16.14

Example 2

33 peanuts, 33 cashews, and 34 almonds

Old:
log2(100) * 3 = 19.93
New:
- [ log2(33/100) * 33 + log2(33/100) * 33 + log2(34/100) * 34 ] = 158.48

The increased entropy makes sense here, because it measures how unpredictable the mixture is.

Example 3

1 ice cube

Old:
log2(1) * 1 = 0
New:
- [ log2(1/1) * 1 ] = 0

Example 4

8 ice cubes

Old:
log2(8) * 1 = 3
New:
- [ log2(1/1) * 8 ] = 0

I think this solves @FragileChin's concern in #18. More of a 1-ingredient mixture does not increase entropy.
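Using the sketch above, this is easy to check:

```python
print(new_entropy([1]))  # 0.0 -- one ice cube
print(new_entropy([8]))  # 0.0 -- eight ice cubes, still zero
print(old_entropy([8]))  # 3.0 -- the old formula grows with quantity alone
```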

Other Options

Normalized Entropy

Divide the total Shannon entropy by the total number of ingredient elements to get the entropy per element, or "average bits per element" of the mixture. Equivalently, this is - [ p1 * log2(p1) + p2 * log2(p2) + ... ], where p1, p2, ... are the ingredient proportions.
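A small addition to the sketch above (again, the naming is mine):

```python
def normalized_entropy(counts):
    """Average bits per ingredient element: total entropy divided by the element count."""
    return new_entropy(counts) / sum(counts)

print(normalized_entropy([1, 1, 98]))    # ~0.16 bits per element
print(normalized_entropy([33, 33, 34]))  # ~1.58 bits per element
```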

Perplexity

Perplexity is another common metric used in analyzing the statistics of word salad. It's essentially 2 ^ (Shannon Entropy). This represents roughly the number of unique ways (weighted by their probability) you might end up consuming the ingredient elements.
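For a sense of scale: the Example 2 mixture has a total entropy of about 158.48 bits, which corresponds to a perplexity of 2^158.48 ≈ 5 × 10^47.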

Normalized Perplexity

As with entropy, computing a normalized perplexity (2 raised to the entropy per element) provides a different and interesting insight. This time it represents the statistically weighted variety of ingredients that might be the next ingredient element you eat. This may sound abstract, so let's revisit the nuts example:

Example 1: 1 peanut, 1 cashew, and 98 almonds in a bowl

norm-perplexity = 2 ^ -[ log2(1/100)*1/100 + log2(1/100)*1/100 + log2(98/100)*98/100 ] = 1.12

Because your expectation of the next nut is heavily biased toward a single kind of nut, the normalized perplexity is barely above 1.

Example 2: 33 peanuts, 33 cashews, and 34 almonds

norm-perplexity = 2 ^ -[ log2(33/100)*33/100 + log2(33/100)*33/100 + log2(34/100)*34/100 ] = 2.9997

In this case, all three options are almost equally likely, so the normalized perplexity represents a fair three-way unpredictability.
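Extending the sketch once more (again, hypothetical naming):

```python
def normalized_perplexity(counts):
    """2 raised to the per-element entropy: the probability-weighted variety of the next element."""
    return 2 ** normalized_entropy(counts)

print(normalized_perplexity([1, 1, 98]))    # ~1.12
print(normalized_perplexity([33, 33, 34]))  # ~2.9997
```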

Cloving Thoughts and Onions

I'm personally more in favor of the normalized entropy metrics, as they tell you more directly about the mixture and distribution (intrinsic properties of the food recipe) without being coupled to the quantity of the food. One could always multiply by the number of ingredient elements to get quantity-dependent absolute entropy if desired.

I would also probably recommend Normalized Perplexity because it moves back into the linear domain and corresponds more directly to the number of unique ingredients, so people can compare salad recipes more intuitively with this metric.

Finally, I have a proposal for @FragileChin's request for a metric normalized to the 0-1 range. You could, for example, use 1 - 1 / NormalizedPerplexity. For salads with 1 unique ingredient, this becomes 0. For an equal mix of 10 ingredients, it becomes 0.9. For an equal mix of 100 ingredients, it becomes 0.99. A true 1.0 salad would require infinitely many ingredients, but I see no finite alternative, because adding more ingredients should always increase salad entropy.
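As one last hypothetical helper on top of the sketch above:

```python
def bounded_salad_score(counts):
    """Maps normalized perplexity into [0, 1): 0 for one ingredient, approaching 1 as variety grows."""
    return 1 - 1 / normalized_perplexity(counts)

print(bounded_salad_score([5]))        # 0.0   (a single ingredient)
print(bounded_salad_score([1] * 10))   # 0.9   (equal mix of 10 ingredients)
print(bounded_salad_score([1] * 100))  # 0.99  (equal mix of 100 ingredients)
```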

P.S.

Word salad seems like a content-rich environment for drawing more parallels; I hope more people are inspired.

I like the Shannon entropy approach because:

  • We are not restricted to a particular probability distribution. While I would intuitively assume that all ingredients are uniformly distributed through a salad (and you did exactly that), the distribution could instead be based on the number of elements, or on volume, etc. In unusual cases, like Coke with ice (which is, apparently, a salad in its radical form), we can introduce an exponential decay model dependent on factors like container height: as ice floats in the liquid, it is less likely to find ice closer to the container bottom (a toy sketch of such a model follows below). Of course, by defining the distribution "at random" we potentially run into the Bertrand paradox. However, if we resort to case-by-case modeling, we should be safe.
  • Adding an omnipresent ingredient does not change the entropy. If we dress a salad, it does not become more or less of a salad. Shredded lettuce with Caesar dressing is as much of a low-entropy salad as undressed shredded lettuce. (Here I'm thinking: is there anything that is not a salad but becomes a salad after we add dressing?)

Consequently, this also holds for perplexity and the normalized metrics.
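Purely as a toy illustration of the depth-dependent idea in the first point (every name and parameter below is invented for illustration):

```python
import math

def binary_entropy(p):
    """Per-element Shannon entropy (bits) of a two-ingredient mix with proportions p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def coke_with_ice_entropy(layers=10, surface_ice_fraction=0.8, decay=0.5):
    """Average entropy of the next spoonful given its depth, with ice probability decaying toward the bottom."""
    depths = [(i + 0.5) / layers for i in range(layers)]          # 0 = surface, 1 = bottom
    ice_probs = [surface_ice_fraction * math.exp(-decay * d) for d in depths]
    return sum(binary_entropy(p) for p in ice_probs) / layers     # equal-sized layers

print(coke_with_ice_entropy())  # bits per spoonful under this toy depth model
```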