tensorflow/lucid

Research: Poly-Semantic Neurons

colah opened this issue · 5 comments

colah commented

🔬 This is an experiment in doing radically open research. I plan to post all my work on this openly as I do it, tracking it in this issue. I'd love for people to comment, or better yet collaborate! See more.

Please be respectful of the fact that this is unpublished research and that people involved in this are putting themselves in an unusually vulnerable position. Please treat it as you would unpublished work described in a seminar or by a colleague.

Description

Many neurons in GoogLeNet seem to correspond to a single concept, but many do not. We call neurons that correspond to multiple concepts "poly-semantic".

It seems like the trend is that, as one progresses towards later layers, more neurons become poly-semantic and poly-semantic neurons respond to a greater number of things.


Questions

This phenomenon raises several questions:

  • Why do poly-semantic neurons occur?
    • (Should we even be surprised? Maybe we should be more surprised that there seem to be neurons responding to a single concept?)
    • Why do they seem to be more common / extreme in later layers?
  • Can we detect poly-semantic neurons?
  • Given a poly-semantic neuron, can we discover all the different things it responds to?
  • Is there a way to decompose networks into a set of orthogonal concepts?
  • Can we regularize networks to not have poly-semantic neurons?

Resources / Related topics

colah commented

Hypotheses: Why do poly-semantic neurons occur?

The Sparse Superposition Hypothesis

You can encode n-dimensional, k-sparse vectors in a space with far fewer than n dimensions. If you think neural networks are encoding sparse concepts, why not take advantage of this property to encode more concepts?

See Gabriel Goh's Decoding the Thought Vector
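
To make this concrete, here is a minimal sketch (not from the original discussion; the dimensions and the choice of orthogonal matching pursuit as the decoder are just illustrative assumptions): a k-sparse "concept" vector in n dimensions is projected down to m << n dimensions and then recovered almost exactly by a sparse decoder.

```python
# Illustrative sketch: a k-sparse concept vector in n dimensions can be stored in,
# and recovered from, an m << n dimensional code (m, n, k chosen arbitrarily here).
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
n, m, k = 1024, 128, 8                      # concepts, code dimension, active concepts

A = rng.randn(m, n) / np.sqrt(m)            # one random "direction" per concept

x = np.zeros(n)                             # k-sparse concept activations
support = rng.choice(n, size=k, replace=False)
x[support] = rng.randn(k)

y = A @ x                                   # the low-dimensional code

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k).fit(A, y)
x_hat = omp.coef_                           # sparse decoding of the code
print("relative recovery error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```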

The Neuron Packing Hypothesis

This is kind of similar to the sparse superposition hypothesis.

Suppose there are many useful abstractions the model could represent at a layer -- many more abstractions than there are neurons! What's a model to do?

Well, imagine the abstractions sorted by "usefulness":

[figure: abstractions sorted by usefulness]

For the most useful abstractions, the network might dedicate an entire neuron -- after all, if an abstraction isn't aligned perfectly with a single neuron's activation, that would introduce noise.

(In fact, if an abstraction is especially important, it might get multiple neurons, allowing it to encode more nuanced variants.)

But what about less important ones? Well, in an n-dimensional space, you can only have n orthogonal vectors, but you can have exponentially many almost-orthogonal vectors. It must be tempting for the network to use this to encode multiple abstractions in a few neurons, even if that causes some interference.
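
A quick numerical check of the "almost orthogonal" point (just a sketch; the dimensions here are arbitrary):

```python
# Sketch: random directions in R^d are nearly pairwise orthogonal, even when there
# are many more directions than dimensions (d and N are arbitrary choices).
import numpy as np

rng = np.random.RandomState(0)
d, N = 512, 4096                                  # 8x more directions than dimensions

V = rng.randn(N, d)
V /= np.linalg.norm(V, axis=1, keepdims=True)     # unit-norm "concept" directions

cos = V @ V.T                                     # pairwise cosine similarities
np.fill_diagonal(cos, 0.0)
print("worst-case |cosine|:", np.abs(cos).max())  # ~0.25: small interference, far from 1
```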

(It's possible that all of this even leads to a kind of "conservation of neuron importance" law, where abstractions get packed into neurons until the neurons are, in some sense, equally important, to the greatest extent possible.)

If each progressive layer, building on more powerful abstractions in the previous layer, has a larger number of useful abstractions it could include, we should expect this problem to get worse every layer.

The Skew/Rotation Hypothesis

If you measure correlations between neuron activations, it seems like the condition number of the resulting correlation matrix is often very extreme. This suggests that the geometry of activation space is very stretched! Unstretching it seems to improve many visualization techniques.

It may be that units are only poly-semantic because we need to unskew or rotate the space...
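
For reference, a sketch of the kind of measurement and "unskewing" meant here (assuming an (examples x channels) matrix of activations collected from the layer of interest; random correlated data stands in for real activations below):

```python
# Sketch: measure how stretched activation space is (condition number of its
# covariance) and "unskew" it with ZCA whitening. `acts` is a stand-in here;
# in practice it would hold activations sampled from the layer of interest.
import numpy as np

rng = np.random.RandomState(0)
acts = rng.randn(10000, 256) @ rng.randn(256, 256)     # deliberately skewed stand-in

C = np.cov(acts, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
print("condition number:", eigvals.max() / eigvals.min())

# ZCA whitening: rotate to the eigenbasis, rescale each direction, rotate back.
W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + 1e-5)) @ eigvecs.T
acts_white = (acts - acts.mean(axis=0)) @ W            # unstretched activations
```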

Under-developed Abstractions Hypothesis

In a recent conversation, Yasaman Bahri (@yasamanb) suggested that maybe we don't have enough data or a hard enough task for the network to determine the right abstractions. Maybe with only a small amount of data "cars and cat faces" seems like a reasonable abstraction? (Note that this comment is paraphrased by Chris and may not be a super accurate interpretation of Yasaman's remark.)

This might happen in later layers because abstractions in earlier layers are simpler (closer to the data) or have fewer degrees of freedom.

Given the size of the weight and bias space relative to the input dimensionality (large on average), I think it would be more surprising if single neurons corresponded to single features.

Incomplete training is also interesting. Most adult human minds have lived long enough to properly separate the wheels on a car from trash cans. When children first learn to recognize objects in a semi-supervised way, the order of exposure to objects matters. I always imagine class representations as intrinsically abstract (not truly separating dogs from leprechauns, for example), with the outcome depending on the order of training, as a result of the somewhat chaotic nature of the weight and bias configuration space. Again, that is probably due to the high dimensionality.

I've wondered about the implications of the Smale Horseshoe for neural networks. Are neural networks intrinsically chaotic in behavior? The Horseshoe gives a solid criterion for answering the question.

I'm particularly interested in: Can we regularize networks to not have poly-semantic neurons?

One approach I am exploring is using an additional loss function which penalises dispersed activations to encourage more distinct class-specific clusters and branches to emerge during training. I'm writing code to test this idea at the moment and will update here if I find anything interesting.
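
Roughly, the kind of penalty I have in mind looks something like this (just a sketch of one possible formulation, an entropy penalty on each example's activation pattern; the actual loss may end up looking different):

```python
# Sketch of one possible "dispersion" penalty: treat each example's (non-negative)
# activations over a layer as a distribution and penalise its entropy, so mass
# concentrates on a few units. Details (layer choice, weighting) are placeholders.
import tensorflow as tf

def dispersion_penalty(acts, eps=1e-8):
    """acts: (batch, units) tensor of non-negative activations, e.g. post-ReLU."""
    p = acts / (tf.reduce_sum(acts, axis=1, keepdims=True) + eps)
    entropy = -tf.reduce_sum(p * tf.math.log(p + eps), axis=1)   # high entropy = dispersed
    return tf.reduce_mean(entropy)

# total_loss = task_loss + lam * dispersion_penalty(layer_acts)  # lam: tunable weight
```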

@jessicamarycooper

Why should we regard poly-semantic neurons as bad or to be avoided? From another angle, maybe we should be changing our methods to make better use of them.

Can you elaborate on how you define "dispersed activations", and why you expect that penalizing them would be desirable (besides having fewer poly-semantic neurons)?