karpathy/llama2.c

Once upon a time, there was a little girl named Lily

GilesBathgate opened this issue · 5 comments

There seems to be a lot of bias in the models because of the repetition of this prefix in the training data.

grep -c "Once upon a time, there was a little girl named Lily" tinystories.txt 
53467

Does anyone know of ways of making qualitative measurements of the training data, i.e. the diversity of the text (aside from how well it compresses)?
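One option (not from this thread, just a suggestion) is the distinct-n ratio sometimes used to evaluate text generation: the fraction of n-grams in a corpus that are unique. A minimal sketch, with illustrative function and variable names:

```python
# Sketch of a simple diversity metric: the distinct-n ratio
# (unique n-grams / total n-grams). Values near 1.0 mean varied text;
# values near 0.0 mean heavy repetition.
def distinct_n(texts, n=3):
    """Fraction of n-grams that are unique across a list of texts."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

On a corpus where one opening sentence accounts for tens of thousands of stories, this score would drop noticeably compared to a corpus with varied openings.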

Using some simple bash scripts I found:

Count  | Prefix
53,467 | Once upon a time, there was a little girl named Lily
30,640 | Once upon a time, there was a little girl named Sue
20,910 | Once upon a time, there was a little girl named Mia
19,419 | Once upon a time, there was a little girl named Lucy
17,045 | Once upon a time, there was a little girl named Amy
 4,546 | Once upon a time, there was a little girl named Sally
 2,409 | Once upon a time, there was a little girl named Jane
 1,729 | Once upon a time, there was a little girl named Emma
 1,414 | Once upon a time, there was a little girl named Lisa
 1,257 | Once upon a time, there was a little girl named Anna

Likewise, Tim is very popular:

Count   | Prefix
125,460 | Once upon a time, there was a little boy named Tim
  9,438 | Once upon a time, there was a little boy named Tom
    916 | Once upon a time, there was a little boy named Jack
    543 | Once upon a time, there was a little boy named Mark
    527 | Once upon a time, there was a little boy named Sam
    317 | Once upon a time, there was a little boy named Joe
    294 | Once upon a time, there was a little boy named John
    247 | Once upon a time, there was a little boy named Timmy
    170 | Once upon a time, there was a little boy named Max
    149 | Once upon a time, there was a little boy named Bob
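For anyone who wants to reproduce counts like these without bash, here is a sketch in Python (the author's actual scripts weren't posted; the regex and names here are assumptions):

```python
import re
from collections import Counter

# Sketch: tally story-opening prefixes like the tables above.
# The regex is an assumption about the pattern of interest.
PREFIX_RE = re.compile(r"Once upon a time, there was a little (?:girl|boy) named [A-Za-z]+")

def count_prefixes(lines):
    """Count every matching opening prefix across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(PREFIX_RE.findall(line))
    return counts
```

Something like `count_prefixes(open('tinystories.txt')).most_common(10)` would then yield tables like the ones above.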

Here is a quick hack to attempt to remove the bias: https://gist.github.com/GilesBathgate/a7a0a18276a2a79836cb6cb44d8656c2
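(Not necessarily what the gist does; as a sketch, one simple de-biasing approach is to cap how many stories may share the same opening sentence. All names here are invented.)

```python
from collections import Counter

# Sketch of one de-biasing approach (the linked gist may differ):
# keep at most max_per_prefix stories per opening sentence.
def cap_prefix_repeats(stories, max_per_prefix=1000):
    seen = Counter()
    kept = []
    for story in stories:
        prefix = story.split(".")[0]  # first sentence as the grouping key
        seen[prefix] += 1
        if seen[prefix] <= max_per_prefix:
            kept.append(story)
    return kept
```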

So weird that ChatGPT has this bias. There are many ways to show it aside from the obvious: https://chat.openai.com/share/f774a1d4-1940-4249-bf1a-642e1ab4ef8f

@karpathy Could this be a suitable starting point for implementing nano RLHF, where the 'human feedback' is simply some kind of contrastive loss function that penalises repeating prefixes?
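To make the idea concrete, one naive shape such a penalty could take (entirely a sketch, all names invented, not a working RLHF setup): charge the model for probability mass it places on prefixes that are over-represented in the training data, scaled by their empirical frequency. A term like this would be added to the ordinary training loss.

```python
import math

# Sketch only: a penalty that grows with (a) how often a prefix appears
# in the training data and (b) how much probability the model assigns
# to it. All names here are hypothetical.
def prefix_penalty(prefix_logprobs, prefix_counts, corpus_size, weight=1.0):
    penalty = 0.0
    for logp, count in zip(prefix_logprobs, prefix_counts):
        freq = count / corpus_size                  # empirical prefix frequency
        penalty += weight * freq * math.exp(logp)   # model's prob of the prefix
    return penalty
```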