Once upon a time, there was a little girl named Lily
GilesBathgate opened this issue · 5 comments
There seems to be a lot of bias in the models because of the repetition of this prefix in the training data.
```shell
$ grep -c "Once upon a time, there was a little girl named Lily" tinystories.txt
53467
```
Does anyone know of ways of making qualitative measurements of the training data, i.e. the diversity of the text (aside from how well it compresses)?
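One simple diversity proxy (not from this thread, just a common heuristic) is the distinct-n ratio: the fraction of n-grams in the corpus that are unique. A minimal sketch:

```python
from collections import Counter

def distinct_n(texts, n=2):
    """Fraction of n-grams that are unique across a corpus.

    1.0 means every n-gram occurs exactly once; values near 0 mean
    heavy repetition (e.g. a stock opening sentence everywhere).
    """
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)

# Toy example: the repeated prefix drags the score down.
stories = [
    "Once upon a time, there was a little girl named Lily.",
    "Once upon a time, there was a little girl named Sue.",
    "One day a dragon flew over the quiet village at dawn.",
]
print(round(distinct_n(stories, n=2), 3))
```

Like compression ratio, this is only a rough signal, but it is cheap to compute over the whole file.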
Using some simple bash scripts I found:
| Count | Prefix |
| ---: | --- |
| 53,467 | Once upon a time, there was a little girl named Lily |
| 30,640 | Once upon a time, there was a little girl named Sue |
| 20,910 | Once upon a time, there was a little girl named Mia |
| 19,419 | Once upon a time, there was a little girl named Lucy |
| 17,045 | Once upon a time, there was a little girl named Amy |
| 4,546 | Once upon a time, there was a little girl named Sally |
| 2,409 | Once upon a time, there was a little girl named Jane |
| 1,729 | Once upon a time, there was a little girl named Emma |
| 1,414 | Once upon a time, there was a little girl named Lisa |
| 1,257 | Once upon a time, there was a little girl named Anna |
Likewise, Tim is very popular:
| Count | Prefix |
| ---: | --- |
| 125,460 | Once upon a time, there was a little boy named Tim |
| 9,438 | Once upon a time, there was a little boy named Tom |
| 916 | Once upon a time, there was a little boy named Jack |
| 543 | Once upon a time, there was a little boy named Mark |
| 527 | Once upon a time, there was a little boy named Sam |
| 317 | Once upon a time, there was a little boy named Joe |
| 294 | Once upon a time, there was a little boy named John |
| 247 | Once upon a time, there was a little boy named Timmy |
| 170 | Once upon a time, there was a little boy named Max |
| 149 | Once upon a time, there was a little boy named Bob |
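The same counts can be reproduced in a few lines of Python; a sketch (the regex and toy corpus are illustrative, not taken from the original bash scripts):

```python
import re
from collections import Counter

PREFIX = re.compile(r"Once upon a time, there was a little (?:girl|boy) named (\w+)")

def name_counts(text):
    """Count how often each protagonist name appears in the stock opening."""
    return Counter(PREFIX.findall(text))

# Toy corpus standing in for tinystories.txt
sample = (
    "Once upon a time, there was a little girl named Lily. ...\n"
    "Once upon a time, there was a little boy named Tim. ...\n"
    "Once upon a time, there was a little girl named Lily. ...\n"
)
for name, count in name_counts(sample).most_common():
    print(f"{count:>6} | {name}")
```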
Here is a quick hack to attempt to remove the bias: https://gist.github.com/GilesBathgate/a7a0a18276a2a79836cb6cb44d8656c2
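The gist's exact approach may differ; one obvious debiasing strategy (hypothetical sketch, names and parameters my own) is to cap how many stories may share the same opening prefix and drop the surplus at random:

```python
import random
from collections import defaultdict

def cap_prefix_repeats(stories, prefix_words=10, max_per_prefix=100, seed=0):
    """Hypothetical debiasing pass: keep at most max_per_prefix stories
    per distinct opening prefix, discarding the surplus at random."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for story in stories:
        prefix = tuple(story.split()[:prefix_words])
        buckets[prefix].append(story)
    kept = []
    for group in buckets.values():
        rng.shuffle(group)
        kept.extend(group[:max_per_prefix])
    rng.shuffle(kept)  # avoid leaving the output grouped by prefix
    return kept
```

With the counts above, even a generous cap would remove most of the 125,460 Tim openings while leaving rarer openings untouched.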
So weird that ChatGPT has this bias. There are many ways to show it aside from the obvious: https://chat.openai.com/share/f774a1d4-1940-4249-bf1a-642e1ab4ef8f
@karpathy Could this be a suitable starting point for implementing nano RLHF (where the 'human feedback' is simply some kind of contrastive loss function that penalises repeated prefixes)?
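The feedback signal itself could be prototyped without a full RLHF loop. A hypothetical sketch (my own names and weighting, not a proposal from the thread): an auxiliary penalty term that grows with how often a sampled story's opening has already been generated, so novel openings score better.

```python
from collections import Counter

class PrefixRepetitionPenalty:
    """Hypothetical auxiliary loss term: penalise a sampled story in
    proportion to how many times its opening n words have already been
    seen during training. A sketch of the 'feedback' signal only."""

    def __init__(self, prefix_words=8, weight=0.1):
        self.prefix_words = prefix_words
        self.weight = weight
        self.seen = Counter()

    def __call__(self, story):
        prefix = tuple(story.split()[:self.prefix_words])
        penalty = self.weight * self.seen[prefix]  # 0.0 for a fresh prefix
        self.seen[prefix] += 1
        return penalty
```

In a real setup this scalar would be added to (or subtracted from) the reward for each sampled completion; whether that counts as "contrastive" in the strict sense is debatable, but the intent matches.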