karpathy/recurrentjs

text prediction

hardmaru opened this issue · 4 comments

Hi Karpathy,

I have trouble understanding a few points in the character prediction demo:

  1. What is the meaning of letter_size?
    letter_size = 5; // size of letter embeddings

My understanding is that the inputs to the network are just vectors of length 50 (or however many unique characters are in our dataset) that look like [0, 0, 0, 0, 1, 0, 0, 0 ... ], and the output is similar.

  2. To tick the model forward, you used 'rowPluck':
    x = G.rowPluck(model['Wil'], ix);

and I think ix is the integer index that represents the character. I examined x in the inspector and it is a vector of floats of length letter_size, rather than a large binary vector of length 50 (see the small sketch below).
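
To make the two representations concrete, here is a rough standalone sketch of what I mean (plain JavaScript, not the recurrentjs API; `oneHot` is just an illustrative helper):

    // What I expected the input to be: a 1-of-k ("one-hot") vector of length 50.
    function oneHot(ix, vocabSize) {
      var v = new Array(vocabSize).fill(0);
      v[ix] = 1; // a single 1 at the character's index, zeros elsewhere
      return v;
    }

    console.log(oneHot(4, 50)); // [0, 0, 0, 0, 1, 0, ...], length 50

    // What I actually see in x after rowPluck: a dense vector of
    // letter_size = 5 floats, i.e. one row of the 50x5 matrix Wil.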

So I have a bit of a hard time following what is going on and am currently quite puzzled. Any advice or guidance is appreciated!

Thanks

David

Hey David, sorry. recurrentjs was not really meant for production or cleanliness; it's an "are you a neural nets expert? ok, here's some dump of code you might like" kind of thing.

The characters are encoded 1-of-k over 50, but then there's a 50x5 linear transformation operating over that. When you look at the math, this is basically equivalent to plucking a row of the 50x5 matrix (since all elements of the input except one are 0). So effectively letter_size is the dimension of the "embedding space" that each character occupies before it's fed into the net.
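
To see the equivalence concretely, here is a small standalone sketch (plain JavaScript arrays rather than the library's matrix objects; the helper names are just for illustration): multiplying the 1-of-k vector into the 50x5 matrix yields exactly the ix-th row of that matrix, so the explicit one-hot vector never needs to be built.

    // Standalone illustration (not the recurrentjs API): Wil is a k x d
    // embedding matrix stored as an array of rows, ix is a character index.

    // Left-multiplying Wil by the 1-of-k row vector for ix:
    // every row is weighted by 0 except row ix, which is weighted by 1.
    function oneHotTimesWil(ix, Wil) {
      var d = Wil[0].length;
      var out = new Array(d).fill(0);
      for (var r = 0; r < Wil.length; r++) {
        var w = (r === ix) ? 1 : 0; // one-hot weight
        for (var c = 0; c < d; c++) {
          out[c] += w * Wil[r][c];
        }
      }
      return out;
    }

    // The whole multiply collapses to just reading out row ix,
    // which is what rowPluck does with the integer index directly.
    function pluckRow(Wil, ix) {
      return Wil[ix].slice();
    }

    // e.g. for a tiny 3x2 "Wil" the two give the same vector:
    var Wil = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]];
    console.log(oneHotTimesWil(1, Wil)); // [0.3, 0.4]
    console.log(pluckRow(Wil, 1));       // [0.3, 0.4]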

I find the code quite neat and okay-readable, and I'm able to learn from it after playing around with it.

I see, so basically we transform a one-hot vector of size 50 (or whatever) into a vector of 5 floats before feeding it in.

I was trying to implement something similar from scratch but got stuck because the output remains a bit on the gibberish side even after many generations, so I wanted to take a look at this code for some guidance. Maybe this kind of dense representation, rather than one-hot, would help improve the performance.

Thanks again

On a similar note @karpathy, it seems that you used one-hot encoded inputs, is that right?
Why not encode the input as a single integer from 1 to N and leave the one-hot encoding for the output?

@karpathy Where is the code for the online demo of the text prediction program?