karpathy/recurrentjs

text prediction

hardmaru opened this issue · 4 comments

Hi Karpathy,

I have trouble understanding a few points in the character prediction demo:

  1. What is the meaning of letter_size?
    letter_size = 5; // size of letter embeddings

My understanding is that the inputs to the network are just vectors of length 50 (or however many unique characters are in our dataset) that look like [0, 0, 0, 0, 1, 0, 0, 0 ... ], and the output is similar.

  2. To tick the model forward, you used 'rowPluck':
    x = G.rowPluck(model['Wil'], ix);

and I think ix is the integer index that represents the character. I examined x in the inspector and it is a vector of floats of length letter_size, rather than a large binary vector of length 50 (see the small sketch below).
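
To make the two representations concrete, here is a rough standalone sketch of what I mean (plain JavaScript, not the recurrentjs API; `oneHot` is just an illustrative helper):

    // What I expected the input to be: a 1-of-k ("one-hot") vector of length 50.
    function oneHot(ix, vocabSize) {
      var v = new Array(vocabSize).fill(0);
      v[ix] = 1; // a single 1 at the character's index, zeros elsewhere
      return v;
    }

    console.log(oneHot(4, 50)); // [0, 0, 0, 0, 1, 0, ...], length 50

    // What I actually see in x after rowPluck: a dense vector of
    // letter_size = 5 floats, i.e. one row of the 50x5 matrix Wil.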

So I have a bit of a hard time following what is going on and am currently quite puzzled. Any advice or guidance is appreciated!

Thanks

David

Hey David, sorry. recurrentjs was not really meant for production or cleanliness; it's an "are you a neural nets expert? ok, here's some dump of code you might like" kind of thing.

The characters are encoded 1-of-k over 50, but then there's a 50x5 linear transformation operating over that. When you look at the math, this is basically equivalent to plucking a row of the 50x5 matrix (since all elements of the input except one are 0). So effectively letter_size is the dimension of the "embedding space" that each character occupies before it's fed into the net.
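
To see the equivalence concretely, here is a small standalone sketch (plain JavaScript arrays rather than the library's matrix objects; the helper names are just for illustration): multiplying the 1-of-k vector into the 50x5 matrix yields exactly the ix-th row of that matrix, so the explicit one-hot vector never needs to be built.

    // Standalone illustration (not the recurrentjs API): Wil is a k x d
    // embedding matrix stored as an array of rows, ix is a character index.

    // Left-multiplying Wil by the 1-of-k row vector for ix:
    // every row is weighted by 0 except row ix, which is weighted by 1.
    function oneHotTimesWil(ix, Wil) {
      var d = Wil[0].length;
      var out = new Array(d).fill(0);
      for (var r = 0; r < Wil.length; r++) {
        var w = (r === ix) ? 1 : 0; // one-hot weight
        for (var c = 0; c < d; c++) {
          out[c] += w * Wil[r][c];
        }
      }
      return out;
    }

    // The whole multiply collapses to just reading out row ix,
    // which is what rowPluck does with the integer index directly.
    function pluckRow(Wil, ix) {
      return Wil[ix].slice();
    }

    // e.g. for a tiny 3x2 "Wil" the two give the same vector:
    var Wil = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]];
    console.log(oneHotTimesWil(1, Wil)); // [0.3, 0.4]
    console.log(pluckRow(Wil, 1));       // [0.3, 0.4]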

I find the code quite neat and okay-readable, and I'm able to learn from it after playing around with it.

I see, so basically we transform a one-hot vector of size 50 (or whatever) into a vector of 5 floats before feeding it in.

I was trying to implement something similar from scratch but got stuck because the output remains a bit on the gibberish side even after many generations, so I wanted to take a look at this code for some guidance. Maybe this kind of dense representation, rather than one-hot, would help improve the performance.

Thanks again

On a similar note @karpathy, it seems that you used one-hot encoded inputs, is that right?
Why not encode the input as a single integer from 1 to N and leave the one-hot encoding for the output?

@karpathy Where is the code for the online demo of the text prediction program?