harthur/brain

Stream training ability

nickpoorman opened this issue · 2 comments

I'm thinking about digging into the code and making training work with streams and event emitters. That way I can stream the data in from my database and train as the data flows in.

Can you think of any problems I might run into? I understand the data will probably need to be streamed through multiple times during training, so some caching will probably be necessary.

My worry is that with a dataset of 20+ million records, I will quickly run out of memory trying to load it all into RAM.

From a quick glance, it seems like the only real issue is going to be the formatData function. I'm thinking it will need to be called on each record individually as it's read from the stream. So if the input is a hash, there will need to be an extra pass over the stream to generate inputLookup and outputLookup (roughly like the sketch below).
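For what it's worth, here's a rough sketch of that two-pass idea, assuming hash-keyed records like the ones formatData handles today. The in-memory `records` array just stands in for the database stream, which would be replayed or read from a cache in practice:

```js
// Two-pass sketch (illustrative only, not brain's API).
var records = [
  { input: { r: 0.0, g: 0.9 }, output: { light: 1 } },
  { input: { r: 0.8, b: 0.2 }, output: { dark: 1 } }
];

// Pass 1: stream through once, collecting every key that appears
// in any input/output hash, and assign each key an array index.
var inputLookup = {}, outputLookup = {}, ni = 0, no = 0;
records.forEach(function (record) {
  Object.keys(record.input).forEach(function (k) {
    if (!(k in inputLookup)) inputLookup[k] = ni++;
  });
  Object.keys(record.output).forEach(function (k) {
    if (!(k in outputLookup)) outputLookup[k] = no++;
  });
});

// Pass 2: format one record at a time; keys a record lacks become 0.
function toArray(hash, lookup, size) {
  var arr = [];
  for (var i = 0; i < size; i++) arr.push(0);
  Object.keys(hash).forEach(function (k) { arr[lookup[k]] = hash[k]; });
  return arr;
}

records.forEach(function (record) {
  var input = toArray(record.input, inputLookup, ni);
  var output = toArray(record.output, outputLookup, no);
  // hand input/output to the per-record training step here
});
```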

@nickpoorman there's no real way to make this work, unfortunately; see #15. You have to have everything in memory at once.

The only option would be to radically change the neural network used. There are different kinds of neural networks, and some would support training incrementally.
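For example, a perceptron is the classic incremental learner: it updates its weights one example at a time and never needs the whole dataset in memory. A toy sketch of that idea (not brain code):

```js
// Online perceptron: each call to trainOne uses exactly one example,
// so training can consume a stream without buffering it.
function Perceptron(n, rate) {
  this.w = new Array(n + 1); // last slot is the bias weight
  for (var i = 0; i <= n; i++) this.w[i] = 0;
  this.rate = rate;
}

Perceptron.prototype.predict = function (x) {
  var sum = this.w[x.length]; // bias
  for (var i = 0; i < x.length; i++) sum += this.w[i] * x[i];
  return sum >= 0 ? 1 : 0;
};

Perceptron.prototype.trainOne = function (x, target) {
  var err = target - this.predict(x); // -1, 0, or 1
  for (var i = 0; i < x.length; i++) this.w[i] += this.rate * err * x[i];
  this.w[x.length] += this.rate * err;
};
```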

Actually, I take that back: I think this could work, and it doesn't necessarily depend on #15.
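One shape this could take: a writable stream that applies one backprop update per record (brain's train() loop already works pattern by pattern internally, so a per-record update is plausible) and asks the caller to replay the data for each additional epoch. All names below are assumptions for the sketch, not an API brain ships:

```js
var Writable = require("stream").Writable;
var util = require("util");

// Sketch of a training stream. `trainOne` is whatever per-record update
// the network ends up exposing (assumed to return that record's error);
// `onFlood` asks the caller to replay the dataset for another epoch.
function TrainStream(trainOne, onFlood) {
  Writable.call(this, { objectMode: true });
  this.trainOne = trainOne;
  this.onFlood = onFlood;
  this.errorSum = 0;
  this.count = 0;
  this.on("finish", this.endEpoch.bind(this));
}
util.inherits(TrainStream, Writable);

TrainStream.prototype._write = function (record, enc, next) {
  this.errorSum += this.trainOne(record); // one update, one record in RAM
  this.count++;
  next();
};

TrainStream.prototype.endEpoch = function () {
  if (this.errorSum / this.count > 0.005) {
    this.onFlood(); // error still high: ask for another pass over the data
  }
};
```

Since a finished Writable can't be written to again, onFlood would re-run the database query (or read from a local cache) and pipe it into a fresh TrainStream for each pass, which is where the caching mentioned above comes in.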