Input / output layer
scriptandcompile opened this issue · 12 comments
Another change we might consider in order to clean up the code is creating a specific input layer and output layer as specially cased things within the neural network (nn). This also has a nice benefit in that it might make it easier if we try to change in the future to using something besides a Vec for the samples.
It would be nice in the future to be able to send data in for training in a lazier manner rather than having it all batched up at once, and having distinct input/output layers could definitely help with this.
It might be useful to create a Builder for the neural network in this case.
```rust
let test = NNBuilder::new().input(3).layer(2, Sigmoid::new()).output(1).build();
```
Or something of that nature. Thoughts?
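To make that concrete, here's a very rough, hypothetical sketch of what such a builder could look like; none of these names (NNBuilder, LayerSpec, Activation) exist in the crate yet, they're purely illustrative:

```rust
// Hypothetical sketch only; NNBuilder, LayerSpec, Activation, and NeuralNetwork
// are placeholder names, not the current Juggernaut API.
trait Activation {
    fn activate(&self, x: f64) -> f64;
}

struct LayerSpec {
    neurons: usize,
    activation: Option<Box<dyn Activation>>,
}

struct NeuralNetwork {
    layers: Vec<LayerSpec>,
}

struct NNBuilder {
    layers: Vec<LayerSpec>,
}

impl NNBuilder {
    fn new() -> Self {
        NNBuilder { layers: Vec::new() }
    }

    // input and output get specially cased specs with no activation attached
    fn input(mut self, neurons: usize) -> Self {
        self.layers.push(LayerSpec { neurons, activation: None });
        self
    }

    fn layer<A: Activation + 'static>(mut self, neurons: usize, activation: A) -> Self {
        self.layers.push(LayerSpec { neurons, activation: Some(Box::new(activation)) });
        self
    }

    fn output(mut self, neurons: usize) -> Self {
        self.layers.push(LayerSpec { neurons, activation: None });
        self
    }

    fn build(self) -> NeuralNetwork {
        NeuralNetwork { layers: self.layers }
    }
}
```

A `Sigmoid` type would then just implement `Activation` and get passed to `layer()` as in the one-liner above.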
I see what you mean @addtheice
The user currently needs to load the entire dataset into the application's memory, which obviously won't work for larger datasets. I'm still not a Rust expert and I'm not sure what the best way to do this is, but being able to pipe the input through stdin or a file descriptor would be great.
The problem is, I can feel something isn't right, but I'm not sure of the best way to solve it. Let me know your thoughts.
well, currently you use a vector and then iterate over the contents.
Alternatively, we could use some kind of iterator-type 'thing' which has a closure that is called on each request, and the user fills in the closure with however they want to fulfil the request.
This would allow them to continue using a vector of samples if they want...oooooor...they could use the iterator / closure to do whatever they need to get the sample (iterate through a file, read stdin directly, request from a website, iterate through a collection of files, whatever). This also has the nice bonus of improving the interface by making it more generic, since the only thing we care about with the 'thing' which is passed in (currently the vector) is that we can iterate it and get a sample each time we do.
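A minimal sketch of the closure idea, assuming a placeholder Sample type and a hypothetical train_with function (neither exists in the crate yet):

```rust
// Hypothetical sketch: the caller hands in a closure that yields the next
// sample on demand and returns None when the source is exhausted.
struct Sample {
    inputs: Vec<f64>,
    outputs: Vec<f64>,
}

fn train_with<F>(mut next_sample: F)
where
    F: FnMut() -> Option<Sample>,
{
    while let Some(sample) = next_sample() {
        // feed-forward and back-propagate this single sample here
        let _ = (&sample.inputs, &sample.outputs);
    }
}

fn example() {
    // A plain Vec still works: just wrap its iterator in a closure...
    let data = vec![Sample { inputs: vec![0.0, 1.0], outputs: vec![1.0] }];
    let mut iter = data.into_iter();
    train_with(move || iter.next());
    // ...and a file-, stdin-, or network-backed closure plugs in the same way.
}
```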
Other issues to consider: online vs offline training (currently we use offline, and I mean this in the neural network sense, not the internet 'online/offline' sense).
Also, stepwise vs batch training. Currently we only use stepwise training; being able to do batch training would be nice.
Again, lots of options to consider for the future: dropout, different training systems than just gradient-descent backprop, etc etc etc. I just think making the input/output layers a distinct step, as well as adding a builder, will allow us to move in a better direction.
One step at a time!
yeah I see what you mean @addtheice. +1 for the iterator thing.
I have already implemented learning rate and will send a PR in a few mins (please review if you can) but +1 for batch processing.
I'll look into how to set up the iterator for this. It should be possible to silently accept a vector or the iterator so that, to the user, it just 'does the right thing'. Which, I have to admit, I like.
Only tricky bit looks to be the way we handle the sample-to-matrix conversion. We already have some low-hanging fruit in that we don't need to iterate over the sample copying it into a new vector.
If we move the utility functions into the neural network and make them private, recognising that we will need to remove them eventually, then we should be able to pull the iterator change off. How do you feel about me doing that? It will still have the issue of the iterator being exhausted to build the matrix, but it moves us in the direction we need to test this out in a more iterative fashion instead of all at once. It's a more incremental change, if you want to look at it that way.
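Here's a sketch of what I mean by 'does the right thing' (hypothetical signatures again, and note it still exhausts the collection internally for now):

```rust
// Hypothetical sketch: train accepts anything iterable, so a Vec<Sample> and a
// lazy custom iterator both go through the same signature.
struct Sample {
    inputs: Vec<f64>,
    outputs: Vec<f64>,
}

fn train<S>(samples: S)
where
    S: IntoIterator<Item = Sample>,
{
    // For now this still walks the whole collection to build the matrix,
    // one sample at a time; a fully lazy version would come later.
    for sample in samples {
        let _row: &[f64] = &sample.inputs;
        // ...convert into the internal matrix / back-propagate here...
    }
}

fn example() {
    let data = vec![Sample { inputs: vec![0.0], outputs: vec![1.0] }];
    train(data);               // a plain Vec works
    train(std::iter::empty()); // so does any iterator of Samples
}
```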
I'm personally really excited over that since it would make it possible to actually build some fun examples. Until we have some larger examples, real-world use cases we can test and perform against, we can't really go anywhere else with this. Need to know the pinch points of an API to improve the API, need to have actual use cases to play against to work on performance testing, need end-to-end usage to write end-to-end tests. etc etc etc.
I can't think of any drawbacks, it would be easier to talk and comment on a PR.
What do you think about extracting the ideas from this issue and adding new issues for each feature? Just to organize them a bit and have a clear milestone.
That sounds good to me. I've got a lot of work which will have me heads down atm so I haven't made any major headway on this.
I'm a bit concerned about where to start first honestly.
moving the input/output layers out is relatively easy, but it basically complicates things first without much benefit.
creating a builder pattern makes the input/output layer change easier, but it's a pretty massive change to make all at once...and the actual internals of the builder would look pretty different with the input/output layer break versus without it.
changing the sample thing and creating an iterator-like collection (I'm thinking of it as a 'sampleset' or 'experimentset' or something else), which would be a lazy sample iterator 'collection', would be easy to set up after the builder and input/output layer breakouts...but the current internals of the 'train' function mean we would have to change the way we run it, since it tries to exhaust the sample collection in order to build the matrix, which we can't do with a lazy system.
Changing the way train works to avoid this and not create the matrix the way we do...makes me want to look at Matrix and switch it out for ndarray, a standard crate which is becoming popular. Reuse, don't reinvent, as the saying goes!
I'm leaning toward the Matrix-to-ndarray change as the first thread to pull in that entire collection, just because it's the least tangled of all the rest...but yeah.
I've done a lot of analysis on this, just not much progress yet. I just want to go in relatively self-contained pulls which are easy to check as we go, rather than one giant 'plop!'
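For reference, here's a small taste of what ndarray gives us out of the box (to the best of my memory of its API; worth double-checking the docs):

```rust
// Basic ndarray usage: owned 2-D arrays, matrix multiply, element-wise mapping.
use ndarray::{array, Array2};

fn demo() {
    let a: Array2<f64> = array![[1.0, 2.0], [3.0, 4.0]];
    let b: Array2<f64> = Array2::ones((2, 3));
    let product = a.dot(&b);                                    // matrix multiplication
    let activated = product.mapv(|x| 1.0 / (1.0 + (-x).exp())); // sigmoid, element-wise
    println!("{}", activated);
}
```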
Thanks for your notes @addtheice
I agree with changing the Matrix module to use ndarray, but to be honest I don't exactly remember what problem made me start implementing my own. I would say it would be even better to use something that can support GPGPU as well. Although, I would vote for keeping the trait on our side and just implementing it for different matrix libs (e.g. ndarray).
+1 for the iterator thing, but I do agree that this is a huge change. I believe we can start by changing the `Vec<Sample>` and the `train` method. Also, I think we should change the samples input and add a new parameter to `train` to accept the dataset, instead of passing it to the NN constructor.
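To sketch what keeping the trait on our side could look like (hypothetical trait and method names; the ndarray impl is just one possibility):

```rust
// Hypothetical sketch: Juggernaut owns the matrix trait, backends implement it.
trait JuggernautMatrix {
    fn zeros(rows: usize, cols: usize) -> Self;
    fn rows(&self) -> usize;
    fn cols(&self) -> usize;
    fn get(&self, row: usize, col: usize) -> f64;
    fn matmul(&self, other: &Self) -> Self;
}

// One possible implementation on top of ndarray.
impl JuggernautMatrix for ndarray::Array2<f64> {
    fn zeros(rows: usize, cols: usize) -> Self {
        ndarray::Array2::zeros((rows, cols))
    }
    fn rows(&self) -> usize { self.nrows() }
    fn cols(&self) -> usize { self.ncols() }
    fn get(&self, row: usize, col: usize) -> f64 { self[(row, col)] }
    fn matmul(&self, other: &Self) -> Self { self.dot(other) }
}
```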
yeah, passing the sample set to train sounds like a good idea. We can consider other kinds of training later, but it definitely moves us in that direction. I love APIs which do two things: the common stuff is easy, and the uncommon (and even 'bad') stuff takes effort but is still possible.
I really like your idea of keeping the traits on our side. If done right we can just block off the parts based on a feature flag on the crate.
Example (far in the future): we could make a framework of the entire learning algorithm where you can replace each of the individual components and let those components handle it. This would let us avoid the problem of people wanting to experiment with altered training algorithms and ending up unable to do so without hand-rolling the whole thing.
A good example would be the ability to build neural networks and use genetic algorithms to train them. It shouldn't change how using the neural network works, but the training would need to be done differently. My best guess? The whole 'training' thing eventually needs to be broken out into its own traits and a system which handles all of that.
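A hand-wavy sketch of what that breakout might look like (all placeholder names, and the bodies are just stubs):

```rust
// Hypothetical sketch: training lives behind its own trait, so back-prop and a
// genetic-algorithm trainer can be swapped without changing the network type.
struct NeuralNetwork; // placeholder
struct Sample;        // placeholder

trait Trainer {
    fn train(&mut self, network: &mut NeuralNetwork, samples: &[Sample]);
}

struct BackPropTrainer { learning_rate: f64 }
struct GeneticTrainer { population: usize, mutation_rate: f64 }

impl Trainer for BackPropTrainer {
    fn train(&mut self, network: &mut NeuralNetwork, samples: &[Sample]) {
        // gradient-descent back-propagation over the samples would go here
        let _ = (network, samples, self.learning_rate);
    }
}

impl Trainer for GeneticTrainer {
    fn train(&mut self, network: &mut NeuralNetwork, samples: &[Sample]) {
        // evolve a population of weight sets and keep the fittest here
        let _ = (network, samples, self.population, self.mutation_rate);
    }
}
```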
Breaking out differing 'backends' for GPGPU, CPU, cluster (heck, why not FPGA as well! mwahahah!) sounds awesome, and I think it's where this needs to go to be 'commercial' or 'academic' quality. I see this as a further-out issue, but one we cannot block ourselves out of, otherwise it will never actually be useful in any real sense. So whatever issue we think about, we need to keep this in mind: the ability to process on GPGPU, at minimum, in the future, and how any change would affect that. This is one reason I mentioned ndarray; I think it plays well with some of these GPGPU frameworks, if I remember correctly. Don't quote me on that, I'll look into it later.
I'm just shooting pie-in-the-sky on most of this at the moment. We can keep going back and forth and keep discussing the options of what we want and where it should go. Later (I think I have Thursday the third free in the afternoon) I'll break these out into individual issues and cross-link them and everything. Let's just keep spitballing for now.
Cool, I like your idea of the training!
What I'm planning to do is create another crate called `matrixnum`, remove the Matrix module from this repo, and move it into `matrixnum`. Then, when you initialize `matrixnum`, you pass an enum defining the underlying component (`matrixnum::ndarray`, for instance), but the API and trait stay the same. I'm trying to develop a simple matrix crate which makes it easy to work with matrices and do the basic operations we need for Juggernaut. Why? Because I think the matrix module is not part of an NN library and should be maintained separately.
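Roughly what I have in mind (purely a sketch; `matrixnum` doesn't exist yet and all of these names are up for grabs):

```rust
// Hypothetical sketch of matrixnum: an enum picks the underlying component at
// construction time, but callers see the same API regardless of the backend.
pub enum Backend {
    Native,  // the current hand-rolled matrix code
    Ndarray, // delegate storage and math to the ndarray crate
}

enum MatrixData {
    Native(Vec<f64>),
    Ndarray(ndarray::Array2<f64>),
}

pub struct Matrix {
    rows: usize,
    cols: usize,
    data: MatrixData,
}

impl Matrix {
    // The caller picks the backend once; the rest of the API stays identical.
    pub fn zeros(backend: Backend, rows: usize, cols: usize) -> Self {
        let data = match backend {
            Backend::Native => MatrixData::Native(vec![0.0; rows * cols]),
            Backend::Ndarray => MatrixData::Ndarray(ndarray::Array2::zeros((rows, cols))),
        };
        Matrix { rows, cols, data }
    }

    pub fn get(&self, row: usize, col: usize) -> f64 {
        match &self.data {
            MatrixData::Native(v) => v[row * self.cols + col],
            MatrixData::Ndarray(a) => a[(row, col)],
        }
    }
}
```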
Let me know what you think.
sounds like a way of doing it. I just think we should see if ndarray handles what we need before we branch out further, but then...this is open source. scratch your itch =D Same thing I'm doing.