torch rnn multiple files for training
Remonell opened this issue · 2 comments
I am trying to get torch rnn to generate multiple different levels for a 2D game I created. The levels are represented in text format with two different characters, '#' and '+'. '#' is ground that the character can jump and walk on, and '+' is empty space. I can create an unlimited number of levels with a random function, and each level can be arbitrarily long. Right now the levels are 20 characters high and 300 characters long, and I am aiming for the rnn to generate levels of that size.
My question is: How should I input my training text files into torch rnn? Can torch even handle somehow multiple input files for preprocessing/training? Should I combine them to one big file (empty line as separator)? Should I create one single very long level?
I am very thankful for any kind of advice. I am very new to machine learning and this is my first project with it.
Kind regards
This question belongs in the Torch google group, since it is not an issue you are reporting.
Why do you need neural networks to generate a two-symbol sequence in the first place? This is clearly overkill and something that can be solved with much simpler functions.
I am very new to machine learning
You should work on the above first, because a neural network is not a script you can just use out of the box without understanding it. Read about sequence generation with recurrent neural networks.
How should I input my training text files into torch rnn?
You can use a classic training / validation / testing data split. The training file contains your input sequence of symbols, and the targets are the same sequence shifted ahead by 1. For example: "red car stopped" -> "car stopped near". Take a look at the dataload package's SequenceLoader module, which wraps your input data and returns batches of such input and target sequences shifted by 1 symbol.
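To make the "shifted by 1" idea concrete, here is a minimal Python sketch. It is only an illustration of the concept and not the dataload API; the function name make_input_target is made up for this example.

```python
def make_input_target(symbols):
    """Split a sequence into inputs and targets shifted ahead by one symbol.

    Hypothetical helper for illustration; not part of torch or dataload.
    """
    if len(symbols) < 2:
        raise ValueError("need at least two symbols to form one input/target pair")
    inputs = symbols[:-1]   # everything except the last symbol
    targets = symbols[1:]   # everything except the first symbol
    return inputs, targets


# Word-level version of the example above.
words = ["red", "car", "stopped", "near"]
word_inputs, word_targets = make_input_target(words)
print(word_inputs)   # ['red', 'car', 'stopped']
print(word_targets)  # ['car', 'stopped', 'near']

# Character-level version, as it would look for one row of a level.
row_inputs, row_targets = make_input_target("##++#")
print(row_inputs)    # ##++
print(row_targets)   # #++#
```

At every position the network sees inputs[i] and is trained to predict targets[i], which is simply the next symbol in the original sequence.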
Can torch even handle somehow multiple input files for preprocessing/training?
You don't need multiple input files. All your training data should be put into a single training.txt, which is read into memory and fed to SequenceLoader as a flat string of symbols. SequenceLoader will slice it into minibatches (2D tensors) that are fed into the rnn.
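As a rough picture of what "slicing a flat string into minibatches" means, here is a simplified Python sketch using plain lists. It is an assumption-laden illustration: the function batchify and its parameters are invented here, and the real SequenceLoader may lay out its batches differently.

```python
def batchify(text, batch_size, seq_len):
    """Turn a flat string into (inputs, targets) minibatches of symbol ids.

    Hypothetical helper for illustration; not the SequenceLoader implementation.
    Each batch is a pair of 2D lists with batch_size rows and seq_len columns.
    """
    # Map every distinct symbol to an integer id.
    vocab = {ch: i for i, ch in enumerate(sorted(set(text)))}
    ids = [vocab[ch] for ch in text]

    chunk = batch_size * seq_len
    # One extra symbol is needed at the end so the last input has a target.
    n_batches = (len(ids) - 1) // chunk

    batches = []
    for b in range(n_batches):
        start = b * chunk
        inputs = [ids[start + r * seq_len: start + (r + 1) * seq_len]
                  for r in range(batch_size)]
        targets = [ids[start + r * seq_len + 1: start + (r + 1) * seq_len + 1]
                   for r in range(batch_size)]
        batches.append((inputs, targets))
    return vocab, batches


vocab, batches = batchify("#+#+#+#+#", batch_size=2, seq_len=2)
print(vocab)         # {'#': 0, '+': 1}
print(len(batches))  # 2
print(batches[0])    # ([[0, 1], [0, 1]], [[1, 0], [1, 0]])
```

Symbols left over at the end that do not fill a whole batch are dropped in this sketch.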
Should I combine them to one big file (empty line as separator)?
Your training dataset can be represented any way you like; in the end, all that is required is a flat ordered array of symbols.
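For instance, joining several levels with an empty line between them could look like the Python sketch below. The helper names combine_levels and load_levels, the directory layout, and the choice of separator are all assumptions made for illustration.

```python
from pathlib import Path


def combine_levels(levels, separator="\n\n"):
    """Join level strings into one flat training string.

    With the default separator, consecutive levels end up divided by
    one empty line. Hypothetical helper for illustration.
    """
    cleaned = [level.strip("\n") for level in levels]
    return separator.join(cleaned) + "\n"


def load_levels(directory, pattern="*.txt"):
    """Read every matching file in a directory (assumed layout: one level per file)."""
    return [path.read_text() for path in sorted(Path(directory).glob(pattern))]


# Two tiny in-memory levels stand in for the real 20 x 300 files.
level_a = "++++\n####\n"
level_b = "+#++\n####\n"
corpus = combine_levels([level_a, level_b])
print(repr(corpus))         # '++++\n####\n\n+#++\n####\n'
print(sorted(set(corpus)))  # ['\n', '#', '+']
```

Note that the newline becomes a third symbol in the vocabulary, so the rnn also has to learn where rows and levels end.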
Should I create one single very long level?
It depends on the task you wish to solve.
As I said, your problem can easily be solved with simpler algorithms, unless you specifically wish to learn rnn.
Please use google groups for conceptual big questions or gitter for small ones.
Hey tastyminerals,
thank you for your answers so far and sorry for responding so late. Didn't know about the google group. Looks very helpful and I am going to read through it. Yes, I specifically want to learn rnn so I created this easy project. I did read a lot about rnn in general and now I want to make use of my new knowledge. But there is a whole science behind the tools that you can use it seems.
Going to close this "issue" since it's not an issue.
Kind regards