torch rnn multiple files for training

Question


Remonell opened this issue 7 years ago · 2 comments

I am trying to get torch rnn to generate multiple different levels for a 2D game I created. The levels are represented in text format using two different characters, '#' and '+': '#' is ground that the character can walk and jump on, and '+' is empty space. I can create an unlimited number of levels with a random function, and each level can be arbitrarily long. Right now the levels are 20 characters high and 300 characters long, and I am aiming for the RNN to generate levels of that size.

My question is: How should I input my training text files into torch rnn? Can Torch even handle multiple input files for preprocessing/training? Should I combine them into one big file (with an empty line as separator)? Should I create one single very long level?

I would be very thankful for any advice. I am very new to machine learning and this is my first project with it.

Kind regards

Answer 1 · 2018-04-25T05:29:38.000Z

This question belongs in the Torch Google Group, since it is not an issue you are reporting.

Why do you need a neural network to generate a two-symbol sequence in the first place? This is clearly overkill, and it can be solved with much simpler functions.

I am very new to machine learning

You should work on the above first, because a neural network is not a script you can just use out of the box without understanding it. Read about sequence generation with recurrent neural networks.

How should I input my training text files into torch rnn?

You can use a classic training / validation / test data split, where the training file contains your input sequence of symbols and the targets are the same sequence shifted ahead by one symbol. For example: "red car stopped" -> "car stopped near". Take a look at the SequenceLoader module of the dataload package, which wraps your input data and returns batches of such input and target sequences shifted by one symbol.
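A minimal sketch of the input/target shift, written in plain Python rather than Lua/Torch and not using the dataload API itself: the inputs are every symbol except the last, and the targets are every symbol except the first. The function name `make_input_target` is hypothetical.

```python
def make_input_target(symbols):
    """Return (inputs, targets) where targets is inputs shifted ahead by one symbol."""
    # The model sees symbols[t] and is trained to predict symbols[t + 1],
    # so the last symbol has no target and the first symbol is never a target.
    inputs = symbols[:-1]
    targets = symbols[1:]
    return inputs, targets


# Word-level version of the example from the answer.
words = "red car stopped near".split()
inp, tgt = make_input_target(words)
print(inp)  # ['red', 'car', 'stopped']
print(tgt)  # ['car', 'stopped', 'near']

# Character-level version for one row of a level made of '#' and '+'.
row = "++##+"
inp, tgt = make_input_target(row)
print(inp, tgt)  # ++## +##+
```

For the game levels the same shift is applied per character, so the model learns to predict the next '#' or '+' from the ones before it.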

Can Torch even handle multiple input files for preprocessing/training?

You don't need multiple input files. All your training data should go into a single training.txt, which is read into memory and fed to SequenceLoader as a flat string of symbols. SequenceLoader then slices it into minibatches (2D tensors) that are fed into the RNN.
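The slicing step can be sketched in plain Python. This only mimics the general idea of splitting one flat sequence into parallel streams and cutting those into windows; it is an assumption about the approach, not SequenceLoader's actual code, and `batchify` / `minibatches` are hypothetical helpers. Rows are time steps and columns are independent streams, so the grid has shape (stream_length, batch_size).

```python
def batchify(sequence, batch_size):
    """Split a flat sequence into batch_size contiguous streams (a 2D layout).

    Returns a list of rows; row t holds the t-th symbol of every stream,
    i.e. shape (stream_length, batch_size). Leftover symbols that do not
    fill a complete row are dropped.
    """
    stream_len = len(sequence) // batch_size
    trimmed = sequence[: stream_len * batch_size]
    # Each stream is one contiguous chunk of the original sequence.
    streams = [trimmed[b * stream_len:(b + 1) * stream_len] for b in range(batch_size)]
    # Transpose so time runs down the rows and the batch runs across the columns.
    return [[streams[b][t] for b in range(batch_size)] for t in range(stream_len)]


def minibatches(grid, seq_len):
    """Yield (inputs, targets) windows of up to seq_len time steps; targets are shifted by one."""
    for start in range(0, len(grid) - 1, seq_len):
        stop = min(start + seq_len, len(grid) - 1)
        yield grid[start:stop], grid[start + 1:stop + 1]


data = list(range(10))  # stand-in for the encoded symbols of training.txt
grid = batchify(data, batch_size=2)
print(grid)  # [[0, 5], [1, 6], [2, 7], [3, 8], [4, 9]]
for inputs, targets in minibatches(grid, seq_len=2):
    print(inputs, targets)
```

Keeping each column a contiguous piece of the text (instead of shuffling) is what typically allows an RNN to carry its hidden state from one window to the next.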

Should I combine them into one big file (with an empty line as separator)?

Your training dataset can be represented any way you like; in the end, what is required is a flat, ordered array of symbols.
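For the level files from the question, one way to build that flat array is to read every level, join the levels with a blank line as separator (as the question proposes), and map each character to an integer id. A sketch in Python, assuming each level is stored row by row in its own UTF-8 text file; `load_levels`, `combine_levels`, `build_vocab`, `encode` and the file names in the comment are hypothetical.

```python
from pathlib import Path


def load_levels(paths):
    """Read each level file (one row of the level per line) and drop trailing newlines."""
    return [Path(p).read_text(encoding="utf-8").rstrip("\n") for p in paths]


def combine_levels(levels, separator="\n\n"):
    """Join all levels into one flat training text, with a blank line between levels."""
    return separator.join(levels)


def build_vocab(text):
    """Map every distinct character to an integer id (sorted, so ids are reproducible)."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def encode(text, vocab):
    """Turn the text into the flat, ordered array of symbol ids the model trains on."""
    return [vocab[ch] for ch in text]


# Two tiny made-up levels (2 rows each) instead of real 20x300 files; with real
# files you would call e.g. combine_levels(load_levels(["level1.txt", "level2.txt"])).
levels = ["++\n##", "+#\n##"]
text = combine_levels(levels)
vocab = build_vocab(text)
print(vocab)                # {'\n': 0, '#': 1, '+': 2}
print(encode(text, vocab))  # [2, 2, 0, 1, 1, 0, 0, 2, 1, 0, 1, 1]
```

With row-by-row files, the newline becomes a third symbol next to '#' and '+', and a double newline marks the boundary between two levels.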

Should I create one single very long level?

It depends on the task you wish to solve.

As I said, your problem can easily be solved with simpler algorithms, unless you specifically want to learn about RNNs.
Please use Google Groups for big conceptual questions or Gitter for small ones.
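The answer does not say which simpler algorithm it has in mind. As one illustration (our choice, not the answerer's), a first-order Markov chain over level columns can learn which column tends to follow which in example levels and then sample new levels. A sketch in Python; the function names are hypothetical and the tiny 3-row level is made-up demo data, not the 20×300 levels from the question.

```python
import random
from collections import Counter, defaultdict


def columns(level_rows):
    """Turn a list of equal-length row strings into a list of column strings."""
    return ["".join(col) for col in zip(*level_rows)]


def train(levels):
    """Count which column follows which, within each example level."""
    transitions = defaultdict(Counter)
    for rows in levels:
        cols = columns(rows)
        for prev, nxt in zip(cols, cols[1:]):
            transitions[prev][nxt] += 1
    return transitions


def generate(transitions, start, length, rng=random):
    """Sample up to `length` columns starting from `start`; return them as row strings."""
    cols = [start]
    while len(cols) < length:
        options = transitions.get(cols[-1])
        if not options:  # dead end: this column never had a successor in training
            break
        # Pick the next column with probability proportional to its observed count.
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        cols.append(nxt)
    # Transpose back from columns to rows.
    return ["".join(row) for row in zip(*cols)]


demo_level = ["++++", "#++#", "####"]  # made-up 3-row example level
model = train([demo_level])
new_level = generate(model, start="+##", length=8, rng=random.Random(42))
print("\n".join(new_level))
```

Because each state is a whole column, the chain only emits columns it has seen in training; this keeps levels locally plausible but gives less variety than a model that generates character by character.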

Answer 2 · 2018-05-02T06:56:07.000Z

Hey tastyminerals,

thank you for your answers so far, and sorry for responding so late. I didn't know about the Google Group; it looks very helpful and I am going to read through it. Yes, I specifically want to learn about RNNs, so I created this easy project. I did read a lot about RNNs in general, and now I want to put my new knowledge to use. But it seems there is a whole science behind the tools you can use.

I'm going to close this "issue", since it's not really an issue.

Kind regards