julie-is-late/TensorFlow-Signal-Processing

regarding solving a non-linear transformation between two signals

wenouyang opened this issue · 3 comments

Hi @jshap70,

Thanks for sharing the tutorial and code. In practice, I have a problem where the input is a continuous signal, such as an audio waveform, and the output is also a continuous signal, i.e., another waveform. After sampling, I can get 100 samples from each waveform in the original continuous space. In other words, the problem can be transformed into finding the mapping function between one numerical sequence and another.

For instance, the input sequence is [-4.1288461e-16 -2.2452528e-15 -1.1717652e-14 -5.8685417e-14 -2.8203791e-13 -1.3006001e-12], and the output sequence is [1.2080356e-01 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00]. In practice, we can have thousands of such input-output pairs for training purposes.

The prediction or transformation task is to find the output sequence for a given input sequence. Do you think your library can be used to solve this kind of problem? Thanks for your advice.

Yeah, whenever time is a dimension in neural networks things end up getting fun - you definitely have to approach the problem differently. The main issue is that non-recurrent networks have no memory, and thus any input into them when they're not training has no effect on the next value. The network doesn't have the "context" to understand the data series.
But I'm guessing you already know this.

Regarding the solutions to this, there are two main ways:

The first is the approach I chose for this project, and it is absolutely not the correct way to do it. You might see me talk about a "scrolling window" in the paper, which means that the network receives a section of the data at a time, so its input effectively has 2 dimensions: the signal amplitude and time. This means that the network is predicting for the entire time window at once. After this, I used a naive splicing algorithm to frankenstein together the different windows into a single output. However, this has caveats. For one, it severely limits your ability to process the data in real time, because you need a full window of context before you can predict anything. Secondly, it is very bad at predicting the edges of the window because it lacks the neighboring data there. Thirdly, if the prediction requires any kind of larger context than the window size you have provided, it will not be able to discern the mapping function.
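To make the scrolling-window idea concrete, here is a minimal sketch in plain Python. It is not the code from this project: the function names, the window and hop sizes, and the choice to average overlapping predictions are all assumptions made for illustration, and the "network" is replaced by an identity stand-in.

```python
def split_into_windows(signal, window_size, hop):
    """Cut a 1-D signal into overlapping fixed-length windows.

    Returns the start offset of each window together with the windows.
    (Hypothetical helper; trailing samples that do not fill a window are dropped.)
    """
    starts = list(range(0, len(signal) - window_size + 1, hop))
    windows = [signal[s:s + window_size] for s in starts]
    return starts, windows


def splice_windows(starts, predictions, total_length):
    """Naively splice per-window predictions by averaging where they overlap.

    Averaging is an assumption; the splicing used in the project may differ.
    """
    sums = [0.0] * total_length
    counts = [0] * total_length
    for start, prediction in zip(starts, predictions):
        for offset, value in enumerate(prediction):
            sums[start + offset] += value
            counts[start + offset] += 1
    # Positions never covered by any window fall back to 0.0.
    return [sums[i] / counts[i] if counts[i] else 0.0 for i in range(total_length)]


signal = [float(i) for i in range(10)]
starts, windows = split_into_windows(signal, window_size=4, hop=2)

# Stand-in for the trained network: it just echoes each window back.
predictions = [list(window) for window in windows]

restored = splice_windows(starts, predictions, len(signal))
```

Note how every prediction only ever sees `window_size` samples, which is exactly why context longer than the window is invisible to the model.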

The actual proper way to approach this problem, and the method I will be using when I pick this problem back up, is to use a recurrent neural network. RNNs (and LSTMs) are different from traditional feed-forward deep learning networks because each cell in the RNN is connected back into itself and has the ability to retain its previous value in a way that is weighted for both ease of learning as well as strength in the output. This is an excellent overview of the design of an RNN cell. Here are some examples of doing time series prediction using an LSTM: simple, complex.
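As a rough illustration of what "connected back into itself" means, here is a toy single-unit recurrent cell in plain Python. It is a simple Elman-style cell rather than an LSTM, and the weights `w_x`, `w_h`, and `b` are made-up values instead of learned ones, so treat it as a sketch of the mechanism only.

```python
import math


def rnn_step(x, h_prev, w_x=0.5, w_h=0.9, b=0.0):
    """One step of a scalar recurrent cell: the new state mixes the
    current input with the previous state (weights are illustrative)."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, h0=0.0):
    """Feed a sequence through the cell, carrying the state forward."""
    h = h0
    states = []
    for x in sequence:
        h = rnn_step(x, h)
        states.append(h)
    return states


# Both sequences end with the same input (0.0) but have different histories.
states_after_positive = run_rnn([1.0, 0.0])
states_after_negative = run_rnn([-1.0, 0.0])
```

The final states differ even though the last input is identical, because the cell carries its earlier value forward. A feed-forward network given only the last sample could not make that distinction.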

The reason this project didn't use RNNs is that at the time of creating it I was a lot less knowledgeable about them, and I also lacked the computational power to properly train one. RNNs and LSTMs tend to be exponentially harder to train than CNNs, and at the time I was merely using a CPU server (albeit a 64 core one) to do the training for this project. I do have a Titan XP in the mail though, so I'm looking forward to that 😄.

Let me know if you have any other questions; I'll keep the issue open more as a guide to myself for later than anything.

Hi @jshap70,

Thank you for your detailed response.

The problem I am interested in studying is more like a non-linear transformation from one numerical string to another string. There are some similarities with time series prediction, but it is not exactly the same. It seems to me more similar to language translation, except that the inputs are not words but numerical values (floating-point numbers). Any suggestions will be highly appreciated.

Hmm, that I'm not sure about. I'd need to know more information on the problem to be honest.
Neural nets tend to not be super great on raw strings, which is why a lot of language classification problems will use libraries like word2vec and the like to try to get the text into a "usable" format. If you know both your inputs and outputs are going to be numerical in nature, why not figure out a method to either convert or hash them into numbers? I don't really think there's a need to train your net to interpret your input format. Now, if you're saying you need a string as the output, then I wonder if this really is a regression problem. If you really think so, then I would suggest isolating the output conversion to a string as a secondary problem, whereby the numerical result of the first prediction is then converted into the desired string output.
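For example, if the strings are just whitespace-separated floats like the ones quoted earlier in this thread, the conversion on both ends could be as simple as the sketch below. The helper names and the seven-digit scientific format are assumptions for illustration, and the model itself is left out entirely.

```python
def parse_sequence(text):
    """Turn a whitespace-separated string of numbers into a list of floats."""
    return [float(token) for token in text.split()]


def format_sequence(values):
    """Turn numeric predictions back into a string, as a separate final step.

    The 7-digit scientific notation is an assumed output format.
    """
    return " ".join(f"{value:.7e}" for value in values)


raw = "1.2080356e-01 0.0000000e+00"
values = parse_sequence(raw)          # numeric input for the model
round_trip = format_sequence(values)  # string form of a numeric output
```

That way the network only ever deals with numbers, and the string handling stays a plain pre- and post-processing step.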