stasinos/machine_learning

Establish the model structure and hyperparameters that are able to approximate any of the elementary functions

Closed this issue · 7 comments

You can see an example for sine committed, but feel free to experiment. In any case, you should confirm in advance that the same structure and hyperparameters (the ones in the example, or some other configuration) are able to approximate any of the four elementary functions. These will be the "unknown functions" that your system will learn by observing the outputs of complex expressions of these functions, so we should first confirm that they are learnable.
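For instance, a check along these lines would suffice (a rough sketch, not the code in sineNN.py; the architecture, hyperparameters, and per-function domains here are placeholders for illustration):

```python
# Fit the same small MLP, with the same hyperparameters, to each elementary
# function separately and report the final training loss for each.
import numpy as np
import torch
import torch.nn as nn

def fit(fn, a, b, epochs=2000, hidden=64, lr=1e-3, n=512):
    x = np.random.uniform(a, b, size=(n, 1)).astype(np.float32)
    y = fn(x).astype(np.float32)
    x_t, y_t = torch.from_numpy(x), torch.from_numpy(y)
    model = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                          nn.Linear(hidden, hidden), nn.Tanh(),
                          nn.Linear(hidden, 1))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x_t), y_t)
        loss.backward()
        opt.step()
    return loss.item()

funcs = {
    "sin":  (np.sin,  -np.pi, np.pi),
    "sinc": (np.sinc, -np.pi, np.pi),
    "exp":  (np.exp,  -1.0, 1.0),
    "ln":   (np.log,   0.1, np.pi),
}
for name, (fn, a, b) in funcs.items():
    print(name, fit(fn, a, b))
```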

In my last commit 0c3ea22 I modified sineNN.py to learn functions when x is in a certain range, as our initial task is to learn e.g. sin(x) for x in [-10, 10].

Of course, keeping x in [0,1] is a good idea if we want to normalize the values of e^x to the range [0,1] using the function y = (e^x - 1)/(e - 1), but is it actually useful for our task?
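For concreteness, a minimal sketch of that normalization and its inverse (function names are just placeholders, not anything in the repo):

```python
import numpy as np

# Normalize e^x to [0, 1] for x in [0, 1], as above, and invert the
# mapping when reading predictions back out.
def normalize_exp(x):
    return (np.exp(x) - 1.0) / (np.e - 1.0)

def denormalize_exp(y):
    return np.log(y * (np.e - 1.0) + 1.0)

x = np.linspace(0.0, 1.0, 5)
y = normalize_exp(x)                     # values in [0, 1]
assert np.allclose(denormalize_exp(y), x)
```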

Should we continue exploring the learnability of those functions for x in [a,b], or keep it simple and check whether the simple functions are learnable for x in [0,1]?

Note that the "real" x is not in 0..1; it is in an arbitrary (but fixed in advance) domain that is squashed into 0..1 for the benefit of the NN and then expanded again for the benefit of plotting.
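In other words, something like this affine squash/expand pair (a sketch; the function names are not the ones in sineNN.py):

```python
import numpy as np

# Squash an arbitrary (but fixed) domain [a, b] into [0, 1] for the NN,
# and expand back for plotting.
def squash(x, a, b):
    return (x - a) / (b - a)

def expand(u, a, b):
    return a + u * (b - a)

a, b = -10.0, 10.0
x = np.linspace(a, b, 5)
u = squash(x, a, b)                      # the network sees values in [0, 1]
assert np.allclose(expand(u, a, b), x)   # plotting uses the original domain
```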

By all means, feel free to experiment on how the domain affects learnability, although I would recommend keeping it as simple as possible for this exercise. I recommend identifying a common domain for all functions, chosen so that the part of the function that is plotted is characteristic enough to be easily distinguishable.

(UNCHECKED, just wild speculation) [-pi .. pi] might be a good candidate, as sinc() looks good on domains that are symmetric around zero and sine() will most probably be more learnable for exactly one period. That means only half the plot data for ln(), but I think it will still be identifiable. We will see.

Looks like the current architecture is enough for learning each function separately.

In PyTorch the default sinc is the normalized one: $$\frac{sinc(\pi x)}{\pi x}$$ so the NN learns the normalized version. Should I change it so that it learns the unnormalized one?

Also, correct me if I am wrong, in this task we aim to have one NN that is able to learn different functions given different data and not one NN that learns to distinguish all four functions when given all of them as training data.

I think you mean $\frac{\sin(\pi x)}{\pi x}$ [1], which is the same as the one computed by numpy [2].

[1] https://pytorch.org/docs/stable/special.html#torch.special.sinc
[2] https://numpy.org/doc/stable/reference/generated/numpy.sinc.html
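As a quick sanity check (a sketch, not part of the repo), the two library calls can be compared directly, and the unnormalized variant written out in case we decide to target it instead:

```python
import numpy as np
import torch

x = torch.linspace(-3.0, 3.0, 7, dtype=torch.float64)

# Both libraries compute the normalized sinc, sin(pi*x)/(pi*x), with sinc(0) = 1.
normalized_torch = torch.special.sinc(x)
normalized_numpy = np.sinc(x.numpy())
assert np.allclose(normalized_torch.numpy(), normalized_numpy)

# The unnormalized variant sin(x)/x, with the x = 0 case handled explicitly.
unnormalized = torch.where(x == 0, torch.ones_like(x), torch.sin(x) / x)
```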

> Also, correct me if I am wrong, in this task we aim to have one NN that is able to learn different functions given different data and not one NN that learns to distinguish all four functions when given all of them as training data.

That is correct. The aim is to have the same network structure fit different functions based on the data it receives. The autodiff "magic" will be that it will appropriately distribute loss between these networks, following how they are connected into a DAG by the semantics of composition, addition, and the poly fn (which we consider operators of the programming language and not functions).
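To illustrate the idea (a sketch only, not the project's actual code), two small networks can stand in for two unknown elementary functions, wired into a DAG by composition and addition, with a single loss on the composite output driving both:

```python
import torch
import torch.nn as nn

# f_net and g_net stand for two unknown elementary functions. The expression
# g(f(x)) + f(x) connects them via composition and addition; one loss on the
# composite output is distributed back into both networks by autodiff.
def mlp():
    return nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

f_net, g_net = mlp(), mlp()
opt = torch.optim.Adam(list(f_net.parameters()) + list(g_net.parameters()), lr=1e-3)

x = torch.rand(256, 1) * 2 - 1                    # inputs in [-1, 1]
target = torch.sin(torch.exp(x)) + torch.exp(x)   # observations of g(f(x)) + f(x)

for _ in range(1000):
    opt.zero_grad()
    fx = f_net(x)
    pred = g_net(fx) + fx                         # the composite expression
    loss = nn.functional.mse_loss(pred, target)
    loss.backward()                               # gradients flow into both nets
    opt.step()
```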

I tried a small-scale test with sine only, and it looks promising. I will clean and curate and push it later today, along with another issue describing the next step.

Following up on the conversation in #13, I checked whether Snake works well with the four functions, even the ones that are not periodic, and it actually fits like a glove in all cases.

I think it works because of the parameter $\alpha$ in the Snake function, i.e. $x + \frac{1}{\alpha} \sin^2(\alpha x)$, which is actually trainable in the implementation we use.

I quote the authors: 'We find that for standard tasks such as image classification, setting $0.2 \leq \alpha \leq \alpha_{max}$ to work very well. We thus set the default value of $\alpha$ to be 0.5. However, for tasks with expected periodicity, larger $\alpha$, usually from 5 to 50 tend to work well.', where $\alpha_{max}$ is about 0.56045.
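For reference, a minimal Snake activation with a trainable $\alpha$ looks like this (a sketch for illustration; the implementation we actually use may differ):

```python
import torch
import torch.nn as nn

# Snake activation with a trainable alpha: x + (1/alpha) * sin^2(alpha * x).
class Snake(nn.Module):
    def __init__(self, alpha_init=0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(float(alpha_init)))

    def forward(self, x):
        return x + torch.sin(self.alpha * x) ** 2 / self.alpha

# alpha_init=0.5 follows the quoted default; for clearly periodic targets a
# larger initial value (roughly 5 to 50, per the authors) could be used instead.
net = nn.Sequential(nn.Linear(1, 32), Snake(), nn.Linear(32, 1))
```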

Looks good. Please add these runs into exp00.py, or into a new exp file (whatever you prefer) and close this issue.

If you want this to appear in git history as preceding the work for Issue #13 (conceptually accurate history), then commit into autodiff, and rebase branch 13-... over the new autodiff HEAD.

If you want this to appear in git history as parallel to #13 (accurate timeline, but separating lines of work), then commit into autodiff and merge the new autodiff into 13-...

If you want git history to give a linear timeline without separating lines of work, just commit at 13-...