modern-fortran/neural-fortran

extract / set all network parameters via a single 1D real array

jvo203 opened this issue · 2 comments

Hi, it's just a simple feasibility study, but I am considering coupling neural-fortran with the PIKAIA genetic-algorithm optimizer (https://github.com/jacobwilliams/pikaia). Is there an easy way to extract all the network parameters from all the layers in a single function call, as one long 1D real array? And then a single subroutine to set all the network parameters by loading them back from such a 1D real array?

Hi @jvo203, neural-fortran doesn't yet provide any such function, but you should be able to write your own easily. See for example how layer % get_output is implemented in src/nf/nf_layer_submodule.f90:

pure module subroutine get_output_1d(self, output)
  implicit none
  class(layer), intent(in) :: self
  real, allocatable, intent(out) :: output(:)

  select type(this_layer => self % p)
    type is(input1d_layer)
      allocate(output, source=this_layer % output)
    type is(dense_layer)
      allocate(output, source=this_layer % output)
    type is(flatten_layer)
      allocate(output, source=this_layer % output)
    class default
      error stop '1-d output can only be read from an input1d, dense, or flatten layer.'
  end select

end subroutine get_output_1d

pure module subroutine get_output_3d(self, output)
  implicit none
  class(layer), intent(in) :: self
  real, allocatable, intent(out) :: output(:,:,:)

  select type(this_layer => self % p)
    type is(input3d_layer)
      allocate(output, source=this_layer % output)
    type is(conv2d_layer)
      allocate(output, source=this_layer % output)
    type is(maxpool2d_layer)
      allocate(output, source=this_layer % output)
    type is(reshape3d_layer)
      allocate(output, source=this_layer % output)
    class default
      error stop '3-d output can only be read from a conv2d, input3d, maxpool2d, or reshape3d layer.'
  end select

end subroutine get_output_3d

Because the concrete layer implementation wrapped inside the high-level layer type is polymorphic, you need a select type guard for each concrete layer type.

Setting the network parameters works the same way, except that you write to the components instead of reading them (they are public, so client code can assign to them directly).
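For example, overwriting a dense layer's parameters from client code could look roughly like this (just a sketch; it assumes the dense layer stores its parameters in weights(:,:) and biases(:) components and that the wrapped p component is reachable from your code):

! Sketch only: overwrite the parameters of the n-th layer of net.
! The component names used here (p, weights, biases) are assumptions.
select type (this_layer => net % layers(n) % p)
  type is (dense_layer)
    this_layer % weights = reshape(new_weights, shape(this_layer % weights))
    this_layer % biases = new_biases
end select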

It would be great if you'd like to contribute a layer % get_parameters analogous to get_output, but for network parameters. I think this would be generally useful for interfacing neural-fortran with other libraries or applications, especially before we can store network parameters to a file.
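A minimal sketch of what such a routine could look like, modeled on get_output_1d above (the weights and biases component names, and the choice of which layer types carry parameters, are assumptions rather than the current API):

pure module subroutine get_parameters(self, params)
  implicit none
  class(layer), intent(in) :: self
  real, allocatable, intent(out) :: params(:)

  select type (this_layer => self % p)
    type is (dense_layer)
      ! Flatten the weight matrix and append the biases into one 1-d array.
      params = [reshape(this_layer % weights, [size(this_layer % weights)]), &
                this_layer % biases]
    class default
      ! Layers without parameters (input, flatten, maxpool, ...) return an empty array.
      allocate(params(0))
  end select

end subroutine get_parameters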

That said, what do you think would be a good file format for storing the network state? We could use a general HDF5 layout, or a Keras-specific flavor of HDF5 for two-way compatibility.

I see — iterating over all the layers and calling get_parameters(self, output) from a top-level get_parameters() seems like the best bet, plus a matching set of set_parameters() subroutines. A get_number_of_parameters() would also come in handy.
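Roughly what I have in mind for the top level (just a sketch; network % layers and the layer-level get_parameters / set_parameters / get_number_of_parameters are routines I would still have to add):

pure subroutine get_parameters(self, params)
  ! Concatenate every layer's parameters into one long 1-d array.
  class(network), intent(in) :: self
  real, allocatable, intent(out) :: params(:)
  real, allocatable :: layer_params(:)
  integer :: i
  allocate(params(0))
  do i = 1, size(self % layers)
    call self % layers(i) % get_parameters(layer_params)
    params = [params, layer_params]
  end do
end subroutine get_parameters

pure subroutine set_parameters(self, params)
  ! Hand each layer its slice of the flat parameter vector.
  class(network), intent(inout) :: self
  real, intent(in) :: params(:)
  integer :: i, first, last
  first = 1
  do i = 1, size(self % layers)
    last = first + self % layers(i) % get_number_of_parameters() - 1
    call self % layers(i) % set_parameters(params(first:last))
    first = last + 1
  end do
end subroutine set_parameters

PIKAIA could then drive the whole network through that single flat parameter vector.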

For now I have gotten PIKAIA to run a test example at its default real64 precision, and I have also "downgraded" PIKAIA to work with real32 (both my code and neural-fortran use real32, not real64). So it's time to clone neural-fortran and have a go at it.

Regarding the network storage format, I haven't used Keras, so my personal preference would be a general HDF5 file format. I have used MXNET (mostly from C/C++) as well as Wolfram Mathematica. Although Mathematica relies on MXNET under the hood, it actually uses the open-source ONNX format (https://onnx.ai) for exchanging models between frameworks (also see https://github.com/onnx/onnx).

I guess having both general HDF5 and ONNX would be great, although even just one format (general HDF5) would be enough.

On another note, I would personally be interested in re-using the same parameters in several parts of a network (sharing the weights), something that Mathematica supports (I haven't used it in Mathematica, but I would like to do it in my custom Fortran code, to be optimised by PIKAIA):

https://reference.wolfram.com/language/ref/NetInsertSharedArrays.html

Does neural-fortran have similar functionality? Or would I need to create a separate neural network model, re-use it multiple times, and somehow connect its outputs to the top-level model? Basically, I want to pre-process several two-element inputs with an identical (shared) neural network and then combine its outputs with other inputs, to be processed by the main neural net.
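Concretely, the kind of thing I have in mind looks roughly like this (just a sketch; the input, dense, and network constructors and the predict method are my assumptions about the nf API):

program shared_weights_sketch
  ! Sketch: one small "shared" net applied to several two-element input pairs,
  ! whose outputs are concatenated with extra inputs and fed to the main net.
  use nf, only: dense, input, network
  implicit none
  type(network) :: shared_net, main_net
  real :: pairs(2, 3), extra(4), features(3), combined(7)
  integer :: i

  call random_number(pairs)
  call random_number(extra)

  shared_net = network([input(2), dense(4), dense(1)])  ! same weights for every pair
  main_net = network([input(7), dense(8), dense(1)])    ! consumes shared outputs + extra inputs

  do i = 1, 3
    features(i:i) = shared_net % predict(pairs(:, i))
  end do

  combined = [features, extra]
  print *, main_net % predict(combined)
end program shared_weights_sketch

Since PIKAIA would set all the parameters externally, the shared net only needs to be evaluated, not trained with backprop, so re-using one network object like this would probably be enough for my case.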