
The idea comes from PyCallChainRules.jl.

A small demo package that wraps a fully connected Dense network from PaddlePaddle in Julia and makes it differentiable via ChainRulesCore.rrule.
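To illustrate what "differentiable via an rrule" means, here is a minimal sketch of the forward-plus-pullback pattern for a single dense layer. It uses only base Julia and calls neither Paddle nor ChainRulesCore; `ToyDense` and `toy_rrule` are hypothetical names invented for this illustration and are not part of this package.

```julia
# Hypothetical stand-in for a wrapped layer computing y = W * x .+ b.
struct ToyDense
    W::Matrix{Float64}
    b::Vector{Float64}
end

(m::ToyDense)(x) = m.W * x .+ m.b

# Mirrors the shape of an rrule: return the forward output together with a
# pullback that maps the output cotangent dy to the cotangents of the
# parameters and of the input.
function toy_rrule(m::ToyDense, x::AbstractMatrix)
    y = m(x)
    function pullback(dy)
        dW = dy * x'                 # cotangent of W
        db = vec(sum(dy; dims=2))    # cotangent of b, summed over the batch
        dx = m.W' * dy               # cotangent of the input
        return (W = dW, b = db), dx
    end
    return y, pullback
end

m = ToyDense(randn(2, 3), randn(2))
x = rand(3, 4)
target = rand(2, 4)

y, back = toy_rrule(m, x)
dy = 2 .* (y .- target)   # gradient of sum(abs2, y .- target) with respect to y
grads, dx = back(dy)      # parameter gradients and input gradient
```

In a real wrapper the pullback would presumably delegate to the Python framework's own autograd rather than hand-written formulas; the sketch only shows the shape of the contract that Zygote relies on.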



# install paddlepaddle
using PyCall
run(`$(PyCall.pyprogramname) -m  pip install paddlepaddle==0.0.0 -f`)

using PaddleChainRules.Paddle: paddle, PaddleModuleWrapper, PaddleFCNet
using Zygote

dim_ins = 3
hidden_size = 16
dim_outs = 2
batch_size = 32
num_layers = 2

# currently only fully connected Dense networks are supported
NN = paddle.nn.Sequential(
        paddle.nn.Linear(dim_ins, hidden_size),
        paddle.nn.Linear(hidden_size, dim_outs)
)
jlwrap = PaddleModuleWrapper(NN)

# or use the constructor for a fully connected network
jlwrap = PaddleFCNet(dim_ins, dim_outs, num_layers, hidden_size; activation="sigmoid")

input = rand(Float32, dim_ins, batch_size)

output = jlwrap(input)

target = rand(Float32, dim_outs, batch_size)
loss(m, x, y) = sum(abs2.(m(x) .- y))

# grad of params 
grad, = Zygote.gradient(m->loss(m, input, target), jlwrap)
# grad of input
grad, = Zygote.gradient(x->loss(jlwrap, x, target), input)


# install paddlepaddle-gpu
using PyCall
run(`$(PyCall.pyprogramname) -m  pip install paddlepaddle-gpu`)

using PaddleChainRules.Paddle: paddle, PaddleModuleWrapper, PaddleFCNet
using CUDA
using Zygote
# paddle-gpu uses CUDA by default if CUDA is available,
# or set the device manually

dim_ins = 3
hidden_size = 16
dim_outs = 2
batch_size = 32
num_layers = 2

# currently only fully connected Dense networks are supported
NN = paddle.nn.Sequential(
        paddle.nn.Linear(dim_ins, hidden_size),
        paddle.nn.Linear(hidden_size, dim_outs)
)
jlwrap = PaddleModuleWrapper(NN)

# or use the constructor for a fully connected network
jlwrap = PaddleFCNet(dim_ins, dim_outs, num_layers, hidden_size; activation="sigmoid")

input = cu(rand(Float32, dim_ins, batch_size))

output = jlwrap(input)

target = cu(rand(Float32, dim_outs, batch_size))
loss(m, x, y) = sum(abs2.(m(x) .- y))

# grad of params 
grad, = Zygote.gradient(m->loss(m, input, target), jlwrap)
# grad of input
grad, = Zygote.gradient(x->loss(jlwrap, x, target), input)

There is also a demo for NeuralPDE.


  • In the NeuralPDE demo, this package is much slower than Flux.jl; the speed needs to improve.
  • Currently only Dense networks are supported; support more general network structures? (rough solution in #2)
  • Test code: compare the forward and backward outputs against the results from Paddle's API. (done)
  • Some benchmarks:
    • Forward and backward. (done)
    • Poisson equation with NeuralPDE, compared with PyCallChainRules and Flux.