Extending to Image Transformer
alexmathfb opened this issue · 3 comments
Thanks for this very important work.
I'm trying to train the Image Transformer as a single-layer flow and have a few questions I hope you can help me with.
In section 4 you describe how to turn PixelCNN++ and related models (including the Image Transformer) into single-layer autoregressive flows.
Question 1. Is it correct that the modifications to be done for PixelCNN++ and Image Transformer are the same because both use DMOL?
Question 2. The comment below states that the PixelCNN++ implementation is a raw copy. If I extend to the Image Transformer, can I also just make a raw copy?
Question 3. It seems to me that the AutoregressiveSubsetFlow2d class does not assume PixelCNN++ and thus may work for the Image Transformer. In principle, if I change the following code to use the Image Transformer, should it work?
Hi and thanks for your interest!
Q1: Yes, for an Image Transformer with DMOL, the setup is the same. Only the neural architecture that parameterizes the flow will be different.
Q2: Yes, that should be fine if you have an implementation of the neural architecture used in the Image Transformer.
Q3: Yes, by passing in the Image Transformer network as `net`, everything should still work fine.
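To illustrate the point in Q1/Q3, here is a minimal sketch of the pattern: the flow wrapper only needs *some* autoregressive network that outputs DMOL parameters, so swapping PixelCNN++ for an Image Transformer is just a matter of passing a different `net`. Note this is purely illustrative — the class and function names below (`SubsetFlowSketch`, the stand-in nets) are hypothetical and not the repo's actual API, and the "networks" are dummies returning fixed DMOL parameters.

```python
class SubsetFlowSketch:
    """Hypothetical single-layer autoregressive flow.

    `net` is any callable mapping an input x to DMOL parameters
    (here a dict of means/log_scales); the flow logic itself is
    agnostic to which architecture produced them.
    """
    def __init__(self, net):
        self.net = net

    def dmol_params(self, x):
        # The flow transform would consume these parameters;
        # only the parameterizing network differs between models.
        return self.net(x)

def pixelcnn_like_net(x):
    # Stand-in for PixelCNN++: returns dummy DMOL parameters.
    return {"means": [0.0] * len(x), "log_scales": [0.0] * len(x)}

def image_transformer_like_net(x):
    # Stand-in for an Image Transformer with a DMOL output head.
    return {"means": [0.1] * len(x), "log_scales": [-1.0] * len(x)}

# Same flow wrapper, two different parameterizing networks.
flow_a = SubsetFlowSketch(pixelcnn_like_net)
flow_b = SubsetFlowSketch(image_transformer_like_net)
x = [0.5, 0.2, 0.9]
print(flow_a.dmol_params(x)["means"])  # [0.0, 0.0, 0.0]
print(flow_b.dmol_params(x)["means"])  # [0.1, 0.1, 0.1]
```

The design point is simply that the DMOL parameterization is the shared interface between the flow and the network, which is why the modifications in Q1 coincide for both architectures.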