This repo contains the code for the paper Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video (https://arxiv.org/abs/1905.11169).
To train, run:

```
PYTHONPATH=. python runners/run_physics.py --task=spring_color --model=PhysicsNet --epochs=500 \
    --batch_size=100 --save_dir=<experiment_folder> --autoencoder_loss=3.0 --base_lr=3e-4 --anneal_lr=true \
    --color=true --eval_every_n_epochs=10 --print_interval=100 --debug=false --use_ckpt=false
```
This will automatically run evaluation on the test set (including the extrapolation range) at the end of training.
To run only evaluation on a previously trained model, use the extra flags `--test_mode` and `--use_ckpt`:
```
PYTHONPATH=. python runners/run_physics.py --task=spring_color --model=PhysicsNet --epochs=500 \
    --batch_size=100 --save_dir=<experiment_folder> --autoencoder_loss=3.0 --base_lr=3e-4 \
    --color=true --eval_every_n_epochs=10 --print_interval=100 --debug=false \
    --use_ckpt=true --test_mode=true
```
This will use the checkpoint found in `<experiment_folder>`. To evaluate a checkpoint from a different folder, use `--ckpt_dir`:
```
PYTHONPATH=. python runners/run_physics.py --task=spring_color --model=PhysicsNet --epochs=500 \
    --batch_size=100 --save_dir=<experiment_folder> --autoencoder_loss=3.0 --base_lr=3e-4 \
    --color=true --eval_every_n_epochs=10 --print_interval=100 --debug=false \
    --use_ckpt=true --test_mode=true --ckpt_dir=<folder_with_checkpoint>
```
To keep training a model from a checkpoint, use the same command as above but with `--test_mode=false`. Note that in this case `base_lr` will be used as the starting learning rate (there is no global learning rate variable saved in the checkpoint), so if you restart training after annealing was applied, be sure to adjust `base_lr` accordingly; an example is given below.
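For example, if the learning rate had already been annealed (divided by 5) from `3e-4` before the checkpoint was saved, resuming the `spring_color` run above might look like this (the exact value is up to you):

```
PYTHONPATH=. python runners/run_physics.py --task=spring_color --model=PhysicsNet --epochs=500 \
    --batch_size=100 --save_dir=<experiment_folder> --autoencoder_loss=3.0 --base_lr=6e-5 \
    --color=true --eval_every_n_epochs=10 --print_interval=100 --debug=false \
    --use_ckpt=true --test_mode=false
```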
Notes on flags, hyperparameters, and general training behavior:
- Using `--anneal_lr=true` will reduce the base learning rate by a factor of 5 after 70% of the epochs are completed. To change this behavior, see the corresponding code in `nn/network/base.py`, in the class method `BaseNet.train()`. A sketch of the resulting schedule is shown after this list
.
- When using `autoencoder_loss`, the encoder and decoder parts of the model tend to converge fairly early in training. The rest of training is mostly spent improving the physical parameters, which can take a long time. I recommend training for between 500 and 1000 epochs (higher for the `3bp_color` dataset, lower for the `spring` datasets).
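As an illustrative sketch of what the annealing schedule described above amounts to (this is not the actual code in `nn/network/base.py`, just the behavior it produces):

```python
def effective_lr(base_lr, epoch, total_epochs, anneal_lr=True):
    # Sketch of the behavior described above: the base learning rate is
    # divided by 5 once 70% of the epochs have been completed.
    if anneal_lr and epoch >= 0.7 * total_epochs:
        return base_lr / 5.0
    return base_lr

# e.g. with --base_lr=3e-4 and --epochs=500, epochs 350 onward would use 6e-5
```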
There are currently 5 tasks implemented in this repo:

- `bouncing_balls`: Bouncing balls (there are no learnable physical parameters here).
- `spring_color`: Two colored balls connected by a spring.
- `spring_color_half`: Same as above, but in the input and prediction range the balls never leave one half of the image; they only move to the other half of the image in the extrapolation range of the test set.
- `mnist_spring_color`: Two colored MNIST digits connected by a spring, on a CIFAR background.
- `3bp_color`: Three colored balls interacting through gravitational forces (`3bp` stands for 3-body problem).
The input, prediction and extrapolation steps are preset for each task, and correspond to the values described in the paper (see 1st paragraph of Section 4.1).
The datasets for the tasks above can be downloaded from this Google Drive. These datasets should be placed in a folder called `<repo_root>/data/datasets` in order to be automatically fetched by the code.
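For example, from the repo root (the placeholder stands for whatever dataset files you downloaded from the Drive folder):

```
# create the folder the code expects and move the downloaded datasets into it
mkdir -p data/datasets
mv <downloaded_dataset_files> data/datasets/
```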
For the tasks above, the recommended `base_lr` and `autoencoder_loss` parameters are:

- `bouncing_balls`: `--base_lr=3e-4 --autoencoder_loss=2.0`
- `spring_color`: `--base_lr=6e-4 --autoencoder_loss=3.0`
- `spring_color_half`: `--base_lr=6e-4 --autoencoder_loss=3.0`
- `mnist_spring_color`: `--base_lr=6e-4 --autoencoder_loss=3.0`
- `3bp_color`: `--base_lr=1e-3 --autoencoder_loss=5.0`
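For example, a `3bp_color` training run with these values (and the longer epoch budget suggested above) would look something like:

```
PYTHONPATH=. python runners/run_physics.py --task=3bp_color --model=PhysicsNet --epochs=1000 \
    --batch_size=100 --save_dir=<experiment_folder> --autoencoder_loss=5.0 --base_lr=1e-3 --anneal_lr=true \
    --color=true --eval_every_n_epochs=10 --print_interval=100 --debug=false --use_ckpt=false
```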
When tracking training progress from the `log.txt` file, a value of `eval_recons_loss` below 1.5 indicates that the encoder and decoder have correctly discovered the objects in the scene, and a value of `eval_pred_loss` below 3.0 (for the ball datasets) or 30.0 (for the MNIST dataset) indicates that the velocity estimator and the physical parameters have been learned correctly. Due to the dependence on initialization, it is possible that even with the hyperparameters above the model gets stuck in a local minimum and never gets below these values, by failing to discover all the objects or to learn the correct physical parameters/velocity estimator (this is common in unsupervised object discovery methods). I am working on improving convergence stability.
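If you want to check these values programmatically, a minimal sketch along the following lines works, assuming each metric appears in `log.txt` as `<name>: <value>` or `<name>=<value>` (the actual log format may differ, so adjust the pattern accordingly):

```python
import re

def latest_metrics(log_path="log.txt"):
    # Sketch: scan log.txt and keep the most recent eval_recons_loss /
    # eval_pred_loss values. The regex below is an assumption about the
    # log format, not a guarantee of it.
    pattern = re.compile(r"(eval_recons_loss|eval_pred_loss)[:=]\s*([0-9eE.+-]+)")
    metrics = {}
    with open(log_path) as f:
        for line in f:
            for name, value in pattern.findall(line):
                metrics[name] = float(value)  # later lines overwrite earlier ones
    return metrics

if __name__ == "__main__":
    print(latest_metrics())
```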
The `example%d.jpg` files show random rollouts from the validation/test set. The top row corresponds to the model prediction, the middle row to the ground truth, and the bottom row to the reconstructed frames (as used by the autoencoder loss; this can be used to check whether the objects have been discovered even if the dynamics have not been learned yet).
The `templates.jpg` file shows the learned contents (top) and masks (bottom).
This model is seed-dependent when it comes to discovering the objects: for datasets with two objects it works most of the time, whereas for the `3bp_color` dataset it is harder to find a seed that works. It is possible this could be solved by tweaking hyperparameters and the network structure, but we have not explored that extensively.