/physical_interaction_video_prediction_pytorch

Based on Chelsea Finn et al.'s "Unsupervised Learning for Physical Interaction through Video Prediction"


Unsupervised Learning for Physical Interaction through Video Prediction

A PyTorch implementation of the paper by C. Finn, I. Goodfellow and S. Levine, "Unsupervised Learning for Physical Interaction through Video Prediction".

Prepare the data needed for training

$ sh download_data.sh push_datafiles.txt # Will download all the data from Google's ftp to data/raw
$ python ./tfrecord_to_dataset.py

Training

$ python ./train.py \
  --data_dir data/processed/push \
  --model CDNA \
  --output_dir ./weights \
  --pretrained_model model \
  --sequence_length 10 \
  --context_frames 2 \
  --num_masks 10 \
  --schedsamp_k 900.0 \
  --train_val_split 0.95 \
  --batch_size 32 \
  --learning_rate 0.001 \
  --epochs 10 \
  --print_interval 10 \
  --device cuda \
  --use_state

Options:

  --data_dir          path to the training set
  --model             the model type to use: DNA, CDNA, or STP
  --output_dir        where to save model checkpoints
  --pretrained_model  path to a model to initialize from; random initialization if empty
  --sequence_length   the total number of frames in a sequence
  --context_frames    the number of ground-truth frames to pass in at the start
  --num_masks         the number of transformations and corresponding masks
  --schedsamp_k       the constant used for scheduled sampling, or -1 for no scheduled sampling
  --train_val_split   the fraction of the data used for training; the rest is used for validation
  --batch_size        the training batch size
  --learning_rate     the initial learning rate for the Adam optimizer
  --epochs            the total number of training epochs
  --print_interval    the number of iterations between loss printouts
  --device            the device used for training
  --use_state         whether or not to condition on actions and the initial state
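
To give a feel for what --schedsamp_k controls, here is a minimal sketch of scheduled sampling with an inverse-sigmoid decay. It assumes this port follows the schedule used in the original TensorFlow release of the paper's code; the function name num_ground_truth and its arguments are hypothetical and are not taken from train.py.

```python
import math


def num_ground_truth(batch_size: int, iteration: int, k: float) -> int:
    """Hypothetical helper: how many items in a batch are fed the
    ground-truth frame (rather than the model's own prediction) at a
    given training iteration, once the context frames are used up."""
    if k == -1:
        # Scheduled sampling disabled: always feed back predictions.
        return 0
    ratio = iteration / k
    if ratio > 700:
        # exp() would overflow a float; the schedule is effectively 0 here.
        return 0
    # Inverse-sigmoid decay (assumed schedule): close to batch_size early
    # in training, decaying towards 0 as the iteration count grows.
    return int(round(batch_size * (k / (k + math.exp(ratio)))))


if __name__ == "__main__":
    for it in (0, 2000, 6000, 10000, 20000):
        print(it, num_ground_truth(batch_size=32, iteration=it, k=900.0))
```

Under this assumed schedule, with --schedsamp_k 900.0 and --batch_size 32, every item in the batch sees ground-truth frames at iteration 0, and the count decays to zero as training proceeds. A larger k keeps ground-truth frames in the mix for longer.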