franciscovargas/TeviotDataScienceGame

Achieve validation results on ZCA augmentation

Opened this issue · 2 comments

Extremely slow even when running in parallel on 40 cores on ashbury

ZCA Report

ZCA augmentation takes about 40 minutes to initialize; once initialized it runs as slowly as VGG (each epoch takes approximately 15 minutes on 70% of the training set, so 7 epochs take about 1 hour and at least 7 hours are needed to train decently, i.e. ~49 epochs). It also starts with remarkably poor accuracy on epoch 1; this may be because we already have too much augmentation. It will be unfortunate if this model requires 150+ epochs to train, since that would mean roughly 20 hours of training.
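For reference, a minimal numpy sketch of the ZCA transform (not our pipeline code, just an illustration of the cost): the whitening matrix requires an eigendecomposition/SVD of the d×d covariance matrix, which is O(d³) in the number of flattened pixel features, and that is presumably the slow initialization step.

```python
import numpy as np

def zca_whitening_matrix(X, eps=1e-5):
    """Compute the ZCA whitening matrix W = U diag(1/sqrt(s+eps)) U^T
    from flattened samples X of shape (n_samples, n_features)."""
    Xc = X - X.mean(axis=0)            # centre each feature
    cov = Xc.T @ Xc / Xc.shape[0]      # d x d covariance matrix
    U, S, _ = np.linalg.svd(cov)       # O(d^3) -- the expensive step
    return U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T

# Toy example: after whitening, the feature covariance is ~identity.
rng = np.random.RandomState(0)
X = rng.randn(200, 16)
W = zca_whitening_matrix(X)
Xw = (X - X.mean(axis=0)) @ W
print(np.allclose(Xw.T @ Xw / Xw.shape[0], np.eye(16), atol=1e-2))
```

For real images d = height × width × channels, which makes both the covariance accumulation and the SVD large, consistent with the ~40-minute initialization observed above.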

Epoch 1/45
5360/5360 [==============================] - 716s - loss: 14.9033 - acc: 0.1058 - val_loss: 1.4702 - val_acc: 0.1106
Epoch 2/45
5360/5360 [==============================] - 812s - loss: 10.6033 - acc: 0.3431 - val_loss: 9.7908 - val_acc: 0.2201
Epoch 3/45
5360/5360 [==============================] - 866s - loss: 11.8925 - acc: 0.2632 - val_loss: 8.1388 - val_acc: 0.4352
Epoch 4/45
5360/5360 [==============================] - 880s - loss: 9.3527 - acc: 0.4218 - val_loss: 6.3280 - val_acc: 0.4352
Epoch 5/45
5360/5360 [==============================] - 916s - loss: 12.0014 - acc: 0.2562 - val_loss: 10.0433 - val_acc: 0.2201
Epoch 6/45
5360/5360 [==============================] - 772s - loss: 10.6971 - acc: 0.3371 - val_loss: 8.1332 - val_acc: 0.4352
Epoch 7/45
5360/5360 [==============================] - 897s - loss: 10.8774 - acc: 0.3257 - val_loss: 11.0487 - val_acc: 0.2341

It might improve after epoch 1, but it is clear that ZCA-whitened images are harder to classify. The model generalizes in an interesting manner and its accuracy jumps around erratically, though this is only 7 epochs. Accuracy goes up and down; it seems RMSprop is adjusting $\eta$ a bit slowly, and since the learning rate may be too large initially the optimizer keeps jumping over the local minima. Could be promising (or maybe not). It might be worth trying:

  • ZCA on its own, without the other augmentation transforms (priority);
  • ZCA instead of the zoom-in transform;
  • a GPU cluster.
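The first option above could be sketched roughly as follows, assuming the Keras 1.x-era `ImageDataGenerator` API (the `samples/epoch` log format suggests Keras 1.x); the exact generator settings in our pipeline may differ, so treat this as a configuration sketch rather than our actual training script:

```python
from keras.preprocessing.image import ImageDataGenerator

# ZCA only -- all other augmentations (zoom, shifts, flips, etc.) disabled,
# so we can measure the effect of whitening in isolation.
datagen = ImageDataGenerator(zca_whitening=True)

# This .fit() call computes the whitening matrix over the training set
# and is the ~40-minute initialization step reported above.
datagen.fit(X_train)

model.fit_generator(
    datagen.flow(X_train, y_train, batch_size=32),
    samples_per_epoch=len(X_train),
    nb_epoch=45,
    validation_data=(X_val, y_val),
)
```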

It may just be that harder decision boundaries have to be learned with this augmentation, but overall the hypersurface is more robust and generalizes better. We have seen that augmentation so far reduces overfitting in conjunction with ridge-based regularization and dropout. The ZCA model may need fine-tuning, i.e. exploring network topologies; for this we require a GPU cluster, since current running times are intractable for tuning a network. Looking at the recurring 0.4352 val_acc, this may be a local-minimum problem.

UPDATE

Results are very poor:

GENERATED
Epoch 1/45
5360/5360 [==============================] - 716s - loss: 14.9033 - acc: 0.1058 - val_loss: 1.4702 - val_acc: 0.1106
Epoch 2/45
5360/5360 [==============================] - 812s - loss: 10.6033 - acc: 0.3431 - val_loss: 9.7908 - val_acc: 0.2201
Epoch 3/45
5360/5360 [==============================] - 866s - loss: 11.8925 - acc: 0.2632 - val_loss: 8.1388 - val_acc: 0.4352
Epoch 4/45
5360/5360 [==============================] - 880s - loss: 9.3527 - acc: 0.4218 - val_loss: 6.3280 - val_acc: 0.4352
Epoch 5/45
5360/5360 [==============================] - 916s - loss: 12.0014 - acc: 0.2562 - val_loss: 10.0433 - val_acc: 0.2201
Epoch 6/45
5360/5360 [==============================] - 772s - loss: 10.6971 - acc: 0.3371 - val_loss: 8.1332 - val_acc: 0.4352
Epoch 7/45
5360/5360 [==============================] - 897s - loss: 10.8774 - acc: 0.3257 - val_loss: 11.0487 - val_acc: 0.2341
Epoch 8/45
5360/5360 [==============================] - 943s - loss: 12.4014 - acc: 0.2315 - val_loss: 4.3901 - val_acc: 0.2341
Epoch 9/45
3936/5360 [=====================>........] - ETA: 190s - loss: 13.3674 - acc: 0.1720