Learn computer vision fundamentals with the famous MNIST data
Installation Instructions: README_CUDA.md
- Score: 0.99657 | Rank: ??? /2500 | ./submissions/fastai-resnet18-u100.csv - fastai: resnet18 + fit_one_cycle(50, 5e-2)
- Score: 0.71128 | Rank: 2194/2269 | ./submissions/keras.csv - first attempt
- Score: 0.09671 | Rank: 2487/2500 | ./submissions/random.csv
kaggle competitions download -c digit-recognizer -p ./data/
unzip data/digit-recognizer.zip -d data
node --experimental-modules src/utils/csv2png.js
# kaggle competitions submit -c digit-recognizer -f submissions/submission.csv -m "message"
Converts the CSV data into a directory tree of PNG images for easier visual inspection and debugging, and for compatibility with fastai's ImageDataBunch.
NOTE:
- This is a slower (IO bound) method compared to accessing raw numeric CSV data.
- Dropbox crashes when trying to sync 2,016,000 individual files
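The conversion itself is done by the Node.js script above. As an illustration only (not a port of that script), here is a minimal Python sketch of the same idea using just the standard library. It assumes the Kaggle `train.csv` layout of one label followed by 784 pixel values (28x28, 0-255). The helper names (`png_chunk`, `pixels_to_png`, `row_to_png`, `image_path`) and the `train/<label>/<index>.png` layout are hypothetical, chosen so that each label gets its own folder.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def pixels_to_png(pixels, width=28, height=28) -> bytes:
    """Encode a flat list of 0-255 grayscale values as PNG bytes."""
    if len(pixels) != width * height:
        raise ValueError(f"expected {width * height} pixels, got {len(pixels)}")
    # Every scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + bytes(pixels[y * width:(y + 1) * width]) for y in range(height)
    )
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(raw))
        + png_chunk(b"IEND", b"")
    )


def row_to_png(row):
    """Split one train.csv-style row into (label, png_bytes)."""
    label, pixels = row[0], [int(value) for value in row[1:]]
    return label, pixels_to_png(pixels)


def image_path(split: str, label: str, index: int) -> str:
    """Hypothetical output layout: one folder per label."""
    return f"{split}/{label}/{index}.png"


# Example row: label 7, all-black image with one white pixel in the corner.
example_row = ["7"] + ["0"] * 783 + ["255"]
label, png_bytes = row_to_png(example_row)
target = image_path("train", label, 0)
```

Writing `png_bytes` to `target` for every row is what makes this approach IO bound, since each sample becomes its own file.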
node --experimental-modules src/random/random.js
wrote: ./submissions/random.csv
Accuracy = 2846/28000 = 10.16%
The random-guess method provides a statistical noise baseline which, as expected, averages around 10% accuracy.
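To see why the baseline sits near 10%, here is a small Python sketch (not the repository's `random.js`) that guesses a uniformly random digit for each of 28,000 samples. The true labels are synthetic stand-ins, since the real test labels are not public, and the function name `random_baseline_accuracy` is hypothetical.

```python
import random


def random_baseline_accuracy(labels, seed=0):
    """Fraction of samples where a uniformly random digit matches the label."""
    rng = random.Random(seed)
    hits = sum(1 for label in labels if rng.randrange(10) == label)
    return hits / len(labels)


# Synthetic labels standing in for the 28,000 hidden test labels.
label_rng = random.Random(42)
labels = [label_rng.randrange(10) for _ in range(28000)]

accuracy = random_baseline_accuracy(labels)
print(f"Accuracy = {accuracy:.2%}")
```

With ten equally likely classes, each guess is correct with probability 1/10, so across 28,000 samples the accuracy stays within a fraction of a percent of 10%.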
node ./preprocessing/csv2png.js
jupyter lab # 1_fastai-transfer-learning.ipynb
This method uses a resnet18 CNN with transfer learning and currently produces the best results in this repository, with a top score of 0.99657.
Keras is a lower-level library than fastai.
CUDA_VISIBLE_DEVICES="" # run with CPU instead of GPU
PYTHONPATH='.' # needed for running local code
time -p python3 src/examples/keras/keras_example_mnist_cnn.py
Test loss: 0.6942943648338318
Test accuracy: 0.8384
The initial benchmark implementation works as a proof of concept. The documentation for the example code claims 99.25% test accuracy after 12 epochs, but running it locally produced only 83.84%.
Timings:
- keras + Adadelta
  - 2011 MacBook Pro CPU (i7 x 4 @ 2.4GHz) = 89s/epoch = 1.5ms/sample = 1070s
  - 2017 Razer CPU (i7-7700HQ x 4 @ 3.8GHz) = 36s/epoch = 605us/sample = 443s (2.4x improvement over the MacBook Pro)
  - GeForce GTX 1060 GPU = 5s/epoch = 85us/sample = 66s (6.7x improvement over CPU)
- tf.keras + rmsprop
  - 2020 Apple M1 GPU = 18s/epoch = 38us/sample = 213s (3.2x slower than GTX 1060 | 2x faster than Razer i7)
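The speedup figures quoted in the timings can be reproduced from the total run times. The sketch below only restates that arithmetic; the dictionary keys and the `speedup` helper are illustrative names, and the totals are the wall-clock seconds listed above.

```python
# Total wall-clock seconds as quoted in the timings list.
total_seconds = {
    "macbook_2011_cpu": 1070,
    "razer_2017_cpu": 443,
    "gtx_1060_gpu": 66,
    "apple_m1_gpu": 213,
}


def speedup(slower: str, faster: str) -> float:
    """How many times faster `faster` finished compared with `slower`."""
    return total_seconds[slower] / total_seconds[faster]


razer_vs_macbook = speedup("macbook_2011_cpu", "razer_2017_cpu")
gtx_vs_razer = speedup("razer_2017_cpu", "gtx_1060_gpu")
gtx_vs_m1 = speedup("apple_m1_gpu", "gtx_1060_gpu")
m1_vs_razer = speedup("razer_2017_cpu", "apple_m1_gpu")

for name, value in [
    ("Razer CPU vs MacBook CPU", razer_vs_macbook),
    ("GTX 1060 vs Razer CPU", gtx_vs_razer),
    ("GTX 1060 vs Apple M1", gtx_vs_m1),
    ("Apple M1 vs Razer CPU", m1_vs_razer),
]:
    print(f"{name}: {value:.1f}x")
```

Note that the M1 run used a different setup (tf.keras + rmsprop), so its ratios compare total run time rather than like-for-like training.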
python3 src/examples/tensorflow/main.py
Working examples of Keras syntax: SequentialCNN, FunctionalCNN, ClassCNN, ClassNN
- see extended comments in: src/keras/experiments/convergence_search.py
Best Discovered Hyperparameter Combinations (with simple SequentialCNN)
"optimizer": hp.Discrete([
### learning_rate vs optimizer + scheduler=constant | quickly converges with low learning_rate=0.001
"Adam", # LR=0.1 + CyclicLR (else breaks) || LR=0.01 + constant/plateau2/linear_decay
"Adamax", # LR<=0.1
"Nadam", # LR=0.1 + CyclicLR (else breaks) || LR=0.01 + plateau2 / CyclicLR / linear_decay || LR=0.001 + constant
"RMSprop", # LR=0.001 + constant || LR=0.01 + CyclicLR/plateau2/constant/linear_decay || LR=0.1 + CyclicLR (else breaks)
### learning_rate vs optimizer + scheduler=constant | needs high starting learning_rate=0.1 to quickly converge - may benefit from scheduler
"Adadelta", # Best with LR=1 + plateau2 (quick)
"Adagrad", # Best with LR=0.1 + triangular (slow/best) or plateau2 (quick)
"SGD", # Best with LR=1 + triangular2
### learning_rate vs optimizer + scheduler=constant | needs learning_rate=0.1 | random until 16 epochs, then quickly converges
"Ftrl", # Only works with: LR=0.1 + plateau2/constant OR LR=1 + CyclicLR_triangular
]),
"learning_rate": hp.Discrete([
1.0, # Works with: Adadelta + SGD/triangular2 + Adagrad/CyclicLR + Ftrl/triangular (breaks everything else)
0.1, # Adamax + Adam/Nadam/RMSprop with CyclicLR || Adagrad + triangular/plateau2
0.01, # Adamax + Adam/Nadam/RMSprop with CyclicLR/plateau2/constant/linear_decay
# 0.001, # ALL + constant
]),
"min_lr": hp.Discrete([
0.001, # 1e-03 (0.001) - fastest, least overfitting and most accidental high-scores with enough random attempts
0.0001,
0.00001, # 1e-05 (0.00001) - preferred by SGD
0.000001,
]),
Shortlist of Optimised Schedulers (with simple SequentialCNN)
"optimized_scheduler": {
"Adagrad_triangular": { "learning_rate": 0.1, "optimizer": "Adagrad", "scheduler": "CyclicLR_triangular" },
"Adagrad_plateau": { "learning_rate": 0.1, "optimizer": "Adagrad", "scheduler": "plateau2" },
"Adam_triangular2": { "learning_rate": 0.01, "optimizer": "Adam", "scheduler": "CyclicLR_triangular2" },
"Nadam_plateau": { "learning_rate": 0.01, "optimizer": "Nadam", "scheduler": "plateau_sqrt" },
"Adadelta_plateau": { "learning_rate": 1.0, "optimizer": "Adadelta", "scheduler": "plateau10" },
"SGD_triangular2": { "learning_rate": 1.0, "optimizer": "SGD", "scheduler": "CyclicLR_triangular2" },
"RMSprop_constant": { "learning_rate": 0.001, "optimizer": "RMSprop", "scheduler": "constant" },
}
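For readers unfamiliar with the `CyclicLR_triangular` and `CyclicLR_triangular2` scheduler names, here is a minimal Python sketch of the standard triangular cyclical learning-rate policy. It is not taken from this repository's scheduler code. The function name `cyclic_lr`, its parameters, and the mapping of `min_lr` to `base_lr` and `learning_rate` to `max_lr` are assumptions made for illustration.

```python
import math


def cyclic_lr(iteration, base_lr, max_lr, step_size, mode="triangular"):
    """Learning rate at a given training iteration.

    The rate climbs linearly from base_lr to max_lr over step_size
    iterations, then falls back over the next step_size iterations.
    In "triangular2" mode the amplitude halves after every full cycle.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the peak of the current cycle, in the range 0..1.
    x = abs(iteration / step_size - 2 * cycle + 1)
    scale = 1.0 if mode == "triangular" else 1.0 / (2 ** (cycle - 1))
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x) * scale


# Assumed mapping: min_lr=0.001 as the floor, learning_rate=0.01 as the peak.
schedule = [
    cyclic_lr(i, base_lr=0.001, max_lr=0.01, step_size=100, mode="triangular2")
    for i in range(0, 401, 100)
]
print(schedule)
```

In this sketch the first peak reaches `max_lr`, while in `triangular2` mode the second peak reaches only halfway between `base_lr` and `max_lr`.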
This was intended as a cheat method: map the CSV data back into PNGs, then use the Google Vision API to perform OCR.
Doesn't seem to work!
Problems:
- Cost: $1.50 per 1000 requests * 28,000 test images = $42
- Google OCR doesn't seem to like white-on-black single char text images
- Inverting the images (black on white) doesn't improve Google OCR
- API Explorer:
https://cloud.google.com/vision/docs/quickstart?apix_params=%7B%22resource%22%3A%7B%22requests%22%3A%5B%7B%22features%22%3A%5B%7B%22type%22%3A%22DOCUMENT_TEXT_DETECTION%22%7D%5D%2C%22image%22%3A%7B%22source%22%3A%7B%22imageUri%22%3A%22gs%3A%2F%2Fkaggle-digit-recognizer%2Fdata-images%2Ftest%2F1.png%22%7D%7D%7D%5D%7D%7D
- features.type = DOCUMENT_TEXT_DETECTION
- image.source.imageUri = gs://kaggle-digit-recognizer/data-images/test/1.png
node ./preprocessing/csv2png.js
gsutil -m cp -r data/images/ gs://kaggle-digit-recognizer/
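The request body used in the API Explorer link is URL-encoded in its `apix_params` query parameter. As a reading aid, this Python sketch decodes that parameter with the standard library and recomputes the cost estimate from the figures listed under Problems. It makes no call to the Vision API, and the variable names are illustrative.

```python
import json
from urllib.parse import parse_qs, urlsplit

# API Explorer link copied from the list above, split for readability.
explorer_url = (
    "https://cloud.google.com/vision/docs/quickstart?apix_params="
    "%7B%22resource%22%3A%7B%22requests%22%3A%5B%7B%22features%22%3A%5B%7B"
    "%22type%22%3A%22DOCUMENT_TEXT_DETECTION%22%7D%5D%2C%22image%22%3A%7B"
    "%22source%22%3A%7B%22imageUri%22%3A%22gs%3A%2F%2Fkaggle-digit-recognizer"
    "%2Fdata-images%2Ftest%2F1.png%22%7D%7D%7D%5D%7D%7D"
)

# parse_qs percent-decodes the value; the result is a JSON document.
query = parse_qs(urlsplit(explorer_url).query)
body = json.loads(query["apix_params"][0])
request = body["resource"]["requests"][0]

feature_type = request["features"][0]["type"]
image_uri = request["image"]["source"]["imageUri"]
print(json.dumps(body, indent=2))

# Cost estimate: $1.50 per 1000 requests for 28,000 test images.
price_per_1000 = 1.50
test_images = 28_000
estimated_cost = price_per_1000 * test_images / 1000
print(f"Estimated cost: ${estimated_cost:.2f}")
```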