Intel vs M2 MacBook PyTorch Model Training Benchmarks
Apple advertises that the M2's GPU cores can be used for ML. I wanted some basic validation of this claim, but struggled to find any real explorations of it.
PyTorch supports training on M1 / M2 GPU cores through MPS (Metal Performance Shaders). I really wanted to kick the tires on this to see whether it's actually that much faster. The goal of this project is to run a few basic PyTorch model training + evaluation samples to compare performance between an early-2023 MacBook with an M2 Max processor and a late-2019 MacBook with an Intel i9 processor. Both machines were almost "maxed out" when purchased from Apple.
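For readers who have not used MPS before, here is a minimal sketch of how a script might pick the Apple GPU when it is available. The helper names (choose_device_name, default_device) are made up for illustration and are not taken from the repo; the only PyTorch calls assumed are torch.backends.mps.is_available(), torch.cuda.is_available() and torch.device().

```python
def choose_device_name(mps_available: bool, cuda_available: bool = False) -> str:
    """Prefer the Apple GPU (MPS), then CUDA, then fall back to the CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def default_device():
    """Return a torch.device for the best backend found on this machine."""
    # Imported lazily so choose_device_name stays usable without PyTorch.
    import torch

    name = choose_device_name(
        torch.backends.mps.is_available(),
        torch.cuda.is_available(),
    )
    return torch.device(name)
```

A training script would then typically call device = default_device() once and move both the model and each batch over with .to(device). On the Intel machine this sketch falls through to "cpu".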
All code was provided by Bing AI Search because I didn't care enough to write any of this myself. The code it produced never worked on the first try, but after feeding error messages back to it 2-3 times, it got to the code you see in the repo.
This is about on par with my skills, but Bing AI was much faster.
These examples suggest a 1.7x to 3.5x speedup on the M2 Max over the Intel i9 (2.4 GHz, 8-core).
Env Setup
Uses PyTorch 2!
- Install Poetry (I used v1.4.0)
- Install Python 3.11
- Run
poetry install
Examples
This is pretty crude, but for timing, we're just using the shell's "time" command, so the numbers include Python startup and data loading. Each example may download a dataset on its first run. I recommend running the example once to fetch the dataset, and then running it a second time to get the approximate timing values.
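If you want something slightly less crude than "time", one option is to time only the training call from inside Python. The sketch below is not part of the repo: the timed helper and the fake_training_run stand-in are hypothetical names, and it uses only the standard library's time.perf_counter.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def fake_training_run():
    # Stand-in workload; swap in the real training function here.
    return sum(i * i for i in range(100_000))


results = {}
with timed("train", results):
    fake_training_run()

print(f"train: {results['train']:.2f}s")
```

Wrapping only the training call keeps interpreter startup and any dataset download out of the measurement, which the shell-level timing cannot do.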
Example 1
CNN with SGD. Uses the MNIST dataset.
time poetry run python model_training_test/train.py
- M2: 124.96s
- Intel: 208.54s
- Speedup factor: 1.7x
Example 2
A different CNN with SGD, but this one doesn't try to manage moving data between devices and uses the CIFAR10 dataset.
time poetry run python model_training_test/train2.py
- M2: 40.73s
- Intel: 96.92s
- Speedup factor: 2.4x
Example 3
Run distributed training for MNIST using "Distributed Data Parallel" (DDP) and two workers. Note: The model itself has a negative loss value and is hot garbage.
time MASTER_ADDR=localhost MASTER_PORT=8083 poetry run python model_training_test/train3.py
- M2: 28.13s
- Intel: 98.73s
- Speedup factor: 3.5x
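The MASTER_ADDR and MASTER_PORT variables in the command above tell the workers where to find each other. As a rough sketch of how a DDP worker might consume them (this is not the code in train3.py; the function names, the gloo backend, the CPU-only tensors and the tiny linear model are all assumptions made for illustration):

```python
import os


def rendezvous_settings(env=os.environ):
    """Read the address and port that the workers use to find each other."""
    return env.get("MASTER_ADDR", "localhost"), int(env.get("MASTER_PORT", "8083"))


def run_worker(rank: int, world_size: int) -> None:
    """One DDP worker: join the process group and take a single training step."""
    # Imported lazily so rendezvous_settings stays usable without PyTorch.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    addr, port = rendezvous_settings()
    dist.init_process_group(
        backend="gloo",  # assumption: CPU-friendly backend
        init_method=f"tcp://{addr}:{port}",
        rank=rank,
        world_size=world_size,
    )

    model = DDP(torch.nn.Linear(4, 2))  # placeholder model, not the repo's
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()  # DDP averages gradients across workers during backward
    optimizer.step()

    dist.destroy_process_group()
```

Two workers could be launched from a script's main block with torch.multiprocessing.spawn(run_worker, args=(2,), nprocs=2), which passes each process its rank as the first argument.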
Example 4
Another DDP example that uses four workers and an even simpler sequential NN on MNIST.
time MASTER_ADDR=localhost MASTER_PORT=8083 poetry run python model_training_test/train4.py
- M2: 30.47s
- Intel: 108.34s
- Speedup factor: 3.5x
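Each speedup factor in this README is just the Intel wall-clock time divided by the M2 time. The short sketch below recomputes the ratios from the timings listed in the four examples; the speedup helper and the timings dict are illustrative names, not code from the repo.

```python
# Wall-clock seconds copied from the four examples: (M2, Intel).
timings = {
    "Example 1": (124.96, 208.54),
    "Example 2": (40.73, 96.92),
    "Example 3": (28.13, 98.73),
    "Example 4": (30.47, 108.34),
}


def speedup(m2_seconds: float, intel_seconds: float) -> float:
    """How many times faster the M2 run was than the Intel run."""
    return intel_seconds / m2_seconds


for name, (m2, intel) in timings.items():
    print(f"{name}: {speedup(m2, intel):.2f}x")
```

These are single runs measured with "time", so treat the ratios as ballpark figures rather than precise benchmarks.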