Intel vs. M2 MacBook PyTorch Model Training Benchmarks

Apple advertises that the M2's GPU cores can be used for ML workloads. I wanted some basic validation of this claim, but struggled to find any real explorations of it.

PyTorch supports training on M1/M2 GPU cores through the MPS (Metal Performance Shaders) backend. I really wanted to kick the tires on this to see whether it's actually that much faster. The goal of this project is to run a few basic PyTorch model training + evaluation samples to compare performance between an early-2023 MacBook with an M2 Max processor and a late-2019 MacBook with an Intel i9 processor. Both machines were almost "maxed out" when purchased from Apple.
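
For context, picking the MPS device in PyTorch looks like the snippet below. This is the standard idiom rather than the repo's exact code, but the scripts do something equivalent:

    import torch

    # Prefer the Apple-silicon GPU (the MPS backend) when it's available;
    # otherwise fall back to the CPU, which is what the Intel machine gets.
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    x = torch.randn(4, 4, device=device)
    print(x.device)  # "mps:0" on the M2 Max, "cpu" on the Intel i9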

All code was provided by Bing AI Search because I didn't care enough to write any of this myself. The code it produced from the first prompt never worked on the first try, but after feeding the error messages back to it two or three times, it arrived at the code you see in the repo.

This is about on par with my skills, but Bing AI was much faster.

These examples suggest a 1.7x to 3.5x speedup on the M2 Max over the Intel i9 (2.4 GHz, 8-core).

Env Setup

Uses PyTorch 2!

  • Install Poetry (I used v1.4.0)
  • Install Python 3.11
  • Run poetry install

Examples

This is pretty crude: for timing, we're just using the time command. Each example potentially downloads a dataset on the first run, so I recommend running each example once to fetch the dataset, then a second time to get representative timing values.

Example 1

CNN with SGD on the MNIST dataset; a rough sketch of the script follows the timings.

  • time poetry run python model_training_test/train.py
  • M2: 124.96s
  • Intel: 208.54s
  • Speedup factor: 1.7x
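
For a sense of what's being timed, train.py is roughly the pattern below. This is a minimal sketch with made-up layer sizes and hyperparameters, not the repo's exact code:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    # Small CNN for 28x28 grayscale MNIST digits.
    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3)
            self.conv2 = nn.Conv2d(32, 64, 3)
            self.fc = nn.Linear(64 * 12 * 12, 10)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.max_pool2d(F.relu(self.conv2(x)), 2)
            return self.fc(torch.flatten(x, 1))

    train_data = datasets.MNIST("data", train=True, download=True,
                                transform=transforms.ToTensor())
    loader = DataLoader(train_data, batch_size=64, shuffle=True)

    model = Net().to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(3):
        for images, labels in loader:
            # This version explicitly moves every batch to the chosen device.
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()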

Example 2

A different CNN with SGD, but this one doesn't try to manage moving data between devices, and it uses the CIFAR10 dataset. A sketch of the pattern follows the timings.

  • time poetry run python model_training_test/train2.py
  • M2: 40.73s
  • Intel: 96.92s
  • Speedup factor: 2.4x
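
The notable difference from Example 1 is that nothing is moved between devices: no torch.device, no .to(...) calls. A minimal sketch of that pattern (again with assumed shapes and hyperparameters, not the repo's exact code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # No device management anywhere: the model and every batch stay on
    # whatever the default device is.
    train_data = datasets.CIFAR10("data", train=True, download=True,
                                  transform=transforms.ToTensor())
    loader = DataLoader(train_data, batch_size=64, shuffle=True)

    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(16 * 16 * 16, 10),
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for images, labels in loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()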

Example 3

Run distributed training for MNIST using Distributed Data Parallel (DDP) and two workers; a sketch of the DDP setup follows the timings. Note: the model itself has a negative loss value and is hot garbage.

  • time MASTER_ADDR=localhost MASTER_PORT=8083 poetry run python model_training_test/train3.py
  • M2: 28.13s
  • Intel: 98.73s
  • Speedup factor: 3.5x
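
For reference, a two-worker, single-machine DDP run generally has the shape below. This is a sketch of the standard PyTorch DDP setup rather than the repo's exact train3.py; init_process_group reads MASTER_ADDR and MASTER_PORT from the environment, which is why the command above sets them:

    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.nn.parallel import DistributedDataParallel as DDP

    def worker(rank, world_size):
        # Rendezvous over MASTER_ADDR / MASTER_PORT; "gloo" is the
        # CPU-friendly backend.
        dist.init_process_group("gloo", rank=rank, world_size=world_size)

        model = DDP(nn.Linear(784, 10))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

        # Dummy batch for illustration; the real script trains on MNIST.
        images = torch.randn(64, 784)
        labels = torch.randint(0, 10, (64,))

        optimizer.zero_grad()
        loss = F.cross_entropy(model(images), labels)
        loss.backward()  # gradients are all-reduced across the workers
        optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        world_size = 2
        mp.spawn(worker, args=(world_size,), nprocs=world_size)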

Example 4

Another DDP example, this one using four workers and an even simpler sequential NN on MNIST (sketched below).

  • time MASTER_ADDR=localhost MASTER_PORT=8083 poetry run python model_training_test/train4.py
  • M2: 30.47s
  • Intel: 108.34s
  • Speedup factor: 3.5x
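
An "even simpler sequential NN" presumably means something like a plain nn.Sequential MLP. This is a guess at the shape, not the repo's exact layers:

    import torch.nn as nn

    # Bare-bones sequential classifier for flattened 28x28 MNIST images.
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),
    )

Going from two workers to four is just world_size = 4 (and therefore nprocs=4) in the mp.spawn call from the Example 3 sketch.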