lakshayg/tensorflow-build-archived

This is Awesome

leedrake5 opened this issue · 11 comments

This is fantastic! I spent two days trying to compile TensorFlow on my own to no avail - these precompiled files were a lifesaver!

I wanted to ask about a couple of things (for Mac):

Any plans to add AVX512F?
Can MKL support be added too (for all builds)?

Thanks,

- Lee

Hi Lee

I'm glad you found these binaries helpful. As far as plans for AVX512F go, I'm not sure that all the major CPUs support those instructions, so I don't know if making them a default would be a good idea. The same goes for MKL.
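If it helps, here is how to check what a given Mac's CPU actually reports (Intel Macs; note that AVX2 and the AVX-512 family show up under the leaf7 key, not the baseline one):

# Baseline feature flags (SSE4.1, SSE4.2, AVX1.0, FMA, ...)
sysctl machdep.cpu.features

# AVX2 and AVX-512 flags are reported separately
sysctl machdep.cpu.leaf7_features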

Let me know what problem you are facing with compiling it locally and we can try to figure it out. If that doesn't work, I can give AVX512F and MKL a shot on my machine.

I appreciate the response! I am working on a Mac with an 18-core Xeon W processor, so I'm hopeful it would support AVX512F and MKL. But compiling from source has been extremely difficult. The build would get through all the downloads, but then fail to create the tmp folder to put the .whl binary in. I tried multiple guides, but this one comes closest to the approach I implemented (I modified the bash script to include MKL, but it failed the same way with and without it).
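In hindsight, creating the output directory by hand before the packaging step might have sidestepped that failure; this is just a guess on my part, using the output path from the guide:

mkdir -p /tmp/tensorflow_pkg
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg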

Error messages were very difficult to interpret, but included some such as:

ERROR: Skipping '//tensorflow/tools/pip_packagec:build_pip_package': no such package 'tensorflow/tools/pip_packagec': BUILD file not found on package path

and

tensorflow ld: symbol(s) not found for architecture x86_64

I tried multiple fixes from message boards and forums, but the first thing to work with some CPU optimization was your tensorflow-build page.
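Looking at the first error again, the target I passed was misspelled ('pip_packagec' instead of 'pip_package'), so at least that one was my own typo; the correct target is:

bazel build -c opt //tensorflow/tools/pip_package:build_pip_package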

Thanks again for putting together such a fantastic set of binaries.

Okay, I'll give it a shot locally. Which Python are you using, 2 or 3?

I use python 3, but can switch if helpful.

Cool, I'll post the binaries if I'm able to build them.

Have you tried conda install tensorflow-mkl -c defaults?

Did you give this a try? I have been trying to build with MKL but haven't had any success :(
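If you do give it a try, one way to confirm whether MKL actually made it into a wheel is this TF 1.x flag; it's an internal API rather than a public one, so no guarantees it survives across versions:

python -c "import tensorflow as tf; print(tf.pywrap_tensorflow.IsMklEnabled())"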

I was able to compile it! I can send it over if you'd like a copy. The parameters that worked are as follows:

  • bazel 0.18.0
  • TensorFlow r1.12
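For anyone reproducing this, roughly how I pinned those versions (the checkout tag follows TensorFlow's release-branch naming):

# Grab the TensorFlow source and switch to the r1.12 release branch
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout r1.12

# Confirm which Bazel the build will use
bazel version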

Next, I made this .sh script, which I moved to the TensorFlow directory:

# Check whether script is executing in a VirtualEnv or Conda environment
if [ -z "$VIRTUAL_ENV" ] && [ -z "$CONDA_PREFIX" ] ; then
	echo "VirtualEnv or Conda env is not activated"
	exit 1
fi

# Set the virtual environment path
if ! [ -z "$VIRTUAL_ENV" ] ; then
  VENV_PATH=$VIRTUAL_ENV
elif ! [ -z "$CONDA_PREFIX" ] ; then
  VENV_PATH=$CONDA_PREFIX
fi

# Set the bin and lib directories
VENV_BIN=$VENV_PATH/bin
VENV_LIB=$VENV_PATH/lib

# bazel tf needs these env vars
export PYTHON_BIN_PATH=$VENV_BIN/python
export PYTHON_LIB_PATH=`ls -d $VENV_LIB/*/ | grep python`

# Set the native architecture optimization flag, which is a default
COPT="--copt=-march=native"

# Determine the available features of your CPU; on macOS, AVX2 and the
# AVX-512 flags live under machdep.cpu.leaf7_features, not machdep.cpu.features
raw_cpu_flags=`sysctl -a | grep -E "machdep.cpu.(leaf7_)?features" | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]'`

# Append each of your CPU's features to the list of optimization flags
for cpu_feature in $raw_cpu_flags
do
	case "$cpu_feature" in
		"sse4.1" | "sse4.2" | "ssse3" | "fma" | "cx16" | "popcnt" | "maes" | "mavx512f" | "mavx" | "mavx2" | "mfma" | "mavx512pf" | "mavx512cd" | "mavx512er")
		    COPT+=" --copt=-m$cpu_feature"
		;;
		"avx1.0")
		    COPT+=" --copt=-mavx"
		;;
		*)
			# noop
		;;
	esac
done

# First ensure a clear working directory in case you've run bazel previously
bazel clean --expunge

# Run TensorFlow configuration (accept defaults unless you have a need)
./configure

# Build the TensorFlow pip package
sudo bazel build --config=mkl -c opt $COPT -k --verbose_failures //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

I then run it with two lines:

chmod +x build_tf.sh
./build_tf.sh
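Once the wheel lands in /tmp/tensorflow_pkg, installing it into the active environment is just (the exact filename varies by version and platform, hence the glob):

pip install /tmp/tensorflow_pkg/tensorflow-*.whl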

That said, the .whl I got from this page and the conda install tensorflow-mkl -c defaults build still lead to faster models. I need to tweak the parameters, but I'm close.

Once I've got a shell script that works, I can send over a .whl for folks with Xeon processors.

That would be great. Thanks!

Download here

Though in a head-to-head comparison, your macOS Mojave Python 3.6.0 build (without AVX512F) is still twice as fast as this file on the same Xeon processor: word embedding takes 3 minutes per epoch with the build from your page, but 6 minutes with the version I compiled, which should have more CPU features enabled. I am not sure why the build on your page is so much better despite missing a key optimization. Even more confusing, the 3 minutes per epoch with your build comes at CPU utilization < 50%, yet the 6 minutes for the same epoch on this build uses > 95%. So the one on your page is far better on all counts.
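One rough way I know to compare what two wheels were compiled with: in TF 1.x, creating a session prints a warning listing any CPU instructions the binary was not built to use, so this one-liner surfaces that line for whichever wheel is installed:

python -c "import tensorflow as tf; tf.Session()" 2>&1 | grep -i "instructions"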

I've been doing some final tests to evaluate the effect these different builds have on the MNIST dataset. Here I am using R to evaluate, but it's really just a front end.

### Use CPU
library("keras"); library("sessioninfo")
use_python("/Users/lee/anaconda3/bin/python")

batch_size <- 128
num_classes <- 10
epochs <- 5

img_rows <- 28
img_cols <- 28

mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y

x_train <- array_reshape(x_train, c(nrow(x_train), img_rows, img_cols, 1))
x_test <- array_reshape(x_test, c(nrow(x_test), img_rows, img_cols, 1))
input_shape <- c(img_rows, img_cols, 1)

x_train <- x_train / 255
x_test <- x_test / 255

y_train <- to_categorical(y_train, num_classes)
y_test <- to_categorical(y_test, num_classes)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
                input_shape = input_shape) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = num_classes, activation = 'softmax')

summary(model)

model %>% compile(
  loss = loss_categorical_crossentropy,
  optimizer = optimizer_adadelta(),
  metrics = c('accuracy')
)

system.time({
  model %>% fit(
    x_train, y_train,
    batch_size = batch_size,
    epochs = epochs,
    validation_split = 0.2
  )
})

Default Conda tensorflow install
user system elapsed
1584.637 5461.211 436.779

tensorflow-build (macOS Mojave, Python 3.6.0) [91% of default time]
user system elapsed
1276.076 5051.262 397.757

My AVX512F + MKL tensorflow-build [77% of default time]
user system elapsed
1537.536 8788.473 335.959

And finally, using PlaidML with a Radeon Vega 64 card via Metal on the same machine. To do this, install PlaidML (install commands are sketched below, after the snippets) and use this code snippet in R:

library("keras"); library("sessioninfo")
use_python("/Users/lee/anaconda3/bin/python")
use_backend(backend = "plaidml")

If you prefer python, use this code snippet instead:

import os
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
import keras
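The install itself is roughly the following; plaidml-setup is PlaidML's interactive device picker, where you select the Vega 64 / Metal backend:

pip install plaidml-keras
plaidml-setup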

iMac Pro, Metal + PlaidML [10% of default time]
user system elapsed
11.466 12.410 43.466

People aren't kidding when they say graphics cards are the future of deep learning. But CPU optimization certainly helps. I've learned, though, that PlaidML cannot be relied on for natural language processing, as the model loss NaNs out after a couple of epochs. For whatever reason, CPU builds are more likely to succeed despite their lower speed (thus far).