FULL_COMPILATION=0 does not build all necessary targets
mowoe opened this issue · 7 comments
Hi!
I am trying to build an ARM wheel, which is a lot more challenging than I originally thought.
I am currently building the wheel as part of a Docker image build (see below for the Dockerfile).
The Dockerfile builds the wheel without errors, but the resulting package seems to be missing some modules:
$ python3 -c "import tensorflow_decision_forests as tfdf; print('Found TF-DF v' + tfdf.__version__)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/tensorflow_decision_forests/__init__.py", line 64, in <module>
    from tensorflow_decision_forests import keras
  File "/usr/local/lib/python3.10/site-packages/tensorflow_decision_forests/keras/__init__.py", line 53, in <module>
    from tensorflow_decision_forests.keras import core
  File "/usr/local/lib/python3.10/site-packages/tensorflow_decision_forests/keras/core.py", line 59, in <module>
    from tensorflow_decision_forests.component.inspector import inspector as inspector_lib
  File "/usr/local/lib/python3.10/site-packages/tensorflow_decision_forests/component/inspector/inspector.py", line 64, in <module>
    from tensorflow_decision_forests.component import py_tree
  File "/usr/local/lib/python3.10/site-packages/tensorflow_decision_forests/component/py_tree/__init__.py", line 20, in <module>
    from tensorflow_decision_forests.component.py_tree import condition
  File "/usr/local/lib/python3.10/site-packages/tensorflow_decision_forests/component/py_tree/condition.py", line 26, in <module>
    from tensorflow_decision_forests.component.py_tree import dataspec as dataspec_lib
  File "/usr/local/lib/python3.10/site-packages/tensorflow_decision_forests/component/py_tree/dataspec.py", line 24, in <module>
    from yggdrasil_decision_forests.dataset import data_spec_pb2
ModuleNotFoundError: No module named 'yggdrasil_decision_forests.dataset'
I suspect the build rules defined in test-bazel.sh#167 are not correct, but I am not experienced enough with Bazel to work out the correct ones.
FROM python:3.10-buster
RUN apt update
RUN apt install -y git
# Installing JAX
RUN git clone -b jaxlib-v0.4.10 https://github.com/google/jax
WORKDIR /jax
RUN pip install numpy wheel
RUN git clone https://github.com/openxla/xla.git /xla
WORKDIR /jax
RUN python build/build.py --bazel_options=--override_repository=xla=/xla
RUN pip install dist/*.whl
RUN pip install -e .
# Installing bazel
WORKDIR /
RUN wget https://github.com/bazelbuild/bazel/releases/download/6.2.0/bazel-6.2.0-linux-arm64
RUN chmod +x /bazel-6.2.0-linux-arm64
RUN ln -s /bazel-6.2.0-linux-arm64 /usr/bin/bazel
WORKDIR /
# Use fork while PR #176 is not merged yet
RUN git clone https://github.com/mowoe/decision-forests
WORKDIR /decision-forests
COPY ./test_bazel.patch /decision-forests/test_bazel.patch
COPY ./build_pip_package.patch /decision-forests/build_pip_package.patch
COPY ./patched_gcc /patched_gcc
RUN chmod +x /patched_gcc
RUN rm /usr/bin/gcc
RUN ln -s /patched_gcc /usr/bin/gcc
RUN git apply build_pip_package.patch
RUN git apply test_bazel.patch
ENV TF_VERSION=2.13.0-rc0
ENV PY_VERSION=3.10
ENV FULL_COMPILATION=0
ENV TF_NEED_CUDA=0
RUN ./tools/test_bazel.sh
RUN apt update && apt install -y patchelf
RUN ./tools/build_pip_package.sh python3.10
Due to multiple issues, some pretty ugly patching is necessary:
patched_gcc
(AVX is not available in the ARM Docker image):
#!/bin/bash
# Forward all arguments to gcc-8, dropping -mavx (not supported on ARM).
args=()
for arg in "$@"; do
  if [[ $arg != "-mavx" ]]; then
    args+=("$arg")
  fi
done
gcc-8 "${args[@]}"
test_bazel.patch
(Remove all lines mentioning CUDA from TensorFlow's .bazelrc):
diff --git a/tools/test_bazel.sh b/tools/test_bazel.sh
index 98af492..ae961bf 100755
--- a/tools/test_bazel.sh
+++ b/tools/test_bazel.sh
@@ -68,6 +68,15 @@ sed -i'.bak' -e "s/sha256 = \"${prev_shasum}\",//" WORKSPACE
# Get build configuration for chosen version.
TENSORFLOW_BAZELRC="tensorflow_bazelrc"
curl https://raw.githubusercontent.com/tensorflow/tensorflow/${commit_sha}/.bazelrc -o ${TENSORFLOW_BAZELRC}
+tempfile=$(mktemp)
+
+while read line; do
+ if [[ $line != *"cuda"* ]]; then
+ echo "$line" >> "$tempfile"
+ fi
+done < "$TENSORFLOW_BAZELRC"
+
+mv "$tempfile" "$TENSORFLOW_BAZELRC"
# Force a compiler
# export CC=gcc-8
build_pip_package.patch
(Use aarch64 platform tags instead of x86_64):
diff --git a/tools/build_pip_package.sh b/tools/build_pip_package.sh
index dbef740..7e92567 100755
--- a/tools/build_pip_package.sh
+++ b/tools/build_pip_package.sh
@@ -154,35 +154,34 @@ function test_package() {
if is_macos; then
PACKAGEPATH="dist/tensorflow_decision_forests-*-cp${PACKAGE}-cp${PACKAGE}*-*.whl"
else
- PACKAGEPATH="dist/tensorflow_decision_forests-*-cp${PACKAGE}-cp${PACKAGE}*.manylinux2014_x86_64.whl"
+ PACKAGEPATH="dist/tensorflow_decision_forests-*-cp${PACKAGE}-cp${PACKAGE}*manylinux_2_28_aarch64.whl"
fi
${PIP} install ${PACKAGEPATH}
-
${PIP} list
${PIP} show tensorflow_decision_forests -f
@@ -199,9 +198,9 @@ function e2e_native() {
PACKAGEPATH="dist/tensorflow_decision_forests-*-cp${PACKAGE}-cp${PACKAGE}*-*.whl"
else
check_auditwheel ${PYTHON}
- PACKAGEPATH="dist/tensorflow_decision_forests-*-cp${PACKAGE}-cp${PACKAGE}*-linux_x86_64.whl"
+ PACKAGEPATH="dist/tensorflow_decision_forests-*-cp${PACKAGE}-cp${PACKAGE}*-linux_aarch64.whl"
TF_DYNAMIC_FILENAME="libtensorflow_framework.so.2"
- ${PYTHON} -m auditwheel repair --plat manylinux2014_x86_64 -w dist --exclude ${TF_DYNAMIC_FILENAME} ${PACKAGEPATH}
+ ${PYTHON} -m auditwheel repair --plat manylinux_2_28_aarch64 -w dist --exclude ${TF_DYNAMIC_FILENAME} ${PACKAGEPATH}
fi
test_package ${PYTHON} ${PACKAGE}
Side note: Setting FULL_COMPILATION=1 causes the build to fail because of some unrelated TensorFlow issues, and it shouldn't be necessary to build the library. As far as I can see, it is the same issue described here. In any case, the error is in upstream TensorFlow.
Hi, just chiming in briefly. I wasn't fully able to debug this problem, but I'll share what I know.
I don't believe test-bazel.sh#167 is the issue - the targets are ok. There seems to be an issue with the package structure. Can you upload the wheel you produced somewhere so I can inspect it?
Hi @rstz,
thanks for your reply.
Here are the wheels I built (GitHub only supports zips):
tensorflow_decision_forests-1.3.0-cp310-cp310-linux_aarch64.whl.zip
tensorflow_decision_forests-1.3.0-cp310-cp310-manylinux_2_28_aarch64.whl.zip
Thanks! It looks like build_pip_package.sh is either not compiling or not properly copying over YDF. The relevant lines are L119-L127 of build_pip_package.sh. Could you please check whether the necessary files are there before copying (in particular the Python files like data_spec_pb2.py in bazel-bin/external/ydf/yggdrasil_decision_forests/dataset/)?
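A built wheel can be checked for the same files without installing it, since a wheel is a zip archive. The following is a minimal sketch, not part of the TF-DF tooling: the helper name `missing_from_wheel` and the `REQUIRED` list are hypothetical, and the archive path of `data_spec_pb2.py` is an assumption derived from the import in the traceback.

```python
import zipfile


def missing_from_wheel(wheel_path, required_files):
    """Return the entries of required_files that are absent from the wheel."""
    # A .whl file is a plain zip archive, so the standard library can list it.
    with zipfile.ZipFile(wheel_path) as wheel:
        present = set(wheel.namelist())
    return [name for name in required_files if name not in present]


# Assumption: the module from the traceback sits at this path inside the wheel.
REQUIRED = ["yggdrasil_decision_forests/dataset/data_spec_pb2.py"]
```

Calling `missing_from_wheel` with the path of a wheel in dist/ and `REQUIRED` returns an empty list when the file is packaged, and the missing names otherwise.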
Thank you so much for the hint, @rstz! The actual problem turned out to be that my minimal Debian image did not include rsync (as expected), and the build script did not fail but simply did not execute those commands. Now that I have added rsync, I was able to build the ARM wheel successfully:
tensorflow_decision_forests-1.3.0-cp310-cp310-manylinux_2_28_aarch64.whl.zip
Sorry for wasting your time!
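A silent failure like this can be caught earlier by checking for the required command-line tools before starting the build. This is only a sketch under assumptions: the helper names and the `REQUIRED_TOOLS` list are hypothetical (they cover tools mentioned in this thread), and the build scripts themselves do not include such a check as far as this thread shows.

```python
import shutil


def missing_tools(tools):
    """Return the names in tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed list: tools mentioned in this thread, not an official requirement list.
REQUIRED_TOOLS = ["rsync", "patchelf", "bazel"]


def check_tools(tools=REQUIRED_TOOLS):
    """Exit with an error message if any required tool is missing."""
    missing = missing_tools(tools)
    if missing:
        raise SystemExit("Missing build tools: " + ", ".join(missing))
```

Running `check_tools()` as the first build step would stop the build with a clear message instead of producing an incomplete wheel.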
yay! 🥳
If you're building this wheel for a specific project that you can share (either publicly or via email to me), feel free to do so; we're happy to know what people are working on with TF-DF.
@rstz Actually, I built the ARM wheel for a specific project, but it might be a bit underwhelming:
I use TF-DF for my numer.ai model. Currently I have to run an IPython notebook every other day, which is a bit tedious, but Numerai supports calling a webhook when submissions are due. Currently only AWS SageMaker FaaS stuff is documented, which I did not want to use for a number of reasons.
Instead I wanted to use Fission, which is an open-source k8s FaaS framework.
As I am doing all of this for fun and no profit, I didn't feel like spending any money on managed k8s like GKE or EKS. Oracle Cloud Infrastructure supports a three-node managed k8s cluster in its forever-free tier, which I have used for other projects. This has one major drawback though: the compute instances are ARM instances. This is why I needed an ARM wheel.
TL;DR: I automated a ~2min task by spending hours trying to build the ARM wheel for a made-up problem.
> I automated a ~2min task by spending hours trying to build the ARM wheel for a made-up problem
I love it!
Thank you for reporting back and sharing the wheel you built, and good luck in the competition!