Bazel Build Issues
rathjo14 opened this issue · 25 comments
Following the AstroNet readme as closely as possible, I have been running into some major problems in the Bazel build phase.
Bazel Version: 0.24.1
TensorFlow Version: 1.14.0
When running: bazel test astronet/... astrowavenet/... light_curve/... tf_util/... third_party/...
ERROR: /private/var/tmp/_bazel_rathjo14/d5d70ed4975039d87f5635d66a43ed87/external/com_google_protobuf/protobuf_deps.bzl:18:9: no such package '': BUILD file not found in any of the following directories.
- /Users/rathjo14/exoplanet-ml/exoplanet-ml and referenced by '//external:six'
ERROR: Analysis of target '//light_curve:light_curve_py_pb2' failed; build aborted: Analysis failed
INFO: Elapsed time: 8.122s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (23 packages loaded, 158 targets configured)
Fetching @local_config_cc_toolchains; fetching
Looking into the file mentioned in the error here is what I see (lines 17:23):
    if not native.existing_rule("six"):
        http_archive(
            name = "six",
            build_file = "@//:six.BUILD",
            sha256 = "105f8d68616f8248e24bf0e9372ef04d3cc10104f1980f54d57b2ce73a5ad56a",
            urls = ["https://pypi.python.org/packages/source/s/six/six-1.10.0.tar.gz#md5=34eed507548117b2ab523ab14b2f8b55"],
        )
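The label @//:six.BUILD points at the root package of the main workspace, which matches the "no such package ''" message: Bazel appears to be looking for a BUILD file (it may be empty) next to WORKSPACE, plus six.BUILD itself. A minimal sketch for checking that, where the helper name missing_root_files is hypothetical and the workspace root is assumed to be the directory passed in:

```python
import os


def missing_root_files(workspace_dir, build_file_name="six.BUILD"):
    """List the files needed at the workspace root to resolve @//:<build_file_name>.

    Hypothetical helper for illustration; it is not part of exoplanet-ml or Bazel.
    """
    missing = []
    # The root package only exists if a BUILD or BUILD.bazel file is present.
    has_root_package = any(
        os.path.isfile(os.path.join(workspace_dir, name))
        for name in ("BUILD", "BUILD.bazel")
    )
    if not has_root_package:
        missing.append("BUILD")
    # The file named in build_file = "@//:six.BUILD" must also sit at the root.
    if not os.path.isfile(os.path.join(workspace_dir, build_file_name)):
        missing.append(build_file_name)
    return missing
```

Calling missing_root_files("/Users/rathjo14/exoplanet-ml/exoplanet-ml") would list whichever of the two files is absent; an empty list means this particular lookup should resolve.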
Hello,
Did you find a solution to the issue?
I am facing the same problem. Any solution?
OK. I am not familiar with Bazel syntax at all, but after a long hassle and a lot of searching and reading, the following solved the problem.
Modify the last part of the BUILD file in the light_curve directory:
load("@com_google_protobuf//:protobuf.bzl", "py_proto_library")

py_proto_library(
    name = "light_curve_py_pb2",
    srcs_version = "PY2AND3",
    srcs = glob(["proto/*.proto"]),
    deps = [
        "@com_google_protobuf//:protobuf_python",
    ],
)
Also in the WORKSPACE file, I updated the ProtoBuf library at the end of the file
http_archive(
    name = "com_google_protobuf",
    sha256 = "60d2012e3922e429294d3a4ac31f336016514a91e5a63fd33f35743ccfe1bd7d",
    strip_prefix = "protobuf-3.11.0",
    urls = ["https://github.com/protocolbuffers/protobuf/archive/v3.11.0.zip"],
)

load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()
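The sha256 attribute in http_archive has to match the downloaded archive, so if you swap in a different protobuf release the hash must change with it. A small sketch for computing the hash of an archive you have already downloaded; the function name sha256_of is just an illustrative choice:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned string with the sha256 value in WORKSPACE before running the build.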
@jalalirs The above solution worked for py_proto_library,
but now it gives an error for proto_library,
saying: no such attribute 'cc_api_version' in 'proto_library' rule
Has anyone else faced this?
Just remove cc_api_version
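If several BUILD files carry the attribute, the removal can be scripted. A rough sketch, assuming the attribute always sits on its own line as cc_api_version = <number>, (the function name strip_cc_api_version is made up for this example):

```python
import re

# Matches a whole line such as "    cc_api_version = 2," including its newline.
_CC_API_VERSION_LINE = re.compile(
    r"^[ \t]*cc_api_version\s*=\s*\d+\s*,?[ \t]*\r?\n", re.MULTILINE
)


def strip_cc_api_version(build_text):
    """Return the BUILD file text with any cc_api_version lines removed."""
    return _CC_API_VERSION_LINE.sub("", build_text)
```

Read a BUILD file, pass its contents through strip_cc_api_version, and write the result back; review the diff afterwards, since this is plain text matching and not a real Starlark parser.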
@jalalirs I did. Then it gave numerous other errors:
//astronet/astro_cnn_model:astro_cnn_model_test FAILED in 6.2s
/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/astro_cnn_model/astro_cnn_model_test/test.log
//astronet/astro_fc_model:astro_fc_model_test FAILED in 6.1s
/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/astro_fc_model/astro_fc_model_test/test.log
//astronet/astro_model:astro_model_test FAILED in 6.1s
/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/astro_model/astro_model_test/test.log
//astronet/ops:dataset_ops_test FAILED in 6.2s
/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/ops/dataset_ops_test/test.log
//astronet/ops:input_ops_test FAILED in 2.9s
/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/ops/input_ops_test/test.log
//astronet/ops:metrics_test FAILED in 6.1s
/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/ops/metrics_test/test.log
//astrowavenet:astrowavenet_model_test FAILED in 6.1s
/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astrowavenet/astrowavenet_model_test/test.log
//astrowavenet/data:base_test FAILED in 6.2s
/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astrowavenet/data/base_test/test.log
//light_curve:kepler_io_test FAILED in 6.2s
/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/light_curve/kepler_io_test/test.log
Executed 9 out of 23 tests: 14 tests pass and 9 fail locally.
There were tests whose specified size is too big. Use the --test_verbose_timeout
INFO: Build completed, 9 tests FAILED, 10 total actions
@jalalirs Above solution worked for py_proto_library
but now this gives error for proto_library
saying no such attribute 'cc_api_version' in 'proto_library' rule
Did anyone faced this?
I am facing the same problem. Which versions of the packages are you using?
I will fork the project tonight and commit my changes. I don't remember all the modifications I made, but let's see if my version works for you.
Wait for my reply
@zoe4cs Any luck here?
I guess the versions of Bazel and TensorFlow are causing the problem, but I haven't found a solution.
So here is what I did to make it run.
First, I ran it in a TensorFlow image from Docker Hub, using the tag 2.0.1-gpu-py3-jupyter:
https://hub.docker.com/r/tensorflow/tensorflow
In the container, I installed Bazel, cloned this repository, and made the following modifications.
Modify the last part of the BUILD file in the light_curve directory:
load("@com_google_protobuf//:protobuf.bzl", "py_proto_library")

py_proto_library(
    name = "light_curve_py_pb2",
    srcs_version = "PY2AND3",
    srcs = glob(["proto/*.proto"]),
    deps = [
        "@com_google_protobuf//:protobuf_python",
    ],
)
Also in the WORKSPACE file, I updated the ProtoBuf library at the end of the file
http_archive(
    name = "com_google_protobuf",
    sha256 = "60d2012e3922e429294d3a4ac31f336016514a91e5a63fd33f35743ccfe1bd7d",
    strip_prefix = "protobuf-3.11.0",
    urls = ["https://github.com/protocolbuffers/protobuf/archive/v3.11.0.zip"],
)

load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()
I ran the test with the following command
bazel test astronet/... astrowavenet/... light_curve/... tf_util/... third_party/... --test_arg=--test_srcdir=/home/exoplanet-ml/exoplanet-ml/
https://pbs.twimg.com/media/EOGoWSOXUAUy0Yj?format=jpg&name=large
@jalalirs They were all version issues with tensorflow and tensorflow_probability.
Working versions:
tensorboard 1.13.1
tensorflow 1.13.2
tensorflow-estimator 1.13.0
tensorflow-probability 0.6.0
Two test cases are still failing, as shown below. I don't know why. From the logs I can see:
======================================================================
ERROR: testBadLabelIdsRaisesValueError (__main__.BuildDatasetTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/sandbox/darwin-sandbox/91/execroot/__main__/bazel-out/darwin-fastbuild/bin/astronet/ops/dataset_ops_test.runfiles/__main__/astronet/ops/dataset_ops_test.py", line 231, in setUp
    self._file_pattern = os.path.join(FLAGS.test_srcdir, _TEST_TFRECORD_FILE)
  File "/Users/ritsharm/git/google-research/lib/python3.7/site-packages/absl/flags/_flagvalues.py", line 473, in __getattr__
    raise AttributeError(name)
AttributeError: test_srcdir
You need to pass the data source by adding the following parameter to the run command
--test_arg=--test_srcdir=
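For anyone scripting this, here is a small sketch that assembles the same bazel test invocation with the extra argument. The target list is copied from the command used in this thread; the function name bazel_test_command is only illustrative, and whether the tests accept the flag still depends on the installed package versions.

```python
import shlex

# Target patterns taken from the bazel test command used in this thread.
TARGETS = [
    "astronet/...",
    "astrowavenet/...",
    "light_curve/...",
    "tf_util/...",
    "third_party/...",
]


def bazel_test_command(test_srcdir, targets=TARGETS):
    """Build the argv list for `bazel test`, forwarding --test_srcdir to each test."""
    return ["bazel", "test", *targets, "--test_arg=--test_srcdir=" + test_srcdir]


# Print a copy-pasteable command line for the path used inside the container.
print(shlex.join(bazel_test_command("/home/exoplanet-ml/exoplanet-ml/")))
```

The list form can be passed straight to subprocess.run if you prefer to launch Bazel from Python.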
@jalalirs Thanks a lot for that, but even after using
bazel test astronet/... astrowavenet/... light_curve/... tf_util/... third_party/... --test_arg=--test_srcdir=/Users/ritsharm/git/exoplanet-ml/exoplanet-ml/
it gives this error:
usage: astro_cnn_model_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b]
[-k TESTNAMEPATTERNS]
[tests [tests ...]]
astro_cnn_model_test.py: error: unrecognized arguments: --test_srcdir=/Users/ritsharm/git/exoplanet-ml/exoplanet-ml
You probably need TensorFlow 2.
@jalalirs TensorFlow 2.0 is not supported, because this project's code uses tf.contrib:
tf.contrib.data.parallel_interleave(
AttributeError: module 'tensorflow' has no attribute 'contrib'
tf.contrib was removed in TF 2.
Can you please check which version of TensorFlow you are using?
You are actually right, I am using 1.15
import tensorflow as tf
tf.__version__
'1.15.0'
I got it working. It was all version issues.
tensorboard 1.15.0
tensorflow 1.15.0
tensorflow-estimator 1.15.1
tensorflow-probability 0.8.0
The above versions pass all tests.
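To compare your own environment against the versions reported here, something like the following may help. It is only a sketch: the pinned numbers are the ones listed in this comment, and the helper names installed_versions and version_mismatches are invented for the example.

```python
from importlib import metadata

# Versions reported in this thread as passing all tests.
REPORTED_WORKING = {
    "tensorboard": "1.15.0",
    "tensorflow": "1.15.0",
    "tensorflow-estimator": "1.15.1",
    "tensorflow-probability": "0.8.0",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(installed, expected=REPORTED_WORKING):
    """Return {name: (installed, expected)} for every package that differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }
```

Running version_mismatches(installed_versions(REPORTED_WORKING)) in the environment used for the Bazel tests returns an empty dict when everything matches the list above.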
@jalalirs Did the steps work for you all the way to the end, as mentioned in this?
For me, the prediction step, which is the last step, throws lots of exceptions:
# Generate a prediction for a new TCE.
bazel-bin/astronet/predict \
--model=AstroCNNModel \
--config_name=local_global \
--model_dir=${MODEL_DIR} \
--kepler_data_dir=${KEPLER_DATA_DIR} \
--kepler_id=11442793 \
--period=14.44912 \
--t0=2.2 \
--duration=0.11267 \
--output_image_file="${HOME}/astronet/kepler-90i.png"
Is there any code change needed?
@ritwik12 No, I just ran the test command. After that I started using some of the modules directly. I am working on it intermittently, so I haven't done any training yet.
I am an amateur in astronomy and am just starting to get my hands dirty with its data. For this specific project, though, I am planning to skip Bazel entirely and run the code using direct Python calls.
Leaving a modified version here for people who happen to stumble upon this thread. At the top of the readme I've linked the Docker image that I used to get it working with my AMD Vega 56 and ROCm. Make sure to also follow the ROCm Docker install guide. If you have issues installing rocm-dkms, switch to an older kernel version. I was running 5.8 (on Ubuntu 20 LTS, which is the recommended distro), and installing 5.6 fixed the issue.