google-research/exoplanet-ml

Bazel Build Issues

rathjo14 opened this issue · 25 comments

Following the AstroNet readme as closely as possible, I have run into some major problems in the Bazel build phase.

Bazel Version: 0.24.1
TensorFlow Version: 1.14.0
When running: bazel test astronet/... astrowavenet/... light_curve/... tf_util/... third_party/...

ERROR: /private/var/tmp/_bazel_rathjo14/d5d70ed4975039d87f5635d66a43ed87/external/com_google_protobuf/protobuf_deps.bzl:18:9: no such package '': BUILD file not found in any of the following directories.
 - /Users/rathjo14/exoplanet-ml/exoplanet-ml and referenced by '//external:six'
ERROR: Analysis of target '//light_curve:light_curve_py_pb2' failed; build aborted: Analysis failed
INFO: Elapsed time: 8.122s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (23 packages loaded, 158 targets configured)
FAILED: Build did NOT complete successfully (23 packages loaded, 158 targets configured)
Fetching @local_config_cc_toolchains; fetching

Looking into the file mentioned in the error, here is what I see (lines 17-23):

if not native.existing_rule("six"):
    http_archive(
        name = "six",
        build_file = "@//:six.BUILD",
        sha256 = "105f8d68616f8248e24bf0e9372ef04d3cc10104f1980f54d57b2ce73a5ad56a",
        urls = ["https://pypi.python.org/packages/source/s/six/six-1.10.0.tar.gz#md5=34eed507548117b2ab523ab14b2f8b55"],
    )
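
A note for readers landing here: in Bazel label syntax, `@//` refers to the main workspace, so `build_file = "@//:six.BUILD"` expects a `six.BUILD` file in the root package of the repository being built, and a root package only exists if the workspace root contains a `BUILD` (or `BUILD.bazel`) file. That would be consistent with the `no such package ''` message above. The following is a minimal Python sketch for checking this, assuming the workspace root path is passed in; the helper name `missing_root_files` is hypothetical and is not part of exoplanet-ml or Bazel.

```python
import os


def missing_root_files(workspace_root):
    """Return the root-level files the "@//:six.BUILD" label needs but that are absent."""
    missing = []
    # The root package '' exists only if BUILD or BUILD.bazel sits in the workspace root.
    has_build_file = any(
        os.path.isfile(os.path.join(workspace_root, name))
        for name in ("BUILD", "BUILD.bazel")
    )
    if not has_build_file:
        missing.append("BUILD")
    # The label itself points at six.BUILD inside that root package.
    if not os.path.isfile(os.path.join(workspace_root, "six.BUILD")):
        missing.append("six.BUILD")
    return missing
```

Calling `missing_root_files("/Users/rathjo14/exoplanet-ml/exoplanet-ml")` on the checkout from the error message would list whichever of the two files is absent; an empty list would rule out this particular cause.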

Hello,
Did you find a solution to the issue?

I am facing the same problem. Any solution?

OK. I am not familiar with Bazel syntax at all, but after a long struggle and a lot of searching and reading, the following solved the problem.

Modify the last part of the BUILD file in the light_curve directory:

load("@com_google_protobuf//:protobuf.bzl", "py_proto_library")

py_proto_library(
    name = "light_curve_py_pb2",
    srcs_version = "PY2AND3",
    srcs = glob(["proto/*.proto"]),
    deps = [
        "@com_google_protobuf//:protobuf_python",
    ],
)

Also in the WORKSPACE file, I updated the ProtoBuf library at the end of the file

http_archive(
    name = "com_google_protobuf",
    sha256 = "60d2012e3922e429294d3a4ac31f336016514a91e5a63fd33f35743ccfe1bd7d",
    strip_prefix = "protobuf-3.11.0",
    urls = ["https://github.com/protocolbuffers/protobuf/archive/v3.11.0.zip"],
)

load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()
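
Bazel's `http_archive` compares the downloaded archive against the `sha256` attribute and fails the fetch on a mismatch, so anyone who changes the version in `urls` also has to change the checksum. A minimal Python sketch for computing the digest of an archive downloaded by hand follows; the function names are hypothetical helpers for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to keep memory low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_sha256):
    """True if the file's digest equals the value intended for the WORKSPACE entry."""
    return sha256_of_file(path) == expected_sha256.lower()
```

For example, after downloading the zip named in `urls`, `matches_expected(local_path, "60d2...")` with the full value from the `sha256` line should return True if the archive is the one the WORKSPACE entry expects.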

@jalalirs The above solution worked for py_proto_library, but now proto_library fails with: no such attribute 'cc_api_version' in 'proto_library' rule
Has anyone else faced this?

Just remove cc_api_version

@jalalirs I did. Then it gave numerous other errors.

//astronet/astro_cnn_model:astro_cnn_model_test                          FAILED in 6.2s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/astro_cnn_model/astro_cnn_model_test/test.log
//astronet/astro_fc_model:astro_fc_model_test                            FAILED in 6.1s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/astro_fc_model/astro_fc_model_test/test.log
//astronet/astro_model:astro_model_test                                  FAILED in 6.1s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/astro_model/astro_model_test/test.log
//astronet/ops:dataset_ops_test                                          FAILED in 6.2s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/ops/dataset_ops_test/test.log
//astronet/ops:input_ops_test                                            FAILED in 2.9s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/ops/input_ops_test/test.log
//astronet/ops:metrics_test                                              FAILED in 6.1s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/ops/metrics_test/test.log
//astrowavenet:astrowavenet_model_test                                   FAILED in 6.1s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astrowavenet/astrowavenet_model_test/test.log
//astrowavenet/data:base_test                                            FAILED in 6.2s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astrowavenet/data/base_test/test.log
//light_curve:kepler_io_test                                             FAILED in 6.2s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/light_curve/kepler_io_test/test.log

Executed 9 out of 23 tests: 14 tests pass and 9 fail locally.
There were tests whose specified size is too big. Use the --test_verbose_timeout
INFO: Build completed, 9 tests FAILED, 10 total actions

I am facing the same problem. Which versions of the packages are you using?

@zoe4cs bazel 2.0.0

@zoe4cs Any luck here?

I will fork the project tonight and commit my changes. I don't remember all the modifications I made, but let's see if my version works for you.
Wait for my reply.

@jalalirs Ohk sure, thanks :)

I guess the versions of Bazel and TensorFlow are causing the problem, but I haven't found a solution.

So here is what I did to make it run.

First, I ran it in a TensorFlow image from Docker Hub, using the tag 2.0.1-gpu-py3-jupyter:

https://hub.docker.com/r/tensorflow/tensorflow

In the container, I installed Bazel, cloned this repository, and made the following modifications.

Modify the last part of the BUILD file in the light_curve directory:

load("@com_google_protobuf//:protobuf.bzl", "py_proto_library")

py_proto_library(
    name = "light_curve_py_pb2",
    srcs_version = "PY2AND3",
    srcs = glob(["proto/*.proto"]),
    deps = [
        "@com_google_protobuf//:protobuf_python",
    ],
)

Also in the WORKSPACE file, I updated the ProtoBuf library at the end of the file

http_archive(
    name = "com_google_protobuf",
    sha256 = "60d2012e3922e429294d3a4ac31f336016514a91e5a63fd33f35743ccfe1bd7d",
    strip_prefix = "protobuf-3.11.0",
    urls = ["https://github.com/protocolbuffers/protobuf/archive/v3.11.0.zip"],
)

load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()

I ran the test with the following command

bazel test astronet/... astrowavenet/... light_curve/... tf_util/... third_party/... --test_arg=--test_srcdir=/home/exoplanet-ml/exoplanet-ml/

https://pbs.twimg.com/media/EOGoWSOXUAUy0Yj?format=jpg&name=large

@jalalirs They were all version issues: tensorflow and tensorflow_probability.
Working versions:

tensorboard            1.13.1    
tensorflow             1.13.2    
tensorflow-estimator   1.13.0    
tensorflow-probability 0.6.0 

Two test cases are still failing, as shown below, and I don't know why. From the logs I can see:

======================================================================
ERROR: testBadLabelIdsRaisesValueError (__main__.BuildDatasetTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/sandbox/darwin-sandbox/91/execroot/__main__/bazel-out/darwin-fastbuild/bin/astronet/ops/dataset_ops_test.runfiles/__main__/astronet/ops/dataset_ops_test.py", line 231, in setUp
    self._file_pattern = os.path.join(FLAGS.test_srcdir, _TEST_TFRECORD_FILE)
  File "/Users/ritsharm/git/google-research/lib/python3.7/site-packages/absl/flags/_flagvalues.py", line 473, in __getattr__
    raise AttributeError(name)
AttributeError: test_srcdir

You need to pass the data source by adding the following parameter to the run command

--test_arg=--test_srcdir=
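
As a side note, and as an assumption rather than a description of this project's code: Bazel exports a `TEST_SRCDIR` environment variable to every test it runs, so a test could fall back to it when no explicit value is supplied. A minimal Python sketch of that fallback is below; the helper name `resolve_test_srcdir` is hypothetical, and whether the project's test-data paths line up with that directory has not been verified here.

```python
import os


def resolve_test_srcdir(flag_value=None, environ=None):
    """Pick the test data root: an explicit value first, then Bazel's TEST_SRCDIR."""
    environ = os.environ if environ is None else environ
    if flag_value:
        return flag_value
    if environ.get("TEST_SRCDIR"):
        return environ["TEST_SRCDIR"]
    # Neither source is available, so fail with a message that says what to do.
    raise ValueError("Pass --test_srcdir or run the test under `bazel test`.")
```

A test's setUp could then join the returned directory with its data file name instead of reading the flag directly.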

@jalalirs Thanks a lot for that, but even after running

bazel test astronet/... astrowavenet/... light_curve/... tf_util/... third_party/... --test_arg=--test_srcdir=/Users/ritsharm/git/exoplanet-ml/exoplanet-ml/

I get this error:

usage: astro_cnn_model_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b]
                               [-k TESTNAMEPATTERNS]
                               [tests [tests ...]]
astro_cnn_model_test.py: error: unrecognized arguments: --test_srcdir=/Users/ritsharm/git/exoplanet-ml/exoplanet-ml

Probably you need tensorflow 2

@jalalirs But with TensorFlow 2 lots of other things are breaking :(

@jalalirs TensorFlow 2.0 is not supported, because this project's code uses tf.contrib:

tf.contrib.data.parallel_interleave(
AttributeError: module 'tensorflow' has no attribute 'contrib'

and tf.contrib was removed in TensorFlow 2.

Can you please check which version of TensorFlow you are using?

You are actually right, I am using 1.15
import tensorflow as tf
tf.__version__
'1.15.0'
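
Since `tf.contrib` is only available in the 1.x releases discussed here, the interactive check above could be turned into an early guard, so that a script stops with a clear message instead of the `AttributeError` shown earlier. This is a minimal sketch; `require_tf1` is a hypothetical helper, and it takes the version string as an argument so it can be tried without TensorFlow installed.

```python
def require_tf1(version_string):
    """Raise early unless the given TensorFlow version string is a 1.x release."""
    major = int(version_string.split(".")[0])
    if major != 1:
        raise RuntimeError(
            "TensorFlow %s found, but code that uses tf.contrib needs a 1.x release."
            % version_string
        )
    return major
```

In a real script this would be called as `require_tf1(tf.__version__)` right after `import tensorflow as tf`.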

@jalalirs

I got it working. It was all version issues.

tensorboard            1.15.0    
tensorflow             1.15.0    
tensorflow-estimator   1.15.1    
tensorflow-probability 0.8.0  

All tests pass with the versions above.
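
To compare a local environment against the version set listed above, a small script can help. The following is a minimal Python sketch, assuming the packages were installed with pip under the names shown in the list; `KNOWN_GOOD`, `installed_versions`, and `find_mismatches` are hypothetical names, not part of the project.

```python
# Versions reported in this thread as passing all tests.
KNOWN_GOOD = {
    "tensorboard": "1.15.0",
    "tensorflow": "1.15.0",
    "tensorflow-estimator": "1.15.1",
    "tensorflow-probability": "0.8.0",
}


def installed_versions(names):
    """Map each package name to its installed version, or None if it is not installed."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python 3.7 and older: fall back to setuptools
        import pkg_resources

        PackageNotFoundError = pkg_resources.DistributionNotFound

        def version(name):
            return pkg_resources.get_distribution(name).version

    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, expected=KNOWN_GOOD):
    """Return {package: (expected, installed)} for every package that differs."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }
```

Running `find_mismatches(installed_versions(KNOWN_GOOD))` returns an empty dict when the environment matches the list, and otherwise shows the expected and installed version for each package that differs.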

@jalalirs Did the steps work for you all the way to the end, as mentioned in this?

For me, the prediction step, which is the last step, throws lots of exceptions:

# Generate a prediction for a new TCE.
bazel-bin/astronet/predict \
  --model=AstroCNNModel \
  --config_name=local_global \
  --model_dir=${MODEL_DIR} \
  --kepler_data_dir=${KEPLER_DATA_DIR} \
  --kepler_id=11442793 \
  --period=14.44912 \
  --t0=2.2 \
  --duration=0.11267 \
  --output_image_file="${HOME}/astronet/kepler-90i.png"

Is there any code change needed?

@ritwik12 No, I just ran the test command. After that I started using some of the modules directly. I am working on it intermittently, so I haven't done any training yet.

I am an amateur in the astronomy field and am just starting to get my hands dirty with its data. For this specific project, though, I am planning to skip Bazel entirely and run the code using direct Python calls.

Ohk got it. Thanks a lot :) @jalalirs

Leaving a modified version here for people who happen to stumble upon this thread. At the top of the readme I've linked the Docker image that I used to get it working with my AMD Vega 56 and ROCm. Make sure to also follow the ROCm Docker install guide. If you have issues installing rocm-dkms, switch to an older kernel version: I was running 5.8 (on Ubuntu 20 LTS, which is the recommended distro), and installing 5.6 fixed the issue.