ValueError: could not broadcast input array from shape (4,43) into shape (4,39)
Hi, I'm trying to create the training, validation and test files in the "Process Kepler Data" section. The unit tests pass, the bazel build completed without any problems, and all the TensorFlow libraries are correct.
# Preprocess light curves into sharded TFRecord files using 5 worker processes.
bazel-bin/astronet/data/generate_input_records \
--input_tce_csv_file=${TCE_CSV_FILE} \
--kepler_data_dir=${KEPLER_DATA_DIR} \
--output_dir=${TFRECORD_DIR} \
--num_worker_processes=5
The process seems to start without any issues and begins creating files, but after a while I get this error:
Traceback (most recent call last):
File "/home/s.fiscale/anaconda3/envs/astronet_env/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/s.fiscale/conda/exoplanet-ml/exoplanet-ml/bazel-bin/astronet/data/generate_input_records.runfiles/main/astronet/data/generate_input_records.py", line 164, in _process_file_shard
example = _process_tce(tce)
File "/home/s.fiscale/conda/exoplanet-ml/exoplanet-ml/bazel-bin/astronet/data/generate_input_records.runfiles/main/astronet/data/generate_input_records.py", line 144, in _process_tce
time, flux = preprocess.process_light_curve(all_time, all_flux)
File "/home/s.fiscale/conda/exoplanet-ml/exoplanet-ml/bazel-bin/astronet/data/generate_input_records.runfiles/main/astronet/data/preprocess.py", line 72, in process_light_curve
spline = kepler_spline.fit_kepler_spline(all_time, all_flux, verbose=False)[0]
File "/home/s.fiscale/conda/exoplanet-ml/exoplanet-ml/bazel-bin/astronet/data/generate_input_records.runfiles/main/third_party/kepler_spline/kepler_spline.py", line 321, in fit_kepler_spline
verbose=verbose)
File "/home/s.fiscale/conda/exoplanet-ml/exoplanet-ml/bazel-bin/astronet/data/generate_input_records.runfiles/main/third_party/kepler_spline/kepler_spline.py", line 216, in choose_kepler_spline
time, flux, bkspace=bkspace, maxiter=maxiter)
File "/home/s.fiscale/conda/exoplanet-ml/exoplanet-ml/bazel-bin/astronet/data/generate_input_records.runfiles/main/third_party/kepler_spline/kepler_spline.py", line 104, in kepler_spline
curve = bspline.iterfit(time[mask], flux[mask], bkspace=bkspace)[0]
File "/home/s.fiscale/anaconda3/envs/astronet_env/lib/python3.7/site-packages/pydl/pydlutils/bspline.py", line 639, in iterfit
x2=x2work)
File "/home/s.fiscale/anaconda3/envs/astronet_env/lib/python3.7/site-packages/pydl/pydlutils/bspline.py", line 189, in fit
errb = cholesky_band(alpha, mininf=min_influence)
File "/home/s.fiscale/anaconda3/envs/astronet_env/lib/python3.7/site-packages/pydl/pydlutils/bspline.py", line 491, in cholesky_band
L[:, 0:n] = lower
ValueError: could not broadcast input array from shape (4,43) into shape (4,39)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/s.fiscale/conda/exoplanet-ml/exoplanet-ml/bazel-bin/astronet/data/generate_input_records.runfiles/main/astronet/data/generate_input_records.py", line 256, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/home/s.fiscale/anaconda3/envs/astronet_env/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/s.fiscale/anaconda3/envs/astronet_env/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/s.fiscale/anaconda3/envs/astronet_env/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/home/s.fiscale/conda/exoplanet-ml/exoplanet-ml/bazel-bin/astronet/data/generate_input_records.runfiles/main/astronet/data/generate_input_records.py", line 248, in main
async_result.get()
File "/home/s.fiscale/anaconda3/envs/astronet_env/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
ValueError: could not broadcast input array from shape (4,43) into shape (4,39)
I have no idea how to fix this error. Can someone help me here?
Hey mate! Did you by any chance get this issue resolved?
Hi, after spending some time on this issue I figured out that it is caused by the iterations of the spline fit (5 iterations by default). At each iteration the algorithm removes some outlier data points, and sometimes it removes so many that the remaining points violate the conditions required by the spline fit. Restricting the number of iterations, or exiting the iteration loop when a violation occurs, gets rid of the error. This should not affect performance much, because the first fit already gives a relatively good spline and the later iterations only try to improve it. I hope this helps others facing the same issue, and I'd be happy to hear any feedback!
Can you tell me exactly what I have to do with this? I mean, which iterations? Which parts of the code do I have to change?
I presume it is the number of iterations passed in the call shown above (the maxiter argument in the traceback): set that to 1, or add a handler as suggested above.
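For anyone wondering what such a handler could look like, here is a minimal sketch of the idea, not the actual kepler_spline.py code. It assumes a generic fitting callable; `robust_iterative_fit`, `fit_fn` and `make_mean_fit` are hypothetical names used only for illustration, and the toy mean fit stands in for the real spline fit. In the real code, judging by the traceback, the call that raises is `bspline.iterfit(...)` inside `kepler_spline`, so the `try`/`except` would presumably go around that call.

```python
from statistics import fmean, pstdev


def robust_iterative_fit(time, flux, fit_fn, maxiter=5, outlier_cut=3.0):
    """Iteratively fit and reject outliers, keeping the last fit that worked.

    fit_fn(time_subset, flux_subset) is a hypothetical stand-in for the
    spline fit: it returns a callable model(t) and may raise ValueError.
    """
    mask = [True] * len(time)          # points currently used for fitting
    model, fit_mask = None, None       # last successful fit and its mask
    for _ in range(maxiter):
        t_in = [t for t, keep in zip(time, mask) if keep]
        f_in = [f for f, keep in zip(flux, mask) if keep]
        try:
            candidate = fit_fn(t_in, f_in)
        except ValueError:
            if model is None:
                raise                  # even the first fit failed
            break                      # fall back to the previous fit
        model, fit_mask = candidate, mask

        # Reject points whose residual exceeds outlier_cut standard deviations.
        residuals = [f - model(t) for t, f in zip(time, flux)]
        sigma = pstdev([r for r, keep in zip(residuals, mask) if keep])
        new_mask = [abs(r) <= outlier_cut * sigma for r in residuals]
        if new_mask == mask:
            break                      # converged: no points added or removed
        mask = new_mask
    return model, fit_mask


def make_mean_fit(min_points):
    """Toy fit (constant = mean flux) that fails with too few points."""
    def fit(t, f):
        if len(f) < min_points:
            raise ValueError("too few points for the fit")
        level = fmean(f)
        return lambda _t: level
    return fit


time = list(range(11))
flux = [1.0] * 10 + [100.0]            # one obvious outlier
model, used = robust_iterative_fit(time, flux, make_mean_fit(min_points=11))
print(model(0), sum(used))             # first fit is kept after the second one fails
```

The other option mentioned above, limiting the iterations, corresponds to passing `maxiter=1` in this sketch. Catching the exception keeps the extra iterations for the light curves where they work and only falls back when a refit fails.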
Is it around the wrong way?