ValueError when generating preprocessed zarr file

Question

Closed this issue 4 years ago · 7 comments

I got an error when trying to create a pre-processed zarr file from the following three raw files:
2017843-D20170513-T081028
2017843-D20170513-T084938
2017843-D20170513-T092551

The error is below:
Traceback (most recent call last):
  File "/app/CRIMAC_preprocess.py", line 794, in <module>
    status = raw_to_grid_multiple(raw_dir,
  File "/app/CRIMAC_preprocess.py", line 724, in raw_to_grid_multiple
    pq_writer = append_to_parquet(df, pq_filepath, pq_writer)
  File "/app/CRIMAC_preprocess.py", line 38, in append_to_parquet
    pq_obj.write_table(table=pa_tbl)
  File "/usr/local/lib/python3.9/site-packages/pyarrow/parquet.py", line 649, in write_table
    raise ValueError(msg)
ValueError: Table schema does not match schema used to create file:
table:
pingTime: timestamp[ns]
mask_depth_upper: double
mask_depth_lower: double
priority: int64
acousticCat: double
proportion: double
ID: string
ChannelID: double
__index_level_0__: int64
-- schema metadata --
pandas: '{"index_columns": ["__index_level_0__"], "column_indexes": [{"na' + 1299 vs.
file:
pingTime: timestamp[ns]
mask_depth_upper: double
mask_depth_lower: double
priority: int64
acousticCat: string
proportion: string
ID: string
ChannelID: string
__index_level_0__: int64
-- schema metadata --
pandas: '{"index_columns": ["__index_level_0__"], "column_indexes": [{"na' + 1296

Answer 1 · 2021-03-01T15:25:02.000Z

Thanks, @albao11. Is this a new issue, or has it already been fixed?

Answer 2 · 2021-03-01T15:34:50.000Z

This is a new issue. Can you check what happens with the files I indicated?

Answer 3 · 2021-03-01T16:04:35.000Z

I tried with those files but can't reproduce the error. Have you removed all old parquet files from the destination directory?

Answer 4 · 2021-03-01T16:17:49.000Z

Yes, I have removed all old parquet files in my output directory. I downloaded the 3 files from storage earlier this morning. What data are you using? Did you get it from storage as well?

Answer 5 · 2021-03-01T16:54:32.000Z

Yes, I downloaded the files from the Azure storage too.

This error shouldn't actually be happening, because the column attributes are set in the code here: https://github.com/CRIMAC-WP4-Machine-learning/CRIMAC-annotationtools/blob/a7a667b3c27398f6d9a1bede65936b045eeee7e0/annotationtools/readers/convert_to_annotations.py#L1430

Maybe try to update the docker image as well?

Answer 6 · 2021-03-01T17:43:05.000Z

Indeed, updating the docker image solved the problem! Thanks.
I did get a warning, though, and I'm not sure whether it needs attention:

/usr/local/lib/python3.9/site-packages/annotationtools/readers/convert_to_annotations.py:1254: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
if not intr.species_id == -1:
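
For context, NumPy emits this FutureWarning when `==` is applied to an array and a value it cannot compare elementwise (for example a string array and the integer -1), so the whole comparison collapses to a single bool. Below is one possible way to make the check explicit, assuming species_id may be a scalar, a string, or an array. species_is_set is a hypothetical helper, not part of annotationtools, and treating the field as set when any value differs from -1 is an assumption about the intended logic.

```python
import numpy as np


def species_is_set(species_id, missing=-1):
    """Return True if any value in species_id differs from the missing marker."""
    values = np.atleast_1d(np.asarray(species_id))
    if values.dtype.kind in "iuf":
        # Numeric input: compare as numbers.
        return bool(np.any(values != missing))
    # Strings or mixed objects: compare the text form instead, so NumPy never
    # has to compare a string array with an integer.
    return bool(np.any(values.astype(str) != str(missing)))
```

With this helper the line in the warning would read `if species_is_set(intr.species_id):`, and the type handling no longer depends on the NumPy version.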

Answer 7 · 2021-03-02T07:42:59.000Z

Great, thanks!!!

The warning is a bug and we will address it later. I'll close this issue for now.