CRIMAC-WP4-Machine-learning/CRIMAC-preprocessing

ValueError when generating preprocessed zarr file


I got an error when trying to create a preprocessed zarr file from the following three raw files:
2017843-D20170513-T081028
2017843-D20170513-T084938
2017843-D20170513-T092551

Below is the error:
Traceback (most recent call last):
  File "/app/CRIMAC_preprocess.py", line 794, in <module>
    status = raw_to_grid_multiple(raw_dir,
  File "/app/CRIMAC_preprocess.py", line 724, in raw_to_grid_multiple
    pq_writer = append_to_parquet(df, pq_filepath, pq_writer)
  File "/app/CRIMAC_preprocess.py", line 38, in append_to_parquet
    pq_obj.write_table(table=pa_tbl)
  File "/usr/local/lib/python3.9/site-packages/pyarrow/parquet.py", line 649, in write_table
    raise ValueError(msg)
ValueError: Table schema does not match schema used to create file:
table:
pingTime: timestamp[ns]
mask_depth_upper: double
mask_depth_lower: double
priority: int64
acousticCat: double
proportion: double
ID: string
ChannelID: double
__index_level_0__: int64
-- schema metadata --
pandas: '{"index_columns": ["__index_level_0__"], "column_indexes": [{"na' + 1299 vs.
file:
pingTime: timestamp[ns]
mask_depth_upper: double
mask_depth_lower: double
priority: int64
acousticCat: string
proportion: string
ID: string
ChannelID: string
__index_level_0__: int64
-- schema metadata --
pandas: '{"index_columns": ["__index_level_0__"], "column_indexes": [{"na' + 1296

Thanks, @albao11. Is this a new issue, or has it already been fixed?

I've tried with those files but can't reproduce the error. Have you removed all old parquet files from the destination directory?

Yes, I removed all old parquet files from my output directory. I downloaded the three files from storage earlier this morning. Which data are you using? Did you also get it from storage?

Yes, I downloaded the files from Azure storage too.

This error actually shouldn't happen, because the column attributes are set in this part of the code: https://github.com/CRIMAC-WP4-Machine-learning/CRIMAC-annotationtools/blob/a7a667b3c27398f6d9a1bede65936b045eeee7e0/annotationtools/readers/convert_to_annotations.py#L1430

Maybe try updating the Docker image as well?

Indeed, updating the Docker image solved the problem! Thanks.
I did get a warning, though, and I'm not sure whether it matters:

/usr/local/lib/python3.9/site-packages/annotationtools/readers/convert_to_annotations.py:1254: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if not intr.species_id == -1:

Great, thanks!!!

The warning is a bug and we will address it later. I'll close this issue for now.