[FEATURE] L0A and L0B processing TODO list
**Is your feature request related to a problem? Please describe.**
This issue describes improvements to be made to L0A and L0B processing.

**Describe the solution you'd like**
- Check that the `reader_kwargs` `delimiter` is provided, and warn otherwise.
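The `reader_kwargs` delimiter check could be sketched as below. The helper name `check_reader_kwargs` is hypothetical; the real check would live wherever the reader arguments are validated in L0A.

```python
import warnings


def check_reader_kwargs(reader_kwargs: dict) -> dict:
    """Warn when 'delimiter' is missing from reader_kwargs.

    Hypothetical helper: sketches the check described above.
    """
    if "delimiter" not in reader_kwargs:
        warnings.warn(
            "No 'delimiter' specified in reader_kwargs; "
            "the pandas default will be used."
        )
    return reader_kwargs
```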
- Check that `df_sanitizer_fun` has only `lazy` and `df` arguments.
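The signature check can be done with `inspect.signature`. A minimal sketch (the helper name `check_df_sanitizer_fun` is an assumption):

```python
import inspect


def check_df_sanitizer_fun(df_sanitizer_fun) -> None:
    """Raise if the sanitizer does not have exactly 'df' and 'lazy' arguments."""
    params = set(inspect.signature(df_sanitizer_fun).parameters)
    if params != {"df", "lazy"}:
        raise ValueError(
            "df_sanitizer_fun must have only 'df' and 'lazy' arguments, "
            f"got: {sorted(params)}"
        )
```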
- Remove the `read_raw_data_zipped` function and associated code (currently used for GPM campaigns).
- Enable saving integer columns to Parquet files. This requires:
  - Definition of a `FillValue` flag for integer columns (using `_FillValue` of `L0B_encodings.yml`, except for `raw_drop*`)
  - Coercion of NaN to fill values before casting to int type in L0A processing
  - Replacement of fill values with `np.nan` in L0B processing.
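The round trip above (NaN → fill value → int in L0A, fill value → NaN in L0B) could look like this sketch. The fill value `-9999` is a placeholder; the real one would come from `L0B_encodings.yml`.

```python
import numpy as np
import pandas as pd

# Hypothetical fill value; the real one would be read from L0B_encodings.yml.
FILL_VALUE = -9999


def coerce_nan_to_fill_value(df: pd.DataFrame, int_columns) -> pd.DataFrame:
    """L0A side: replace NaN with the fill value, then cast to int."""
    for col in int_columns:
        df[col] = df[col].fillna(FILL_VALUE).astype("int64")
    return df


def restore_nan(df: pd.DataFrame, int_columns) -> pd.DataFrame:
    """L0B side: replace fill values with np.nan (columns become float)."""
    for col in int_columns:
        df[col] = df[col].astype("float64").replace(FILL_VALUE, np.nan)
    return df
```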
- In L0B processing, replace `nan_flags` from `L0_data_format.yml` with `np.nan`.
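The `nan_flags` replacement could be sketched as follows. The example flag values are invented; the real per-variable flags would be read from `L0_data_format.yml`.

```python
import numpy as np
import pandas as pd


def replace_nan_flags(df: pd.DataFrame, nan_flags: dict) -> pd.DataFrame:
    """Replace per-column flag values with np.nan.

    nan_flags maps column name -> list of flag values (hypothetical
    structure, assumed to mirror L0_data_format.yml).
    """
    for column, flags in nan_flags.items():
        if column in df.columns:
            df[column] = df[column].replace(flags, np.nan)
    return df
```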
- Feature to drop dates based on the `issue/station_id.yml` file ... .
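One possible shape for the date-dropping feature, assuming the parsed issue file yields a list of dates whose timesteps must be removed (the `drop_dates` key and helper name are assumptions for this sketch):

```python
import pandas as pd

# Hypothetical structure of issue/station_id.yml after YAML parsing.
issue_dict = {"drop_dates": ["2018-07-01", "2018-07-02"]}


def drop_issue_dates(df: pd.DataFrame, issue_dict: dict) -> pd.DataFrame:
    """Drop rows whose 'time' falls on a date listed in the issue file."""
    bad_dates = pd.to_datetime(issue_dict.get("drop_dates", []))
    mask = df["time"].dt.normalize().isin(bad_dates)
    return df[~mask]
```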
- In L0B processing, add a `variable_type` (coordinate, count, category, flag, quantity, flux) attribute.
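Before writing the attribute into each netCDF variable, the allowed values could be validated centrally. A sketch (the mapping source and helper name are assumptions; in practice the mapping could live in a config file):

```python
# Allowed values, as listed in the item above.
VALID_VARIABLE_TYPES = {"coordinate", "count", "category", "flag", "quantity", "flux"}


def get_variable_type_attrs(variable_types: dict) -> dict:
    """Validate and return {variable: {'variable_type': ...}} attribute dicts."""
    for var, vtype in variable_types.items():
        if vtype not in VALID_VARIABLE_TYPES:
            raise ValueError(f"Invalid variable_type '{vtype}' for '{var}'")
    return {var: {"variable_type": vtype} for var, vtype in variable_types.items()}
```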
- Enable reader development for stations where data are separated into two files. Example with Grenoble: `raw.txt` and `matrix.txt`.
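A two-file reader would essentially read each file and join on the timestamp. A minimal sketch, assuming both files have already been parsed into DataFrames sharing a `time` column (function name is hypothetical):

```python
import pandas as pd


def merge_two_file_station(raw_df: pd.DataFrame, matrix_df: pd.DataFrame) -> pd.DataFrame:
    """Join the scalar variables (e.g. raw.txt) with the drop matrix
    (e.g. matrix.txt) on their common 'time' column."""
    return pd.merge(raw_df, matrix_df, on="time", how="inner")
```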
- Enforce `check_metadata_compliance` strictly!
- In L0B processing, check the ThiesLPM and OTT_Parsivel `raw_drop_number` shape: (diameter, velocity) vs (velocity, diameter).
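The shape check amounts to transposing the array when the dimension order is reversed. A sketch, where the expected `(diameter, velocity)` order and the helper name are assumptions:

```python
import numpy as np

# Assumed canonical dimension order for raw_drop_number.
EXPECTED_DIMS = ("diameter", "velocity")


def ensure_dim_order(array: np.ndarray, dims: tuple):
    """Return (array, dims) with the array transposed to (diameter, velocity)
    when it arrives as (velocity, diameter)."""
    if dims == EXPECTED_DIMS:
        return array, dims
    if dims == EXPECTED_DIMS[::-1]:
        return array.T, EXPECTED_DIMS
    raise ValueError(f"Unexpected dims: {dims}")
```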
- Decide whether to support `dask.dataframe` or use `dask.delayed` and save separate Parquet files (more efficient).
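The `dask.delayed` option boils down to processing each raw file independently and writing one output file per input. A dependency-free sketch of that pattern using a thread pool and CSV as stand-ins (the real code would wrap `process_file` in `dask.delayed` and write Parquet via `df.to_parquet`; all names here are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import pandas as pd


def process_file(raw_name: str, out_dir: Path) -> Path:
    """Parse one raw file and save one output file.

    CSV is used here to keep the sketch dependency-free; the real
    implementation would write Parquet instead.
    """
    df = pd.DataFrame({"source": [raw_name]})  # stand-in for real parsing
    out_path = out_dir / f"{raw_name}.csv"
    df.to_csv(out_path, index=False)
    return out_path


def process_campaign(raw_names, out_dir: Path):
    """Process all raw files in parallel, one output per input."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda name: process_file(name, out_dir), raw_names))
```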
- If supporting dask DataFrame, maybe optimize the `row_partition` logic.
- Decide whether to modify L0B to save each netCDF separately and only at the end (optionally) open all files again, concatenate, and write the full file.
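The save-parts-then-optionally-concatenate option could be orchestrated as below. CSV/pandas are used as stand-ins to keep the sketch self-contained; the real code would call `ds.to_netcdf` per piece and reopen the parts with xarray (e.g. `xr.open_mfdataset`) before writing the full file. All names are hypothetical.

```python
from pathlib import Path

import pandas as pd


def save_parts_then_concat(dfs, out_dir: Path, final_path: Path, concat_at_end=True):
    """Save each piece separately; optionally reopen all parts at the end,
    concatenate them, and write the full file (stand-in for the netCDF flow)."""
    part_paths = []
    for i, df in enumerate(dfs):
        part = out_dir / f"part_{i}.csv"
        df.to_csv(part, index=False)
        part_paths.append(part)
    if concat_at_end:
        full = pd.concat(
            [pd.read_csv(p) for p in part_paths], ignore_index=True
        )
        full.to_csv(final_path, index=False)
    return part_paths
```

Writing each piece as soon as it is ready keeps peak memory low; the optional final pass trades extra I/O for a single consolidated file.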