[FEATURE] L0A and L0B processing TODO list

Question

Closed this issue 2 years ago

Is your feature request related to a problem? Please describe.
This issue describes improvements to be made to L0A and L0B processing.

Describe the solution you'd like

Check that a delimiter is provided in reader_kwargs and warn otherwise.
Check that df_sanitizer_fun has only the lazy and df arguments.
Remove the read_raw_data_zipped function and the associated code (currently used for the GPM campaigns).
Enable saving integer columns to Parquet files. This requires:
1. Defining a FillValue flag for integer columns (using the _FillValue of L0B_encodings.yml, except for raw_drop*).
2. Coercing nan to the fill value before casting to an integer type in L0A processing.
3. Replacing fill values with np.nan in L0B processing.
In L0B processing, replace the nan_flags values defined in L0_data_format.yml with np.nan.
Add a feature to drop dates based on the issue/station_id.yml file ...
In L0B processing, add a variable_type attribute (coordinate, count, category, flag, quantity, flux).
Enable reader development for stations where data are split across two files. Example with Grenoble: raw.txt and matrix.txt.
Enforce check_metadata_compliance strictly!
In L0B processing, check the shape of the ThiesLPM and OTT_Parsivel raw_drop_number: (diameter, velocity) vs (velocity, diameter).
Decide whether to support dask.dataframe or to use dask.delayed and save separate Parquet files (more efficient).
If dask.dataframe is supported, consider optimizing row_partition.
Decide whether to modify L0B to save each netCDF file separately and only at the end (optionally) reopen all files, concatenate them, and write the full file.
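For the reader_kwargs delimiter item, a minimal sketch of the check, assuming reader_kwargs is a plain dict that is eventually passed to pandas.read_csv. The function name check_reader_kwargs_delimiter is hypothetical, and it returns a boolean only to make the behaviour easy to test.

```python
import warnings


def check_reader_kwargs_delimiter(reader_kwargs):
    """Warn if reader_kwargs does not specify a delimiter (hypothetical helper).

    Returns True when a delimiter is set, False otherwise.
    """
    if reader_kwargs.get("delimiter") is None:
        # Without an explicit delimiter, pandas.read_csv falls back to ','.
        warnings.warn(
            "reader_kwargs does not specify a 'delimiter'; "
            "pandas.read_csv will use its default (',').",
            UserWarning,
        )
        return False
    return True
```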
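For the df_sanitizer_fun item, one possible check uses inspect.signature from the standard library. check_df_sanitizer_fun is a hypothetical name, and this sketch only compares argument names, not their order or default values.

```python
import inspect


def check_df_sanitizer_fun(df_sanitizer_fun):
    """Raise if df_sanitizer_fun does not have exactly the arguments df and lazy."""
    if not callable(df_sanitizer_fun):
        raise TypeError("df_sanitizer_fun must be a callable.")
    # Names of all declared parameters, e.g. {"df", "lazy"}.
    params = set(inspect.signature(df_sanitizer_fun).parameters)
    expected = {"df", "lazy"}
    if params != expected:
        raise ValueError(
            f"df_sanitizer_fun must have only the arguments {sorted(expected)}, "
            f"got {sorted(params)}."
        )
```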
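The three steps for saving integer columns to Parquet could look roughly like this with pandas. It assumes the encodings are available as a dict mapping column name to {"dtype", "_FillValue"}, mirroring entries of L0B_encodings.yml; the function names and the number_particles column used in the example are hypothetical.

```python
import numpy as np
import pandas as pd


def coerce_nan_to_fill_value(df, int_encodings):
    """L0A step: replace NaN by each integer column's _FillValue, then cast."""
    df = df.copy()
    for column, encoding in int_encodings.items():
        # raw_drop* columns are excluded, as noted in the TODO item.
        if column.startswith("raw_drop") or column not in df.columns:
            continue
        filled = df[column].fillna(encoding["_FillValue"])
        df[column] = filled.astype(encoding["dtype"])
    return df


def replace_fill_value_with_nan(df, int_encodings):
    """L0B step: map each column's _FillValue back to np.nan."""
    df = df.copy()
    for column, encoding in int_encodings.items():
        if column.startswith("raw_drop") or column not in df.columns:
            continue
        # NumPy integer arrays cannot hold NaN, so the column becomes float64.
        values = df[column].astype("float64")
        df[column] = values.where(values != encoding["_FillValue"], np.nan)
    return df


# Hypothetical encoding, in the spirit of L0B_encodings.yml.
encodings = {"number_particles": {"dtype": "uint32", "_FillValue": 4294967295}}
df = pd.DataFrame({"number_particles": [10.0, np.nan, 3.0]})
df_l0a = coerce_nan_to_fill_value(df, encodings)  # uint32, safe to write to Parquet
df_l0b = replace_fill_value_with_nan(df_l0a, encodings)  # float64 with NaN restored
```

The round trip keeps the integer dtype only in the Parquet (L0A) stage; in L0B the column is float again, which is what allows np.nan to represent missing values.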
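For replacing nan_flags with np.nan in L0B, a sketch assuming the flags from L0_data_format.yml have been loaded into a dict mapping column name to a list of numeric flag values. replace_nan_flags and the rainfall_rate_32bit column in the example are illustrative, not taken from the issue.

```python
import numpy as np
import pandas as pd


def replace_nan_flags(df, nan_flags):
    """Replace sensor "no data" flag values with np.nan, column by column.

    nan_flags maps column name -> list of flag values (assumed numeric).
    """
    df = df.copy()
    for column, flags in nan_flags.items():
        if column not in df.columns:
            continue
        # Cast to float so the column can hold NaN.
        values = df[column].astype("float64")
        df[column] = values.where(~values.isin(flags), np.nan)
    return df
```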
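The layout of the issue/station_id.yml file is not specified in this issue, so the following sketch of the date-dropping feature assumes, hypothetically, that the file has already been parsed into a list of (start, end) time periods and that the DataFrame has a time column. Both bounds are treated as inclusive.

```python
import pandas as pd


def drop_time_periods(df, time_periods, time_column="time"):
    """Drop rows whose timestamp falls inside any (start, end) period."""
    times = pd.to_datetime(df[time_column])
    to_drop = pd.Series(False, index=df.index)
    for start, end in time_periods:
        # Inclusive bounds on both ends.
        to_drop |= (times >= pd.Timestamp(start)) & (times <= pd.Timestamp(end))
    return df.loc[~to_drop]
```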
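For stations whose data are split across two files (the Grenoble raw.txt and matrix.txt case), a reader might read both and join them on a shared key. This sketch assumes both files contain a common time column with one row per timestamp; read_split_station_files is a hypothetical name. The inner join is a design choice: timestamps present in only one file are dropped.

```python
import pandas as pd


def read_split_station_files(raw_filepath, matrix_filepath, reader_kwargs, on="time"):
    """Read the two files of a split station and join them on a key column."""
    df_raw = pd.read_csv(raw_filepath, **reader_kwargs)
    df_matrix = pd.read_csv(matrix_filepath, **reader_kwargs)
    # validate="one_to_one" raises if a timestamp is duplicated in either file.
    return pd.merge(df_raw, df_matrix, on=on, how="inner", validate="one_to_one")
```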
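For the raw_drop_number shape check, which layout ThiesLPM and OTT_Parsivel actually use is exactly what the TODO asks to verify, so this sketch keeps it as an explicit parameter. It assumes the raw counts arrive as a flat vector whose slowest-varying dimension is listed first in layout, and always returns a (diameter, velocity) array. to_diameter_velocity is a hypothetical name.

```python
import numpy as np


def to_diameter_velocity(flat_counts, n_diameter, n_velocity, layout):
    """Reshape a flat raw_drop_number vector into a (diameter, velocity) array.

    layout: order of the two dimensions in the flat vector, slowest-varying
    first, either ("diameter", "velocity") or ("velocity", "diameter").
    """
    flat_counts = np.asarray(flat_counts)
    if flat_counts.size != n_diameter * n_velocity:
        raise ValueError(
            f"Expected {n_diameter * n_velocity} values, got {flat_counts.size}."
        )
    if layout == ("diameter", "velocity"):
        return flat_counts.reshape(n_diameter, n_velocity)
    if layout == ("velocity", "diameter"):
        # Reshape in the stored order, then transpose to (diameter, velocity).
        return flat_counts.reshape(n_velocity, n_diameter).T
    raise ValueError(f"Unknown layout: {layout}")
```

Checking the shape alone cannot distinguish the two layouts when the numbers of diameter and velocity classes are equal, so comparing against a known drop spectrum is still needed in that case.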