ltelab/disdrodb

[FEATURE] L0A and L0B processing TODO list

Closed this issue · 0 comments

Is your feature request related to a problem? Please describe.
This issue describes improvements to be made to L0A and L0B processing.

Describe the solution you'd like

  • Check that a delimiter is provided in reader_kwargs, and warn otherwise.

  • Check that df_sanitizer_fun has only the lazy and df arguments.

  • Remove read_raw_data_zipped function and associated code (currently used for GPM campaigns)

  • Enable saving integer columns to Parquet files. This requires:

    1. Definition of a FillValue flag for integer columns (using _FillValue of L0B_encodings.yml, except for raw_drop*)
    2. Coercion of NaN to fill values before casting to int type in L0A processing
    3. Replacement of fill values with np.nan in L0B processing
  • In L0B processing, replace nan_flags from L0_data_format.yml with np.nan

  • Feature to drop dates based on issue/station_id.yml file ... .

  • In L0B processing, add variable_type (coordinate, count, category, flag, quantity, flux) attribute

  • Enable reader development for stations where data are split across two files. Example with Grenoble: raw.txt and matrix.txt

  • check_metadata_compliance strictly!

  • In L0B processing, check ThiesLPM and OTT_Parsivel raw_drop_number shape: (diameter, velocity) vs (velocity, diameter)

  • Decide whether to support dask.dataframe or use dask.delayed and save separate Parquets (more efficient)

  • If supporting dask.dataframe, maybe improve the row_partition optimization

  • Decide whether to modify L0B to save each netCDF separately and only at the end (optionally) reopen all files, concatenate them and write the full file.
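
For the integer-column item above, the three steps could be sketched roughly as follows. This is only an illustration under assumptions: the fill value -9999, the int32 dtype, the column name and both function names are hypothetical, and in practice the fill value would come from the _FillValue entry of L0B_encodings.yml.

```python
import numpy as np
import pandas as pd

# Hypothetical fill value: in practice it would be read from the
# _FillValue entry of L0B_encodings.yml for the given variable.
FILL_VALUE = -9999


def encode_integer_column(series, fill_value=FILL_VALUE, dtype="int32"):
    """L0A side: replace NaN with the fill value, then cast to an integer dtype."""
    return series.fillna(fill_value).astype(dtype)


def decode_integer_column(series, fill_value=FILL_VALUE):
    """L0B side: cast to float and put NaN back where the fill value is found."""
    values = series.astype("float64")
    return values.mask(series == fill_value, np.nan)


# Hypothetical column with one missing value.
raw = pd.Series([3.0, np.nan, 12.0], name="example_count")
encoded = encode_integer_column(raw)      # integer column, safe to write to Parquet
decoded = decode_integer_column(encoded)  # float column with NaN restored
```

The fill value has to lie outside the valid range of the variable, otherwise real measurements would be turned into NaN when decoding.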