Deltares/ddlpy

Sum up duplicated periods in `ddlpy.measurements_amount()`

Closed this issue · 0 comments

  • ddlpy version: 0.4.0
  • Python version: 3.11
  • Operating System: Windows

Description

When retrieving the measurement amount we get duplicated periods for some stations. For instance for BATH 1993, but also for CADZD, IJMDBTHVN, NIEUWSTZL, DENOVBTN, WESTKPLE. This is because OphalenAantalWaarnemingen returns multiple amounts for that year, since the WaardeBepalingsmethode changed in that year. These should be summed up so the period index has no duplicated values.

What I Did

import ddlpy
import requests
import pandas as pd

start_date = "1990-01-01"
end_date = "1995-01-01"

print('retrieving DDL locations catalog with ddlpy')
locations = ddlpy.locations()
print('...done')

# select locations on grootheid/groepering/exttypes
bool_grootheid = locations["Grootheid.Code"].isin(["WATHTE"])
bool_groepering_ts = locations["Groepering.Code"].isin(["NVT"])
bool_station_ts = locations.index.isin(["BATH"])
locs_meas_ts = locations.loc[bool_grootheid & bool_groepering_ts & bool_station_ts]

print("ddlpy retrieve amount of measurements")
amount_ts = ddlpy.measurements_amount(location=locs_meas_ts.iloc[0], start_date=start_date, end_date=end_date)
print(amount_ts)

Gives multiple 1993 in the index:

  Groeperingsperiode  AantalMetingen
0               1990           52554
1               1991           52560
2               1992           52704
3               1993           16704
4               1993           35856
5               1994           52560
6               1995               7

We can look into more detail of the metadata when using ddl api directly

print("direct retrieve amount of measurements")
url = "https://waterwebservices.rijkswaterstaat.nl/ONLINEWAARNEMINGENSERVICES_DBO/OphalenAantalWaarnemingen"

request = {'AquoMetadataLijst': [{'Compartiment': {'Code': 'OW'}, 
                                  'Eenheid': {'Code': 'cm'}, 
                                  'Grootheid': {'Code': 'WATHTE'}, 
                                  'Hoedanigheid': {'Code': 'NAP'}, 
                                  'Parameter': {'Code': 'NVT'}, 
                                  'Groepering': {'Code': 'NVT'}}], 
           'LocatieLijst': [{'X': 584172.789970818, 'Y': 5694908.14241489, 'Code': 'BATH'}], 
           'Groeperingsperiode': 'Jaar', 
           'Periode': {'Begindatumtijd': '1990-01-01T00:00:00.000+00:00', 'Einddatumtijd': '1995-01-01T00:00:00.000+00:00'}}

resp = requests.post(url, json=request)
result = resp.json()
if not result['Succesvol']:
    raise Exception(result.get('Foutmelding', 'No error returned'))

if result['Succesvol']:
    df_list = []
    for one in result['AantalWaarnemingenPerPeriodeLijst']:
        df = pd.json_normalize(one['AantalMetingenPerPeriodeLijst'])
        # combine columns to a period string
        df["Groeperingsperiode"] = df["Groeperingsperiode.Jaarnummer"].apply(lambda x: f"{x:04d}")
        # select columns from dataframe and append to list
        df = df[["Groeperingsperiode","AantalMetingen"]]
        df_list.append(df)
    
    # concatenate
    amount_all = pd.concat(df_list).sort_values("Groeperingsperiode").reset_index(drop=True)
print(amount_all)

Adding this to ddlpy.measurements_amount() will solve this issue:

amount_ts = amount_ts.set_index("Groeperingsperiode")
amount_ts = amount_ts.groupby(amount_ts.index).sum()

Also test this for month/daily groeperingsperiode.