Sum up duplicated periods in `ddlpy.measurements_amount()`
Closed this issue · 0 comments
veenstrajelmer commented
- ddlpy version: 0.4.0
- Python version: 3.11
- Operating System: Windows
Description
When retrieving the measurement amount we get duplicated periods for some stations. For instance for BATH 1993, but also for CADZD, IJMDBTHVN, NIEUWSTZL, DENOVBTN, WESTKPLE. This is because OphalenAantalWaarnemingen
returns multiple amounts for that year, since the WaardeBepalingsmethode changed in that year. These should be summed up so the period index has no duplicated values.
What I Did
import ddlpy
import requests
import pandas as pd
start_date = "1990-01-01"
end_date = "1995-01-01"
print('retrieving DDL locations catalog with ddlpy')
locations = ddlpy.locations()
print('...done')
# select locations on grootheid/groepering/exttypes
bool_grootheid = locations["Grootheid.Code"].isin(["WATHTE"])
bool_groepering_ts = locations["Groepering.Code"].isin(["NVT"])
bool_station_ts = locations.index.isin(["BATH"])
locs_meas_ts = locations.loc[bool_grootheid & bool_groepering_ts & bool_station_ts]
print("ddlpy retrieve amount of measurements")
amount_ts = ddlpy.measurements_amount(location=locs_meas_ts.iloc[0], start_date=start_date, end_date=end_date)
print(amount_ts)
Gives multiple 1993 in the index:
Groeperingsperiode AantalMetingen
0 1990 52554
1 1991 52560
2 1992 52704
3 1993 16704
4 1993 35856
5 1994 52560
6 1995 7
We can look into more detail of the metadata when using ddl api directly
print("direct retrieve amount of measurements")
url = "https://waterwebservices.rijkswaterstaat.nl/ONLINEWAARNEMINGENSERVICES_DBO/OphalenAantalWaarnemingen"
request = {'AquoMetadataLijst': [{'Compartiment': {'Code': 'OW'},
'Eenheid': {'Code': 'cm'},
'Grootheid': {'Code': 'WATHTE'},
'Hoedanigheid': {'Code': 'NAP'},
'Parameter': {'Code': 'NVT'},
'Groepering': {'Code': 'NVT'}}],
'LocatieLijst': [{'X': 584172.789970818, 'Y': 5694908.14241489, 'Code': 'BATH'}],
'Groeperingsperiode': 'Jaar',
'Periode': {'Begindatumtijd': '1990-01-01T00:00:00.000+00:00', 'Einddatumtijd': '1995-01-01T00:00:00.000+00:00'}}
resp = requests.post(url, json=request)
result = resp.json()
if not result['Succesvol']:
raise Exception(result.get('Foutmelding', 'No error returned'))
if result['Succesvol']:
df_list = []
for one in result['AantalWaarnemingenPerPeriodeLijst']:
df = pd.json_normalize(one['AantalMetingenPerPeriodeLijst'])
# combine columns to a period string
df["Groeperingsperiode"] = df["Groeperingsperiode.Jaarnummer"].apply(lambda x: f"{x:04d}")
# select columns from dataframe and append to list
df = df[["Groeperingsperiode","AantalMetingen"]]
df_list.append(df)
# concatenate
amount_all = pd.concat(df_list).sort_values("Groeperingsperiode").reset_index(drop=True)
print(amount_all)
Adding this to ddlpy.measurements_amount()
will solve this issue:
amount_ts = amount_ts.set_index("Groeperingsperiode")
amount_ts = amount_ts.groupby(amount_ts.index).sum()
Also test this for month/daily groeperingsperiode.