Problem with merging different HEC-DSS files
Hello,
I am trying to append multiple HEC-DSS time series into one file, but there are always empty rows between the data sets. I would be grateful if anyone could give me some tips on solving this with a pydsstools script. The attached picture was taken from the merged HEC-DSS data in the DSSVue GUI.
Alternatively, is there a way to append several HEC-DSS files into a single one?
HEC-DSS appends by default, similar to a database. If you are writing to a time step that doesn't exist, the time series gets extended to cover the entire record. The only way to delete data is to manually delete a record and squeeze the dss file.
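The append behaviour described above can be pictured as merging `{time: value}` records. This is a minimal conceptual sketch, not how HEC-DSS stores data internally. It assumes that values written for a timestep that already exists replace the stored ones, and `merge_records` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def merge_records(existing, new):
    """Conceptual model only: combine two {datetime: value} records.

    Values in `new` overwrite values in `existing` at the same timestamp
    (an assumption), and timestamps not yet present extend the record.
    """
    merged = dict(existing)
    merged.update(new)
    return dict(sorted(merged.items()))


start = datetime(2023, 1, 14, 22)
hour = timedelta(hours=1)
first = {start + i * hour: 0.253 for i in range(3)}         # 22:00, 23:00, 00:00
second = {start + (i + 2) * hour: 0.218 for i in range(3)}  # 00:00, 01:00, 02:00
merged = merge_records(first, second)
print(len(merged))               # 5 distinct hourly timestamps
print(merged[start + 2 * hour])  # the overlapping 00:00 value is now 0.218
```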
There are regular time series (i.e., a known frequency between timesteps) and irregular time series that store explicit date/value pairs.
I would inspect your data and try to write out an irregular time series first. Then you might be able to make sense of the missing data.
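One way to inspect the data is to list the hours that are absent from an hourly record, for example to check whether the empty rows line up with hours that have no data. A stdlib sketch, assuming the times are already parsed into sorted `datetime` objects; `find_gaps` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def find_gaps(times, step=timedelta(hours=1)):
    """Return (first_missing, last_missing) pairs wherever consecutive
    timestamps are more than one step apart. `times` must be sorted."""
    gaps = []
    for prev, curr in zip(times, times[1:]):
        if curr - prev > step:
            gaps.append((prev + step, curr - step))
    return gaps


times = [datetime(2023, 1, 15, h) for h in (0, 1, 2, 5, 6)]
for first_missing, last_missing in find_gaps(times):
    print("missing", first_missing, "to", last_missing)
# prints: missing 2023-01-15 03:00:00 to 2023-01-15 04:00:00
```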
Thank you for responding so quickly.
We are facing several problems. First, the data gets overwritten instead of appended as we want; second, some data is missing. So maybe the problem is in our script.
The problem appears to be in the time-series part. I made some changes, but other errors appear. How should this be done properly?
This is our data structure:
        basin              date      P      T      Q
327781   5020  2023-01-14_22:00  0.000  2.416  0.253
327782   5020  2023-01-14_23:00  0.000  2.089  0.253
327783   5020  2023-01-15_00:00  0.008  2.344  0.253
327784   5020  2023-01-15_01:00  0.000  2.438  0.253
327785   5020  2023-01-15_02:00  0.033  2.310  0.218
...       ...               ...    ...    ...    ...
332143   6470  2023-01-15_16:00  1.293  3.380  0.383
332144   6470  2023-01-15_17:00  1.600  3.361  0.383
332145   6470  2023-01-15_18:00  1.698  3.311  0.428
332146   6470  2023-01-15_19:00  1.881  3.201  0.428
332147   6470  2023-01-15_20:00  1.093  3.284  0.473
This code loads and formats the CSV file:
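import pandas as pd

df = pd.read_csv(path_to_csv, sep=";")
df.columns = ['basin', 'date', 'P', 'T', 'Q']
# Basin IDs as strings without spaces
df['basin'] = df['basin'].astype(str).str.replace(' ', '')
basin = df['basin'].unique()
# Keep the period of interest; comparing the raw strings works because
# the fixed-width YYYY-MM-DD_HH:MM format sorts chronologically
mask = (df['date'] >= '2023-01-14_22:00') & (df['date'] <= '2023-01-15_20:00')
df = df.loc[mask].copy()  # copy avoids pandas' SettingWithCopyWarning below
df['date'] = pd.to_datetime(df['date'], errors='coerce', format='%Y-%m-%d_%H:%M')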
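The conversion below handles one station at a time. To loop over every basin, the rows can first be grouped per basin with parsed dates. A stdlib sketch, assuming a semicolon-separated file with one header row and dates like `2023-01-14_22:00`; `group_by_basin` is a hypothetical helper, and the sample rows are copied from the table above.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def group_by_basin(lines, start, end):
    """Group basin;date;P;T;Q rows by basin, keeping start <= date <= end.

    Returns {basin: [(datetime, P, T, Q), ...]} sorted by time.
    """
    reader = csv.reader(lines, delimiter=";")
    next(reader)  # skip the header row
    groups = defaultdict(list)
    for basin, date, p, t, q in reader:
        when = datetime.strptime(date.strip(), "%Y-%m-%d_%H:%M")
        if start <= when <= end:
            groups[basin.replace(" ", "")].append((when, float(p), float(t), float(q)))
    for rows in groups.values():
        rows.sort()
    return dict(groups)


sample = io.StringIO(
    "basin;date;P;T;Q\n"
    "5020;2023-01-14_22:00;0.000;2.416;0.253\n"
    "5020;2023-01-14_23:00;0.000;2.089;0.253\n"
    "6470;2023-01-15_16:00;1.293;3.380;0.383\n"
)
data = group_by_basin(sample, datetime(2023, 1, 14, 22), datetime(2023, 1, 15, 20))
print(sorted(data))       # ['5020', '6470']
print(len(data["5020"]))  # 2
```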
Conversion to .dss (example for one station):
import numpy as np
from pydsstools.heclib.dss import HecDss
from pydsstools.core import TimeSeriesContainer

x = '10088003'
# Use only the rows of this station (not the whole DataFrame)
data = df.loc[df['basin'] == x]
prec = np.array(data['P'])
t2m = np.array(data['T'])
discharge = np.array(data['Q'])
dss_file = '10088003.dss'
station_name = "HOFKIRCHEN"
length = len(prec)
################################################################
# PRECIPITATION CONVERT
################################################################
pathname = "/" + station_name + "/" + x + "/PRECIP-INC//1HOUR/GAGE/"
tsc = TimeSeriesContainer()
tsc.pathname = pathname
tsc.times = a
tsc.numberValues = len(prec)
tsc.units = "MM"
tsc.type = "PER-CUM"
tsc.interval = 1
tsc.values = prec
fid = HecDss.Open(dss_file)
fid.deletePathname(tsc.pathname)
fid.put_ts(tsc)
ts = fid.read_ts(pathname)
fid.close()  # call close(); without parentheses the file is never closed
Error:
09:42:18.601 -----DSS---zopen Existing file opened, File: /users/hips001/app/inca4hec/bin/10088003.dss
09:42:18.601 Handle 4; Process: 262806; DSS Versions - Software: 7-IQ, File: 7-IQ
09:42:18.601 Single-user advisory access mode
Traceback (most recent call last):
File "inca4hec_add_new.py", line 173, in <module>
create_dss('10088003')
File "inca4hec_add_new.py", line 113, in create_dss
fid.put_ts(tsc)
File "/users/hips001/bin/Python-hips/lib/python3.7/site-packages/pydsstools-2.2-py3.7-linux-x86_64.egg/pydsstools/heclib/dss/HecDss.py", line 130, in put_ts
super().put(tsc)
File "pydsstools/src/open.pyx", line 136, in pydsstools._lib.x64.core_heclib.Open.put
File "pydsstools/src/open.pyx", line 142, in pydsstools._lib.x64.core_heclib.Open.put
File "pydsstools/src/time_series.pyx", line 526, in pydsstools._lib.x64.core_heclib.createNewTimeSeries
File "pydsstools/src/hectime.pyx", line 199, in pydsstools._lib.x64.core_heclib.HecTime.__init__
File "pydsstools/src/hectime.pyx", line 291, in pydsstools._lib.x64.core_heclib.HecTime.parse_datetime_string
AttributeError: 'NoneType' object has no attribute 'year'
09:42:18.624 -----DSS---zclose Handle 4; Process: 262806; File: /users/hips001/app/inca4hec/bin/10088003.dss
09:42:18.624 Number records: 8
09:42:18.624 File size: 36999 64-bit words
09:42:18.624 File size: 289 Kb; 0 Mb
09:42:18.624 Dead space: 11
09:42:18.624 Hash range: 8192
09:42:18.624 Number hash used: 24
09:42:18.624 Max paths for hash: 2
09:42:18.624 Corresponding hash: 5463
09:42:18.624 Number non unique hash: 0
09:42:18.624 Number bins used: 24
09:42:18.624 Number overflow bins: 0
09:42:18.624 Number physical reads: 28
09:42:18.624 Number physical writes: 0
09:42:18.624 Number denied locks: 0
tsc.times = a
What is 'a' in this case? You'll need to make sure that tsc.times gets assigned to a list of the dates that you want. As for the rewriting, I think it may be because you are deleting the path before you use put_ts. Try removing that line and see if you get the append working.
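Before assigning `tsc.times`, it can help to check that it really is a list of dates matching the values. The traceback ends in `HecTime.parse_datetime_string`, which suggests (an inference, not confirmed) that some time value could not be read as a date. A stdlib sketch; `check_irregular` is a hypothetical helper, and requiring strictly increasing times is an assumption.

```python
from datetime import datetime


def check_irregular(times, values):
    """Raise ValueError if lengths differ, an entry is not a datetime,
    or the times are not strictly increasing."""
    if len(times) != len(values):
        raise ValueError(f"{len(times)} times but {len(values)} values")
    for t in times:
        if not isinstance(t, datetime):  # pandas Timestamps also pass
            raise ValueError(f"not a datetime: {t!r}")
    for prev, curr in zip(times, times[1:]):
        if curr <= prev:
            raise ValueError(f"times not strictly increasing at {curr}")


check_irregular([datetime(2023, 1, 14, 22), datetime(2023, 1, 14, 23)], [0.253, 0.253])
try:
    check_irregular(["2023-01-14_22:00"], [0.253])
except ValueError as err:
    print(err)  # not a datetime: '2023-01-14_22:00'
```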
I've also used this Excel add-in successfully for creating DSS files from CSVs: https://hec-dss-excel-data-exchange-add-in-for-e1.software.informer.com/download/
I see some issues with the code. You are mixing up irregular and regular time series.
A regular time series has a known timestep and is defined by a start date/time and a number of values. A regular time series uses the interval:
tsc.interval = 1
This means you have a regular time series (known frequency) with a 1-hour timestep. If you are writing out a regular time series to DSS, you need to use something like:
tsc.startDateTime = "15JUL2019 19:00:00"
where tsc.startDateTime is formatted as an HEC time string. You do not need to pass a list of dates for a regular time series. In your example, you could define the start date/time like this:
## Define start date in hectime string (upper-case month, as in "15JUL2019")
startDate = df.loc[df['basin'] == x, 'date'].min().strftime('%d%b%Y %H:%M').upper()
tsc.startDateTime = startDate
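`strftime('%d%b%Y %H:%M')` takes the month name from the current locale and returns mixed case such as `Jan`. A locale-independent sketch that reproduces the upper-case form of the `"15JUL2019 19:00:00"` example; `to_hec_time` is a hypothetical helper, and whether pydsstools also accepts other spellings is not checked here.

```python
from datetime import datetime

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def to_hec_time(dt):
    """Format a datetime like '15JUL2019 19:00:00' using a fixed English
    month table, so the result does not depend on the locale."""
    return f"{dt.day:02d}{MONTHS[dt.month - 1]}{dt.year:04d} {dt:%H:%M:%S}"


print(to_hec_time(datetime(2019, 7, 15, 19)))  # 15JUL2019 19:00:00
print(to_hec_time(datetime(2023, 1, 14, 22)))  # 14JAN2023 22:00:00
```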
You do not need to define tsc.times for a regular time series.
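No list of dates is needed for a regular series because the timestamps follow from the start, the step, and the number of values. A sketch of that relationship; `regular_times` is hypothetical, and HEC-DSS's own handling of interval boundaries and time zones is not modelled.

```python
from datetime import datetime, timedelta


def regular_times(start, step, count):
    """Timestamps implied by a regular series: start, start + step, ..."""
    return [start + i * step for i in range(count)]


times = regular_times(datetime(2023, 1, 14, 22), timedelta(hours=1), 4)
print(times[-1])  # 2023-01-15 01:00:00
```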
Irregular time series do not have a known time step and are defined by a list of unique date/value pairs. For an irregular time series, you do not need to define a start date/time. Instead, you need to provide a list of dates, as @apreucil mentioned. The interval becomes:
tsc.interval = -1
and you can define a list of dates using
tsc.times = df.loc[df['basin'] == x, 'date'].to_numpy()
You do not need to define tsc.startDateTime for an irregular time series.
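Because an irregular series is a list of unique date/value pairs, repeated timestamps (for example after concatenating two exports) need resolving before writing. A stdlib sketch; `to_irregular_pairs` is hypothetical, and keeping the last value for a repeated timestamp is an arbitrary choice made here.

```python
from datetime import datetime


def to_irregular_pairs(times, values):
    """Sort (time, value) pairs by time and drop repeated timestamps,
    keeping the last value seen for each. Returns (times, values) lists."""
    latest = {}
    for t, v in zip(times, values):
        latest[t] = v
    ordered = sorted(latest.items())
    return [t for t, _ in ordered], [v for _, v in ordered]


times = [datetime(2023, 1, 15, 1), datetime(2023, 1, 15, 0), datetime(2023, 1, 15, 1)]
values = [0.253, 0.253, 0.218]
t, v = to_irregular_pairs(times, values)
print(len(t), v)  # 2 [0.253, 0.218]
```

The two lists could then be assigned to `tsc.times` and `tsc.values`, with `tsc.numberValues = len(v)` and `tsc.interval = -1`.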
Use this line only if you really need to: it deletes the existing data in DSS, so only the data you are currently writing remains.
## This is a dangerous line, use only if necessary!
# fid.deletePathname(tsc.pathname)
This line might explain the missing data, but it could also be that you need to resample the time series if you want a regular time series.
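Resampling here means placing the record on a complete hourly grid, so it can be written with a start date and an interval and every gap becomes an explicit missing value. A stdlib sketch; `to_regular` is hypothetical, it does no averaging or aggregation, and NaN as the missing marker is an assumption (the pydsstools examples use their own `UNDEFINED` constant for missing values).

```python
from datetime import datetime, timedelta


def to_regular(pairs, step=timedelta(hours=1), missing=float("nan")):
    """Place (datetime, value) pairs on a regular grid from the first to
    the last timestamp. Grid points without an observation get `missing`;
    observations that fall between grid points are ignored."""
    lookup = dict(pairs)
    start, end = min(lookup), max(lookup)
    values = []
    t = start
    while t <= end:
        values.append(lookup.get(t, missing))
        t += step
    return start, values


pairs = [
    (datetime(2023, 1, 15, 0), 0.253),
    (datetime(2023, 1, 15, 1), 0.253),
    (datetime(2023, 1, 15, 4), 0.218),
]
start, hourly = to_regular(pairs)
print(start, hourly)  # 2023-01-15 00:00:00 [0.253, 0.253, nan, nan, 0.218]
```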
Ok, thanks a lot. @danhamill's example helped me, and now it works. I have one more question: is there a function that lists all values of one pathname part (in my case, Part B)? I'm trying to create a dynamic pathname.
Edit: problem solved. Here is the solution:
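from pydsstools.heclib.dss import HecDss

fid = HecDss.Open(dss_file)
A = fid.getPathnameDict()
# The 'TS' entry holds the time-series pathnames
for path in A['TS']:
    print("\n", path)
fid.close()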
This prints all time-series pathnames:
//HENCOVCE/PERC-GW-2/01FEB2023/1HOUR/RUN:F_ONDAVA_L/
//SVIDNIK_LADOMIRKA/PERC-SOIL/01FEB2023/1HOUR/RUN:F_ONDAVA_L/
//HENCOVCE/FLOW-CUMULATIVE/01JAN2023/1HOUR/RUN:F_ONDAVA_L/
//SVIDNIK_ONDAVA/STORAGE-GW-2/01FEB2023/1HOUR/RUN:F_ONDAVA_L/
//HENCOVCE/FLOW-CUMULATIVE/01FEB2023/1HOUR/RUN:F_ONDAVA_L/
//HENCOVCE/FLOW-RESIDUAL/01JAN2023/1HOUR/RUN:F_ONDAVA_L/
//HENCOVCE/FLOW-OBSERVED-CUMULATIVE/01FEB2023/1HOUR/RUN:F_ONDAVA_L/
//HENCOVCE/FLOW-DIRECT/01JAN2023/1HOUR/RUN:F_ONDAVA_L/
//HENCOVCE/FLOW-BASE/01JAN2023/1HOUR/RUN:F_ONDAVA_L/
//HENCOVCE/FLOW-BASE/01FEB2023/1HOUR/RUN:F_ONDAVA_L/
//HENCOVCE/AQUIFER RECHARGE/01FEB2023/1HOUR/RUN:F_ONDAVA_L/
....
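`getPathnameDict()` returns plain pathname strings, so collecting the B parts, and building new pathnames from them, needs only string handling. A stdlib sketch; `pathname_parts` and `build_pathname` are hypothetical helpers rather than pydsstools functions, the sample paths come from the output above, and the C part `FLOW` in the last line is only an illustration.

```python
def pathname_parts(pathname):
    """Split a DSS pathname '/A/B/C/D/E/F/' into a dict of its six parts."""
    parts = pathname.strip().split("/")
    if len(parts) != 8 or parts[0] or parts[-1]:
        raise ValueError(f"not a DSS pathname: {pathname!r}")
    return dict(zip("ABCDEF", parts[1:7]))


def build_pathname(a="", b="", c="", d="", e="", f=""):
    """Join six parts back into '/A/B/C/D/E/F/' (empty parts allowed)."""
    return "/" + "/".join([a, b, c, d, e, f]) + "/"


paths = [
    "//HENCOVCE/PERC-GW-2/01FEB2023/1HOUR/RUN:F_ONDAVA_L/",
    "//SVIDNIK_LADOMIRKA/PERC-SOIL/01FEB2023/1HOUR/RUN:F_ONDAVA_L/",
    "//SVIDNIK_ONDAVA/STORAGE-GW-2/01FEB2023/1HOUR/RUN:F_ONDAVA_L/",
]
b_parts = sorted({pathname_parts(p)["B"] for p in paths})
print(b_parts)  # ['HENCOVCE', 'SVIDNIK_LADOMIRKA', 'SVIDNIK_ONDAVA']

# Dynamic pathname per B part; D (date block) left empty, as when writing
for b in b_parts:
    print(build_pathname("", b, "FLOW", "", "1HOUR", "RUN:F_ONDAVA_L"))
```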