Problem with merging different HEC-DSS files

Question

Closed this issue a year ago · 5 comments

Hello,
I am trying to append multiple HEC-DSS time series into one file, but there are always empty rows between the data sets. I would be grateful for any tips on how to solve this with a pydsstools script. The attached picture shows the merged HEC-DSS data in the DSSVue GUI.

Or is there any method to append HEC-DSS files into a single one?

Answer 1 · 2023-01-18T03:39:25.000Z

HEC-DSS appends by default, similar to a database. If you are writing to a time step that doesn't exist, the time series gets extended to cover the entire record. The only way to delete data is to manually delete a record and squeeze the dss file.

There are two kinds of time series: regular ones, which have a known, fixed interval between timesteps, and irregular ones, which are stored as date/value pairs.

I would inspect your data and try to write out an irregular time series first. Then you might be able to make sense of the missing data.
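One way to inspect the data before writing it is to check whether the timestamps are evenly spaced. The following is a minimal, stdlib-only sketch; the helper name `find_gaps` and the sample timestamps are hypothetical and not part of pydsstools. It lists every place where two consecutive timestamps are further apart than the expected step, which is likely where a regular time series would show empty rows.

```python
from datetime import datetime, timedelta

def find_gaps(times, step=timedelta(hours=1)):
    """Return (previous, current) pairs whose spacing exceeds `step`."""
    ordered = sorted(times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > step]

# Hypothetical hourly timestamps with 00:00 and 01:00 missing
fmt = "%Y-%m-%d_%H:%M"
stamps = ["2023-01-14_22:00", "2023-01-14_23:00", "2023-01-15_02:00"]
times = [datetime.strptime(s, fmt) for s in stamps]

for before, after in find_gaps(times):
    print(f"gap between {before} and {after}")
```

If the list of gaps is empty, the data is a candidate for a regular time series; otherwise an irregular series, or filling the gaps first, may be the safer route.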

Answer 2 · 2023-01-19T10:42:37.000Z

Thank you for responding so quickly.

We are facing several problems. First, the data is being rewritten rather than appended, which is not what we want; second, there is the missing data. So maybe the problem is in our script.

The problem appears to be in the time-series part. I made some changes, but then other errors appear. What is the right way to do this?

This is our data structure:

       basin              date      P      T      Q
327781  5020  2023-01-14_22:00  0.000  2.416  0.253
327782  5020  2023-01-14_23:00  0.000  2.089  0.253
327783  5020  2023-01-15_00:00  0.008  2.344  0.253
327784  5020  2023-01-15_01:00  0.000  2.438  0.253
327785  5020  2023-01-15_02:00  0.033  2.310  0.218
...      ...               ...    ...    ...    ...
332143  6470  2023-01-15_16:00  1.293  3.380  0.383
332144  6470  2023-01-15_17:00  1.600  3.361  0.383
332145  6470  2023-01-15_18:00  1.698  3.311  0.428
332146  6470  2023-01-15_19:00  1.881  3.201  0.428
332147  6470  2023-01-15_20:00  1.093  3.284  0.473

This code loads and formats the CSV file:

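import pandas as pd

# path_to_csv is defined elsewhere in the script
df = pd.read_csv(path_to_csv, sep=";")
df.columns = ['basin', 'date', 'P', 'T', 'Q']
# Basin IDs are handled as strings without embedded spaces
df['basin'] = df['basin'].astype('str')
df['basin'] = df.basin.str.replace(' ', '')

basin = df['basin'].unique()
date_interval = df['date']

# Dates are still strings here; this fixed-width format compares correctly as text
mask = (df['date'] >= '2023-01-14_22:00') & (df['date'] <= '2023-01-15_20:00')
# .copy() avoids assigning into a slice of the original frame below
df = df.loc[mask].copy()

# errors='coerce' turns unparseable dates into NaT instead of raising
df['date'] = pd.to_datetime(df['date'], errors='coerce', format='%Y-%m-%d_%H:%M')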

Conversion to .dss (example for one station):

import numpy as np

x = '10088003'
# Keep only the rows for this station
data = df.loc[df['basin'] == x]

# Take the values from the filtered frame, not from the full df
prec = np.array(data['P'])
t2m = np.array(data['T'])
discharge = np.array(data['Q'])

dss_file = '10088003.dss'
station_name = "HOFKIRCHEN"
length = len(prec)

################################################################
#                      PRECIPITATION CONVERT
################################################################

pathname = "/" + station_name + "/" + x + "/PRECIP-INC//1HOUR/GAGE/"

from pydsstools.heclib.dss import HecDss
from pydsstools.core import TimeSeriesContainer

tsc = TimeSeriesContainer()
tsc.pathname = pathname
tsc.times = a
tsc.numberValues = len(prec)
tsc.units = "MM"
tsc.type = "PER-CUM"
tsc.interval = 1
tsc.values = prec

fid = HecDss.Open(dss_file)
fid.deletePathname(tsc.pathname)
fid.put_ts(tsc)
ts = fid.read_ts(pathname)
fid.close()

Error:

09:42:18.601      -----DSS---zopen   Existing file opened,  File: /users/hips001/app/inca4hec/bin/10088003.dss
09:42:18.601                         Handle 4;  Process: 262806;  DSS Versions - Software: 7-IQ, File:  7-IQ
09:42:18.601                         Single-user advisory access mode
Traceback (most recent call last):
  File "inca4hec_add_new.py", line 173, in <module>
    create_dss('10088003')
  File "inca4hec_add_new.py", line 113, in create_dss
    fid.put_ts(tsc)
  File "/users/hips001/bin/Python-hips/lib/python3.7/site-packages/pydsstools-2.2-py3.7-linux-x86_64.egg/pydsstools/heclib/dss/HecDss.py", line 130, in put_ts
    super().put(tsc)
  File "pydsstools/src/open.pyx", line 136, in pydsstools._lib.x64.core_heclib.Open.put
  File "pydsstools/src/open.pyx", line 142, in pydsstools._lib.x64.core_heclib.Open.put
  File "pydsstools/src/time_series.pyx", line 526, in pydsstools._lib.x64.core_heclib.createNewTimeSeries
  File "pydsstools/src/hectime.pyx", line 199, in pydsstools._lib.x64.core_heclib.HecTime.__init__
  File "pydsstools/src/hectime.pyx", line 291, in pydsstools._lib.x64.core_heclib.HecTime.parse_datetime_string
AttributeError: 'NoneType' object has no attribute 'year'
09:42:18.624      -----DSS---zclose  Handle 4;  Process: 262806;  File: /users/hips001/app/inca4hec/bin/10088003.dss
09:42:18.624                         Number records:         8
09:42:18.624                         File size:              36999  64-bit words
09:42:18.624                         File size:              289 Kb;  0 Mb
09:42:18.624                         Dead space:             11
09:42:18.624                         Hash range:             8192
09:42:18.624                         Number hash used:       24
09:42:18.624                         Max paths for hash:     2
09:42:18.624                         Corresponding hash:     5463
09:42:18.624                         Number non unique hash: 0
09:42:18.624                         Number bins used:       24
09:42:18.624                         Number overflow bins:   0
09:42:18.624                         Number physical reads:  28
09:42:18.624                         Number physical writes: 0
09:42:18.624                         Number denied locks:    0

Answer 3 · 2023-01-19T14:28:54.000Z

tsc.times = a

What is 'a' in this case? You'll need to make sure that tsc.times gets assigned to a list of the dates that you want. As for the rewriting, I think it may be because you are deleting the path before you use put_ts. Try removing that line and see if you get the append working.

I've also used this Excel add-in successfully for creating DSS files from CSVs: https://hec-dss-excel-data-exchange-add-in-for-e1.software.informer.com/download/

Answer 4 · 2023-01-19T19:06:00.000Z

I see some issues with the code. You are mixing up irregular and regular time series.

A regular time series has a known timestep and is defined by a start date/time and number of values. A regular time series has the interval:

tsc.interval = 1

This means you have a regular time series (known frequency) with a 1-hour timestep. If you are writing out a regular time series to DSS, you need to use something like:

tsc.startDateTime = "15JUL2019 19:00:00"

where tsc.startDateTime is formatted as an HEC time string. You do not need to pass a list of dates if you have a regular time series. In your example, you could define the start date/time like this:

## Define start date in hectime string
startDate =  df.loc[df['basin'] == x, 'date'].min().strftime('%d%b%Y %H:%M')

You do not need to define tsc.times for a regular time series.
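To produce a start time in exactly the same shape as the "15JUL2019 19:00:00" example, with an upper-case month and seconds included, a small formatter can help. This is a minimal sketch; `to_hec_time` and `MONTHS` are hypothetical names, not part of pydsstools, and it has not been checked here which other string variants pydsstools accepts.

```python
from datetime import datetime

# Fixed month abbreviations keep the result independent of the system locale
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

def to_hec_time(dt):
    """Format a datetime in the style '15JUL2019 19:00:00'."""
    return f"{dt.day:02d}{MONTHS[dt.month - 1]}{dt.year:04d} {dt:%H:%M:%S}"

# First timestamp from the sample data in the question
start = datetime.strptime("2023-01-14_22:00", "%Y-%m-%d_%H:%M")
print(to_hec_time(start))
```

The result can then be assigned to tsc.startDateTime in place of a hand-written string.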

Irregular time series do not have a known timestep and are defined by a list of unique date/value pairs. For an irregular time series, you do not need to define a start date/time. Instead, you need to provide a list of dates, as @apreucil mentioned. The interval becomes:

tsc.interval = -1

and you can define a list of dates using

tsc.times = df.loc[df['basin'] == x, 'date'].to_numpy()

You do not need to define tsc.startDateTime for an irregular time series.

Use the following line only if you really need to. It deletes the existing data in DSS, so only the data you are currently working on gets written out:

## This is a dangerous line, use only if necessary!
# fid.deletePathname(tsc.pathname)

This line might explain the missing data, but it could also be that you need to resample the time series if you want a regular one.
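Resampling onto a regular grid could look roughly like the sketch below, which uses only the standard library. The function name `to_regular`, the sample values, and the use of NaN as the missing marker are assumptions for illustration; pydsstools may expect its own missing-value constant instead of NaN, which is not confirmed here. The sketch also assumes every timestamp already falls on the hourly grid.

```python
from datetime import datetime, timedelta

def to_regular(times, values, step=timedelta(hours=1), missing=float("nan")):
    """Place values on an evenly spaced grid; absent steps get `missing`."""
    lookup = dict(zip(times, values))
    start, end = min(times), max(times)
    # Number of grid points from start to end, both inclusive
    count = (end - start) // step + 1
    grid = [start + i * step for i in range(count)]
    return grid, [lookup.get(t, missing) for t in grid]

# Hypothetical hourly record with 00:00 and 01:00 missing
times = [datetime(2023, 1, 14, 22), datetime(2023, 1, 14, 23),
         datetime(2023, 1, 15, 2)]
values = [0.253, 0.253, 0.218]

grid, filled = to_regular(times, values)
for t, v in zip(grid, filled):
    print(t, v)
```

With the gaps made explicit, the first grid entry gives the start date/time and the filled list gives the values for a regular series.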

Answer 5 · 2023-02-06T14:45:40.000Z

OK, thanks a lot. The example from @danhamill helped me and now it works. I have one more question: is there a function that lists all the values of one pathname part (in my case, Part B)? I'm trying to create a dynamic pathname.

edit: problem solved, solution found:

fid = HecDss.Open(dss_file)
A = fid.getPathnameDict()

# 'TS' holds the time-series pathnames
for path in A['TS']:
    print("\n", path)

fid.close()

will print all paths:

//HENCOVCE/PERC-GW-2/01FEB2023/1HOUR/RUN:F_ONDAVA_L/

//SVIDNIK_LADOMIRKA/PERC-SOIL/01FEB2023/1HOUR/RUN:F_ONDAVA_L/

//HENCOVCE/FLOW-CUMULATIVE/01JAN2023/1HOUR/RUN:F_ONDAVA_L/

//SVIDNIK_ONDAVA/STORAGE-GW-2/01FEB2023/1HOUR/RUN:F_ONDAVA_L/

//HENCOVCE/FLOW-CUMULATIVE/01FEB2023/1HOUR/RUN:F_ONDAVA_L/

//HENCOVCE/FLOW-RESIDUAL/01JAN2023/1HOUR/RUN:F_ONDAVA_L/

//HENCOVCE/FLOW-OBSERVED-CUMULATIVE/01FEB2023/1HOUR/RUN:F_ONDAVA_L/

//HENCOVCE/FLOW-DIRECT/01JAN2023/1HOUR/RUN:F_ONDAVA_L/

//HENCOVCE/FLOW-BASE/01JAN2023/1HOUR/RUN:F_ONDAVA_L/

//HENCOVCE/FLOW-BASE/01FEB2023/1HOUR/RUN:F_ONDAVA_L/

//HENCOVCE/AQUIFER RECHARGE/01FEB2023/1HOUR/RUN:F_ONDAVA_L/
....
....
....
....
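To get from the printed pathnames to the distinct Part B values, the strings can be split on the slashes. The following is a minimal sketch; `split_pathname` is a hypothetical helper, not a pydsstools function, and the list of paths is copied from the output above rather than read from a DSS file.

```python
def split_pathname(pathname):
    """Split '/A/B/C/D/E/F/' into a dict keyed by part letter."""
    parts = pathname.strip().split("/")
    # A well-formed pathname has 7 slashes, giving 8 pieces with empty ends
    if len(parts) != 8 or parts[0] or parts[-1]:
        raise ValueError(f"not a DSS pathname: {pathname!r}")
    return dict(zip("ABCDEF", parts[1:7]))

paths = [
    "//HENCOVCE/PERC-GW-2/01FEB2023/1HOUR/RUN:F_ONDAVA_L/",
    "//SVIDNIK_LADOMIRKA/PERC-SOIL/01FEB2023/1HOUR/RUN:F_ONDAVA_L/",
    "//HENCOVCE/FLOW-BASE/01JAN2023/1HOUR/RUN:F_ONDAVA_L/",
]

# Unique Part B values, sorted for stable output
part_b = sorted({split_pathname(p)["B"] for p in paths})
print(part_b)
```

In a real script, the hard-coded list would be replaced by the pathnames returned for the 'TS' key.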