UNSW-CEEM/NEMOSIS

NEMOSIS dynamic_data_compiler

fnbillimor opened this issue · 8 comments

Hi

I am using NEMOSIS in conjunction with NEMSEER. When I pull data with `dynamic_data_compiler` using the parquet format, it seems like all of the price columns come back as datetime values.

    nemosis_data = nemosis.dynamic_data_compiler(
        nemosis_start,
        time,
        "TRADINGPRICE",
        nemosis_cache,
        filter_cols=["REGIONID"],
        filter_values=(["SA1"],),
        fformat="parquet",
    )
    
    actual_price = nemosis_data.groupby("SETTLEMENTDATE")["RRP"].sum()[time]

As a result, the `sum()` operation fails, since it is restricted to numeric (float/int) columns. This seems to be a recent issue: I ran the same function a few days ago without any problems. A printout of the `nemosis_data` variable is set out below.


INFO: Query raw data already downloaded to nemseer_cache
INFO: Converting PRICE data to xarray.
Compiling data for table TRADINGPRICE.
Returning TRADINGPRICE.
          SETTLEMENTDATE REGIONID                           RRP  \
2729 2021-01-01 00:30:00      SA1 1970-01-01 00:00:00.000000035   

                      RAISE6SECRRP                 RAISE60SECRRP RAISE5MINRRP  \
2729 1970-01-01 00:00:00.000000001 1970-01-01 00:00:00.000000001   1970-01-01   

                       RAISEREGRRP                  LOWER6SECRRP  \
2729 1970-01-01 00:00:00.000000010 1970-01-01 00:00:00.000000001   

                     LOWER60SECRRP LOWER5MINRRP                   LOWERREGRRP  \
2729 1970-01-01 00:00:00.000000003   1970-01-01 1970-01-01 00:00:00.000000012   

     PRICE_STATUS  
2729         FIRM  
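Interestingly, the corrupted values look like the original floats reinterpreted as nanosecond offsets from the Unix epoch (just my speculation from the printout, e.g. an RRP of 35 becomes `1970-01-01 00:00:00.000000035`). A minimal sketch that reproduces both the reinterpretation and the aggregation failure:

```python
import pandas as pd

# A bare integer passed to Timestamp is interpreted as nanoseconds since
# the Unix epoch, matching the corrupted RRP value in the printout above.
print(pd.Timestamp(35))
# 1970-01-01 00:00:00.000000035

# A tiny frame with RRP wrongly typed as datetime64, mimicking the bad data;
# grouping and summing it raises the same TypeError as in the traceback below.
df = pd.DataFrame(
    {
        "SETTLEMENTDATE": pd.to_datetime(["2021-01-01 00:30:00"]),
        "RRP": pd.to_datetime([35]),
    }
)
try:
    df.groupby("SETTLEMENTDATE")["RRP"].sum()
except TypeError as err:
    print(type(err).__name__)
# TypeError
```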

The full error message is also attached below.


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [56], in <cell line: 1>()
----> 1 results = list(result)

Input In [53], in calculate_pd_price_forecast_error(forecasted_time)
     49 # select relevant price index from:
     50 # RRP, RAISE6SECRRP, RAISE60SECRRP, RAISE5MINRRP , RAISEREGRRP
     51 #      LOWER6SECRRP, LOWER60SECRRP, LOWER5MINRRP , LOWERREGRRP 
     52 print(nemosis_data)
---> 53 actual_price = nemosis_data.groupby("SETTLEMENTDATE")["RRP"].sum()[time]
     55 # sum forecast price for relevant region: QLD1 SA1 NSW1 VIC1 TAS1
     56 price_forecasts=price_forecasts.sel(REGIONID="QLD1")

File ~\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:2189, in GroupBy.sum(self, numeric_only, min_count, engine, engine_kwargs)
   2185 # If we are grouping on categoricals we want unobserved categories to
   2186 # return zero, rather than the default of NaN which the reindexing in
   2187 # _agg_general() returns. GH #31422
   2188 with com.temp_setattr(self, "observed", True):
-> 2189     result = self._agg_general(
   2190         numeric_only=numeric_only,
   2191         min_count=min_count,
   2192         alias="add",
   2193         npfunc=np.sum,
   2194     )
   2196 return self._reindex_output(result, fill_value=0)

File ~\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1506, in GroupBy._agg_general(self, numeric_only, min_count, alias, npfunc)
   1494 @final
   1495 def _agg_general(
   1496     self,
   (...)
   1501     npfunc: Callable,
   1502 ):
   1504     with self._group_selection_context():
   1505         # try a cython aggregation if we can
-> 1506         result = self._cython_agg_general(
   1507             how=alias,
   1508             alt=npfunc,
   1509             numeric_only=numeric_only,
   1510             min_count=min_count,
   1511         )
   1512         return result.__finalize__(self.obj, method="groupby")

File ~\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1592, in GroupBy._cython_agg_general(self, how, alt, numeric_only, min_count)
   1588     return result
   1590 # TypeError -> we may have an exception in trying to aggregate
   1591 #  continue and exclude the block
-> 1592 new_mgr = data.grouped_reduce(array_func, ignore_failures=True)
   1594 if not is_ser and len(new_mgr) < len(data):
   1595     warn_dropping_nuisance_columns_deprecated(type(self), how)

File ~\anaconda3\lib\site-packages\pandas\core\internals\base.py:199, in SingleDataManager.grouped_reduce(self, func, ignore_failures)
    193 """
    194 ignore_failures : bool, default False
    195     Not used; for compatibility with ArrayManager/BlockManager.
    196 """
    198 arr = self.array
--> 199 res = func(arr)
    200 index = default_index(len(res))
    202 mgr = type(self).from_array(res, index)

File ~\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1578, in GroupBy._cython_agg_general.<locals>.array_func(values)
   1576 def array_func(values: ArrayLike) -> ArrayLike:
   1577     try:
-> 1578         result = self.grouper._cython_operation(
   1579             "aggregate", values, how, axis=data.ndim - 1, min_count=min_count
   1580         )
   1581     except NotImplementedError:
   1582         # generally if we have numeric_only=False
   1583         # and non-applicable functions
   1584         # try to python agg
   1585         # TODO: shouldn't min_count matter?
   1586         result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)

File ~\anaconda3\lib\site-packages\pandas\core\groupby\ops.py:939, in BaseGrouper._cython_operation(self, kind, values, how, axis, min_count, **kwargs)
    937 ids, _, _ = self.group_info
    938 ngroups = self.ngroups
--> 939 return cy_op.cython_operation(
    940     values=values,
    941     axis=axis,
    942     min_count=min_count,
    943     comp_ids=ids,
    944     ngroups=ngroups,
    945     **kwargs,
    946 )

File ~\anaconda3\lib\site-packages\pandas\core\groupby\ops.py:614, in WrappedCythonOp.cython_operation(self, values, axis, min_count, comp_ids, ngroups, **kwargs)
    610 is_numeric = is_numeric_dtype(dtype)
    612 # can we do this operation with our cython functions
    613 # if not raise NotImplementedError
--> 614 self._disallow_invalid_ops(dtype, is_numeric)
    616 if not isinstance(values, np.ndarray):
    617     # i.e. ExtensionArray
    618     return self._ea_wrap_cython_operation(
    619         values,
    620         min_count=min_count,
   (...)
    623         **kwargs,
    624     )

File ~\anaconda3\lib\site-packages\pandas\core\groupby\ops.py:239, in WrappedCythonOp._disallow_invalid_ops(self, dtype, is_numeric)
    235 elif is_datetime64_any_dtype(dtype):
    236     # we raise NotImplemented if this is an invalid operation
    237     #  entirely, e.g. adding datetimes
    238     if how in ["add", "prod", "cumsum", "cumprod"]:
--> 239         raise TypeError(f"datetime64 type does not support {how} operations")
    240 elif is_timedelta64_dtype(dtype):
    241     if how in ["prod", "cumprod"]:

TypeError: datetime64 type does not support add operations
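In the meantime, a defensive dtype check before aggregating makes this failure mode explicit instead of surfacing it deep inside groupby internals (a sketch only, not part of NEMOSIS; the helper name is made up):

```python
import pandas as pd

def summed_rrp(df: pd.DataFrame) -> pd.Series:
    # Fail fast with a clear message if RRP came back with the wrong dtype,
    # e.g. from a corrupted parquet cache.
    if not pd.api.types.is_numeric_dtype(df["RRP"]):
        raise TypeError(
            f"RRP has dtype {df['RRP'].dtype}; expected a numeric column. "
            "Try clearing the parquet cache and re-downloading."
        )
    return df.groupby("SETTLEMENTDATE")["RRP"].sum()
```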

Hi @fnbillimor,

Thanks for reporting this issue, we really appreciate user feedback.

However, I'm having trouble replicating your error. Could you provide the complete details of the example, i.e. including the start and end time you are using?

Hi Nick - Thank you for getting back to me. I have uploaded the full code I used to the GitHub address below. I am using start and end times spanning the calendar year 2021.

https://github.com/fnbillimor/PD_prices

@fnbillimor can you please try running the code below on your machine and let me know whether it works? It is a simplified version of what you are doing, and it works fine for me.

import nemosis

analysis_start = "2021/01/01 00:30:00"
analysis_end = "2022/01/01 00:00:00"
nemosis_cache = 'nemosis_cache/'

nemosis.cache_compiler(analysis_start,
                       analysis_end,
                       "TRADINGPRICE",
                       nemosis_cache,
                       fformat="parquet"
                       )

nemosis_data = nemosis.dynamic_data_compiler(
    analysis_start,
    analysis_end,
    "TRADINGPRICE",
    nemosis_cache,
    filter_cols=["REGIONID"],
    filter_values=(["SA1"],),
    fformat="parquet",
)

time = str(analysis_end).replace("-", "/")

print(nemosis_data['RRP'].sum())
# 1791056.4400000002

print(nemosis_data.groupby("SETTLEMENTDATE")["RRP"].sum()[time])
# 139.28

print(nemosis_data.dtypes)
# SETTLEMENTDATE    datetime64[ns]
# REGIONID                  object
# RRP                      float64
# RAISE6SECRRP             float64
# RAISE60SECRRP            float64
# RAISE5MINRRP             float64
# RAISEREGRRP              float64
# LOWER6SECRRP             float64
# LOWER60SECRRP            float64
# LOWER5MINRRP             float64
# LOWERREGRRP              float64
# PRICE_STATUS              object
# dtype: object

Hi @fnbillimor ,

I don't seem to have the issue either. This worked for me:

time = "2021/01/01 12:00:00"
# get actual demand data for forecasted_time
# nemosis start time must precede end of interval of interest by 5 minutes
nemosis_start = (
    datetime.strptime(time, "%Y/%m/%d %H:%M:%S") - timedelta(minutes=30)
).strftime("%Y/%m/%d %H:%M:%S")
# compile data using nemosis, using cached parquet and filtering out interventions
# select appropriate region
nemosis_data = nemosis.dynamic_data_compiler(
    nemosis_start,
    time,
    "TRADINGPRICE",
    nemosis_cache,
    filter_cols=["REGIONID", "PRICE_STATUS"],
    filter_values=(["SA1"], ["FIRM"]),
    fformat="parquet",
)
nemosis_data["RRP"].values[0]


Can I suggest you try a clean environment?
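If a clean environment doesn't resolve it, clearing the parquet cache so the raw data is re-downloaded is also worth trying. A hypothetical helper (not part of NEMOSIS) might look like:

```python
import pathlib

def clear_parquet_cache(cache_dir: str) -> int:
    """Delete cached parquet files so NEMOSIS re-downloads them.

    Returns the number of files removed; other files in the cache
    directory (e.g. CSVs) are left untouched.
    """
    removed = 0
    for f in pathlib.Path(cache_dir).glob("*.parquet"):
        f.unlink()
        removed += 1
    return removed
```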

@fnbillimor still a WIP, but code under "Looking at price convergence more systematically" might be helpful, though noting that I have combined PD and P5MIN forecasts and dropped the overlapping PD forecasts: https://github.com/UNSW-CEEM/NEMSEER/blob/fff6a482ae3f6cd63e8ed03accdfe2fc2e268986/docs/source/examples/price_convergence_2021.ipynb

Thank you @prakaa and @nick-gorman - this is super helpful. I will try this code.

@fnbillimor the example is now complete and waiting in a PR. See the link below. You should be able to use this code directly, but you'll need to swap out the 5 minute pricing for trading price: https://nemseer--45.org.readthedocs.build/en/45/examples/price_convergence_2021.html

If you run into any issues, feel free to post an issue in NEMSEER.