NEMOSIS dynamic_data_compiler
fnbillimor opened this issue · 8 comments
Hi
I am using NEMOSIS in conjunction with NEMSEER. When I pull data from the dynamic data compiler with the parquet format, all of the price columns appear to be returned as datetime values.
nemosis_data = nemosis.dynamic_data_compiler(
nemosis_start,
time,
"TRADINGPRICE",
nemosis_cache,
filter_cols=["REGIONID"],
filter_values=(["SA1"],),
fformat="parquet",
)
actual_price = nemosis_data.groupby("SETTLEMENTDATE")["RRP"].sum()[time]
As a result, the sum() operation fails, since it is only supported for numeric (float/int) columns. This seems to be a recent issue: I ran the same function a few days ago without any problems. A printout of the nemosis_data variable is set out below.
INFO: Query raw data already downloaded to nemseer_cache
INFO: Converting PRICE data to xarray.
Compiling data for table TRADINGPRICE.
Returning TRADINGPRICE.
SETTLEMENTDATE REGIONID RRP \
2729 2021-01-01 00:30:00 SA1 1970-01-01 00:00:00.000000035
RAISE6SECRRP RAISE60SECRRP RAISE5MINRRP \
2729 1970-01-01 00:00:00.000000001 1970-01-01 00:00:00.000000001 1970-01-01
RAISEREGRRP LOWER6SECRRP \
2729 1970-01-01 00:00:00.000000010 1970-01-01 00:00:00.000000001
LOWER60SECRRP LOWER5MINRRP LOWERREGRRP \
2729 1970-01-01 00:00:00.000000003 1970-01-01 1970-01-01 00:00:00.000000012
PRICE_STATUS
2729 FIRM
The error message is also attached below:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [56], in <cell line: 1>()
----> 1 results = list(result)
Input In [53], in calculate_pd_price_forecast_error(forecasted_time)
49 # select relevant price index from:
50 # RRP, RAISE6SECRRP, RAISE60SECRRP, RAISE5MINRRP , RAISEREGRRP
51 # LOWER6SECRRP, LOWER60SECRRP, LOWER5MINRRP , LOWERREGRRP
52 print(nemosis_data)
---> 53 actual_price = nemosis_data.groupby("SETTLEMENTDATE")["RRP"].sum()[time]
55 # sum forecast price for relevant region: QLD1 SA1 NSW1 VIC1 TAS1
56 price_forecasts=price_forecasts.sel(REGIONID="QLD1")
File ~\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:2189, in GroupBy.sum(self, numeric_only, min_count, engine, engine_kwargs)
2185 # If we are grouping on categoricals we want unobserved categories to
2186 # return zero, rather than the default of NaN which the reindexing in
2187 # _agg_general() returns. GH #31422
2188 with com.temp_setattr(self, "observed", True):
-> 2189 result = self._agg_general(
2190 numeric_only=numeric_only,
2191 min_count=min_count,
2192 alias="add",
2193 npfunc=np.sum,
2194 )
2196 return self._reindex_output(result, fill_value=0)
File ~\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1506, in GroupBy._agg_general(self, numeric_only, min_count, alias, npfunc)
1494 @final
1495 def _agg_general(
1496 self,
(...)
1501 npfunc: Callable,
1502 ):
1504 with self._group_selection_context():
1505 # try a cython aggregation if we can
-> 1506 result = self._cython_agg_general(
1507 how=alias,
1508 alt=npfunc,
1509 numeric_only=numeric_only,
1510 min_count=min_count,
1511 )
1512 return result.__finalize__(self.obj, method="groupby")
File ~\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1592, in GroupBy._cython_agg_general(self, how, alt, numeric_only, min_count)
1588 return result
1590 # TypeError -> we may have an exception in trying to aggregate
1591 # continue and exclude the block
-> 1592 new_mgr = data.grouped_reduce(array_func, ignore_failures=True)
1594 if not is_ser and len(new_mgr) < len(data):
1595 warn_dropping_nuisance_columns_deprecated(type(self), how)
File ~\anaconda3\lib\site-packages\pandas\core\internals\base.py:199, in SingleDataManager.grouped_reduce(self, func, ignore_failures)
193 """
194 ignore_failures : bool, default False
195 Not used; for compatibility with ArrayManager/BlockManager.
196 """
198 arr = self.array
--> 199 res = func(arr)
200 index = default_index(len(res))
202 mgr = type(self).from_array(res, index)
File ~\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1578, in GroupBy._cython_agg_general.<locals>.array_func(values)
1576 def array_func(values: ArrayLike) -> ArrayLike:
1577 try:
-> 1578 result = self.grouper._cython_operation(
1579 "aggregate", values, how, axis=data.ndim - 1, min_count=min_count
1580 )
1581 except NotImplementedError:
1582 # generally if we have numeric_only=False
1583 # and non-applicable functions
1584 # try to python agg
1585 # TODO: shouldn't min_count matter?
1586 result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
File ~\anaconda3\lib\site-packages\pandas\core\groupby\ops.py:939, in BaseGrouper._cython_operation(self, kind, values, how, axis, min_count, **kwargs)
937 ids, _, _ = self.group_info
938 ngroups = self.ngroups
--> 939 return cy_op.cython_operation(
940 values=values,
941 axis=axis,
942 min_count=min_count,
943 comp_ids=ids,
944 ngroups=ngroups,
945 **kwargs,
946 )
File ~\anaconda3\lib\site-packages\pandas\core\groupby\ops.py:614, in WrappedCythonOp.cython_operation(self, values, axis, min_count, comp_ids, ngroups, **kwargs)
610 is_numeric = is_numeric_dtype(dtype)
612 # can we do this operation with our cython functions
613 # if not raise NotImplementedError
--> 614 self._disallow_invalid_ops(dtype, is_numeric)
616 if not isinstance(values, np.ndarray):
617 # i.e. ExtensionArray
618 return self._ea_wrap_cython_operation(
619 values,
620 min_count=min_count,
(...)
623 **kwargs,
624 )
File ~\anaconda3\lib\site-packages\pandas\core\groupby\ops.py:239, in WrappedCythonOp._disallow_invalid_ops(self, dtype, is_numeric)
235 elif is_datetime64_any_dtype(dtype):
236 # we raise NotImplemented if this is an invalid operation
237 # entirely, e.g. adding datetimes
238 if how in ["add", "prod", "cumsum", "cumprod"]:
--> 239 raise TypeError(f"datetime64 type does not support {how} operations")
240 elif is_timedelta64_dtype(dtype):
241 if how in ["prod", "cumprod"]:
TypeError: datetime64 type does not support add operations
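The failure mode can be reproduced with a small frame in which a float column has been mis-stored as datetime64[ns], each float having been reinterpreted as nanoseconds since the epoch (consistent with the 1970-01-01 timestamps in the printout above). The sketch below also shows that casting back through int64 recovers the raw values; this is only an assumed stopgap for a frame already in memory, not a fix for the underlying parquet cache.

```python
import pandas as pd

# Minimal reproduction: a float column mis-stored as datetime64[ns],
# so the value 35 shows up as 1970-01-01 00:00:00.000000035.
df = pd.DataFrame(
    {
        "SETTLEMENTDATE": pd.to_datetime(["2021-01-01 00:30:00"]),
        "RRP": pd.to_datetime([35], unit="ns"),
    }
)

# groupby(...).sum() raises TypeError on the datetime64 column, as in the
# traceback above. Casting back through int64 recovers the raw numbers.
df["RRP"] = df["RRP"].astype("int64").astype("float64")
print(df.groupby("SETTLEMENTDATE")["RRP"].sum())
```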
Hi @fnbillimor,
Thanks for reporting this issue, we really appreciate user feedback.
However, I'm having trouble replicating your error. Could you provide the complete details of the example, i.e. including the start and end time you are using?
Hi Nick - thank you for getting back to me. I have uploaded the full code I have used to the GitHub address below. I am using a start and end time based on the calendar year 2021.
@fnbillimor can you please try running the code below on your machine and let me know whether it works? It should be a simplified version of what you are doing; it appears to work fine for me.
import nemosis
analysis_start = "2021/01/01 00:30:00"
analysis_end = "2022/01/01 00:00:00"
nemosis_cache = 'nemosis_cache/'
nemosis.cache_compiler(analysis_start,
analysis_end,
"TRADINGPRICE",
nemosis_cache,
fformat="parquet"
)
nemosis_data = nemosis.dynamic_data_compiler(
analysis_start,
analysis_end,
"TRADINGPRICE",
nemosis_cache,
filter_cols=["REGIONID"],
filter_values=(["SA1"],),
fformat="parquet",
)
time = str(analysis_end).replace("-", "/")
print(nemosis_data['RRP'].sum())
# 1791056.4400000002
print(nemosis_data.groupby("SETTLEMENTDATE")["RRP"].sum()[time])
# 139.28
print(nemosis_data.dtypes)
# SETTLEMENTDATE datetime64[ns]
# REGIONID object
# RRP float64
# RAISE6SECRRP float64
# RAISE60SECRRP float64
# RAISE5MINRRP float64
# RAISEREGRRP float64
# LOWER6SECRRP float64
# LOWER60SECRRP float64
# LOWER5MINRRP float64
# LOWERREGRRP float64
# PRICE_STATUS object
# dtype: object
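If the dtypes printed on your machine differ from the float64 columns above, the cached parquet files themselves may have been written with the wrong dtypes. One thing to try (a sketch; the cache path and file naming here are assumptions, so adjust them to your setup) is deleting the cached TRADINGPRICE files so NEMOSIS rebuilds them from the raw data on the next call:

```python
import pathlib

# Assumed cache location and file-name pattern; adjust to match your setup.
cache = pathlib.Path("nemosis_cache/")
if cache.exists():
    for f in cache.glob("*TRADINGPRICE*"):
        f.unlink()  # force nemosis to re-download and re-convert this table
        print(f"removed {f}")
```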
Hi @fnbillimor ,
I don't seem to have the issue either. This worked for me:
time = "2021/01/01 12:00:00"
# get actual demand data for forecasted_time
# nemosis start time must precede the end of the interval of interest
# by one trading interval (30 minutes)
nemosis_start = (
datetime.strptime(time, "%Y/%m/%d %H:%M:%S") - timedelta(minutes=30)
).strftime("%Y/%m/%d %H:%M:%S")
# compile data using nemosis, using cached parquet and filtering out interventions
# select appropriate region
nemosis_data = nemosis.dynamic_data_compiler(
nemosis_start,
time,
"TRADINGPRICE",
nemosis_cache,
filter_cols=["REGIONID", "PRICE_STATUS"],
filter_values=(["SA1"], ["FIRM"]),
fformat="parquet",
)
nemosis_data["RRP"].values[0]
Can I suggest you try a clean environment?
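For reference, a clean environment can be set up along these lines (a sketch only; the environment name is arbitrary, and pyarrow is assumed to be the parquet engine in use):

```shell
# create and activate a fresh virtual environment, then reinstall the stack
python -m venv nemosis-clean
. nemosis-clean/bin/activate   # on Windows: nemosis-clean\Scripts\activate
pip install --upgrade pip
pip install nemosis nemseer pandas pyarrow
python -c "import pandas; print(pandas.__version__)"
```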
@fnbillimor this is still a WIP, but the code under "Looking at price convergence more systematically" might be helpful. Note that I have combined PD and P5MIN forecasts and dropped the overlapping PD forecasts: https://github.com/UNSW-CEEM/NEMSEER/blob/fff6a482ae3f6cd63e8ed03accdfe2fc2e268986/docs/source/examples/price_convergence_2021.ipynb
Thank you @prakaa and @nick-gorman - this is super helpful. I will try this code.
@fnbillimor the example is now complete and waiting in a PR. See the link below. You should be able to use this code directly, but you'll need to swap out the 5 minute pricing for trading price: https://nemseer--45.org.readthedocs.build/en/45/examples/price_convergence_2021.html
If you run into any issues, feel free to open an issue in NEMSEER.