Breakthrough-Energy/PowerSimData

Duplicate entries in buses_NEEMregion.csv can cause ValueError when calculating AC investment costs

danielolsen opened this issue · 0 comments

  • I have checked that this issue has not already been reported.

Bug summary

For some scenarios, if the upgraded branches touch a bus that has duplicate entries in the powersimdata/design/investment/data/buses_NEEMregion.csv file, a ValueError is raised. The duplicated index causes the .loc lookup to return more values than there are branches, so we try to insert more data than the data frame has room for.
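
The mechanism can be shown without PowerSimData at all. The following is a minimal sketch using toy data: the bus ids, the region assignments and the three-row branch table are made up for illustration, and only the lookup pattern mirrors the line in the traceback.

```python
import pandas as pd

# Toy stand-in for buses_NEEMregion.csv (hypothetical data): bus 2 appears twice.
bus_reg = pd.DataFrame(
    {"name_abbr": ["PJM E", "PJM ROM", "PJM E", "MISO IN"]},
    index=pd.Index([1, 2, 2, 3], name="bus_id"),
)

# Toy branch table (hypothetical data) with three branches.
branch = pd.DataFrame({"to_bus_id": [1, 2, 3]})

# Same lookup pattern as in _calculate_ac_inv_costs: the duplicated
# index label returns one row per duplicate, not one row per branch.
regions = bus_reg.loc[branch.to_bus_id, "name_abbr"].tolist()
print(len(branch), len(regions))

# Assigning the longer list to a column of the shorter frame fails.
# The exact message text depends on the pandas version.
try:
    branch.loc[:, "to_region"] = regions
except ValueError as err:
    print("ValueError:", err)
```

The lookup returns four values for three branches, which is the length mismatch behind the error.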

Code for reproduction

from powersimdata import Scenario
from powersimdata.design.investment.investment_costs import calculate_ac_inv_costs
scenario = Scenario(3287)
calculate_ac_inv_costs(scenario)

Actual outcome

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\DanielOlsen\repos\bes\PowerSimData\powersimdata\design\investment\investment_costs.py", line 86, in calculate_ac_inv_costs
    costs = _calculate_ac_inv_costs(grid_differences, sum_results)
  File "C:\Users\DanielOlsen\repos\bes\PowerSimData\powersimdata\design\investment\investment_costs.py", line 211, in _calculate_ac_inv_costs
    branch.loc[:, "to_region"] = bus_reg.loc[branch.to_bus_id, "name_abbr"].tolist()
  File "C:\Python39\lib\site-packages\pandas\core\indexing.py", line 670, in __setitem__
    iloc._setitem_with_indexer(indexer, value)
  File "C:\Python39\lib\site-packages\pandas\core\indexing.py", line 1601, in _setitem_with_indexer
    self._setitem_with_indexer(new_indexer, value)
  File "C:\Python39\lib\site-packages\pandas\core\indexing.py", line 1666, in _setitem_with_indexer
    raise ValueError(
ValueError: cannot set using a multi-index selection indexer with a different length than the value

Expected outcome

No error.

Environment

  • Operating system: Windows
  • PowerSimData revision: ab21118
  • Python version: 3.9.6

Additional context

We can see the root cause directly by inspecting the CSV as it is loaded within the investment cost module:

>>> import pandas as pd
>>> from powersimdata.design.investment import const
>>> bus_reg = pd.read_csv(const.bus_neem_regions_path, index_col="bus_id")
>>> bus_reg.loc[bus_reg.index.duplicated(False)]
             name_abbr  dist      lat      lon
bus_id
12359          PJM ROM   0.0  39.9515 -75.8260
12359            PJM E   0.0  39.9515 -75.8260
12459            PJM E   0.0  39.9024 -75.8392
12459          PJM ROM   0.0  39.9024 -75.8392
12460          PJM ROM   0.0  39.9024 -75.8392
12460            PJM E   0.0  39.9024 -75.8392
36055   NonRTO Midwest   0.0  39.6566 -83.5378
36055          PJM ROR   0.0  39.6566 -83.5378
42494   NonRTO Midwest   0.0  37.9335 -87.5617
42494          MISO IN   0.0  37.9335 -87.5617
43085        MISO WUMS   0.0  42.4947 -88.6446
43085          PJM ROR   0.0  42.4947 -88.6446
47988        MISO WUMS   0.0  42.5008 -89.1299
47988          PJM ROR   0.0  42.5008 -89.1299
47989        MISO WUMS   0.0  42.5008 -89.1299
47989          PJM ROR   0.0  42.5008 -89.1299
48134        MISO WUMS   0.0  42.4960 -88.5134
48134          PJM ROR   0.0  42.4960 -88.5134

Before #450 the code tolerated these duplicates, since it would grab however many entries matched both buses and average them naively.

Suggested solution: fix powersimdata.design.investment.create_mapping_files.write_bus_neem_map so that it never produces duplicate entries (either fix the bug that causes one bus to map to multiple regions, or pick one region by some rule and discard the others), then re-generate the buses_NEEMregion.csv file.
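
As a rough sketch of the "pick one and discard the others" option, the snippet below keeps the first row for each bus id. This is only an illustration on toy data: bus id 99999 and its region are hypothetical, keeping the first entry is an arbitrary rule, and the real fix would still need to be made in write_bus_neem_map rather than at load time.

```python
import pandas as pd

# Toy data: bus 12359 is duplicated as in the CSV excerpt above;
# bus 99999 is a hypothetical bus that appears only once.
bus_reg = pd.DataFrame(
    {"name_abbr": ["PJM ROM", "PJM E", "MISO IN"]},
    index=pd.Index([12359, 12359, 99999], name="bus_id"),
)

# Keep only the first row for each bus_id (an arbitrary tie-break).
deduped = bus_reg[~bus_reg.index.duplicated(keep="first")]

# With a unique index, a lookup returns exactly one value per bus.
print(deduped)
print(deduped.index.is_unique)
```

Whether "first" is the right region for a bus on a boundary is a modelling question that this sketch does not answer.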