Duplicate entries in buses_NEEMregion.csv can cause ValueError when calculating AC investment costs
danielolsen opened this issue
- I have checked that this issue has not already been reported.
Bug summary
For some scenarios, if the upgraded branches touch a bus that has duplicate entries in the powersimdata/design/investment/data/buses_NEEMregion.csv file, a ValueError is raised. Because bus_id is duplicated in the index, the .loc lookup returns one row per duplicate entry rather than one row per branch, so we end up trying to assign more values than the data frame has rows.
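A minimal sketch of this failure mode in plain pandas, assuming toy frames in place of the real scenario data (only bus 12359 is borrowed from the duplicated rows listed under Additional context; the other IDs and the branch frame are made up). It uses plain column assignment as a stand-in for the .loc[:, "to_region"] assignment in the traceback, so the exact error message differs, but the cause is the same:

```python
import pandas as pd

# Toy stand-in for buses_NEEMregion.csv: bus 12359 appears twice.
bus_reg = pd.DataFrame(
    {"bus_id": [12359, 12359, 100], "name_abbr": ["PJM ROM", "PJM E", "NE"]}
).set_index("bus_id")

# Toy stand-in for the upgraded branches: two branches, one of which
# ends at the duplicated bus.
branch = pd.DataFrame({"to_bus_id": [12359, 100]}, index=[1, 2])

# Label-based lookup returns every row matching each label, so the
# duplicated bus contributes two regions instead of one.
regions = bus_reg.loc[branch.to_bus_id, "name_abbr"].tolist()
print(len(branch), len(regions))  # 2 3

# Assigning a 3-element list to a 2-row frame raises ValueError.
raised = False
try:
    branch["to_region"] = regions
except ValueError as err:
    raised = True
    print("ValueError:", err)
```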
Code for reproduction
from powersimdata import Scenario
from powersimdata.design.investment.investment_costs import calculate_ac_inv_costs
scenario = Scenario(3287)
calculate_ac_inv_costs(scenario)
Actual outcome
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\DanielOlsen\repos\bes\PowerSimData\powersimdata\design\investment\investment_costs.py", line 86, in calculate_ac_inv_costs
costs = _calculate_ac_inv_costs(grid_differences, sum_results)
File "C:\Users\DanielOlsen\repos\bes\PowerSimData\powersimdata\design\investment\investment_costs.py", line 211, in _calculate_ac_inv_costs
branch.loc[:, "to_region"] = bus_reg.loc[branch.to_bus_id, "name_abbr"].tolist()
File "C:\Python39\lib\site-packages\pandas\core\indexing.py", line 670, in __setitem__
iloc._setitem_with_indexer(indexer, value)
File "C:\Python39\lib\site-packages\pandas\core\indexing.py", line 1601, in _setitem_with_indexer
self._setitem_with_indexer(new_indexer, value)
File "C:\Python39\lib\site-packages\pandas\core\indexing.py", line 1666, in _setitem_with_indexer
raise ValueError(
ValueError: cannot set using a multi-index selection indexer with a different length than the value
Expected outcome
No error.
Environment
- Operating system: Windows
- PowerSimData revision: ab21118
- Python version: 3.9.6
Additional context
We can see the root cause directly by inspecting the CSV as it is loaded within the investment cost module:
>>> import pandas as pd
>>> from powersimdata.design.investment import const
>>> bus_reg = pd.read_csv(const.bus_neem_regions_path, index_col="bus_id")
>>> bus_reg.loc[bus_reg.index.duplicated(False)]
             name_abbr  dist      lat       lon
bus_id
12359          PJM ROM   0.0  39.9515  -75.8260
12359            PJM E   0.0  39.9515  -75.8260
12459            PJM E   0.0  39.9024  -75.8392
12459          PJM ROM   0.0  39.9024  -75.8392
12460          PJM ROM   0.0  39.9024  -75.8392
12460            PJM E   0.0  39.9024  -75.8392
36055   NonRTO Midwest   0.0  39.6566  -83.5378
36055          PJM ROR   0.0  39.6566  -83.5378
42494   NonRTO Midwest   0.0  37.9335  -87.5617
42494          MISO IN   0.0  37.9335  -87.5617
43085        MISO WUMS   0.0  42.4947  -88.6446
43085          PJM ROR   0.0  42.4947  -88.6446
47988        MISO WUMS   0.0  42.5008  -89.1299
47988          PJM ROR   0.0  42.5008  -89.1299
47989        MISO WUMS   0.0  42.5008  -89.1299
47989          PJM ROR   0.0  42.5008  -89.1299
48134        MISO WUMS   0.0  42.4960  -88.5134
48134          PJM ROR   0.0  42.4960  -88.5134
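To check whether a given scenario is affected, one can intersect the upgraded branches' bus IDs with the duplicated bus IDs. Below is a sketch using a hypothetical helper, find_affected_branches, assuming the branch frame has from_bus_id and to_bus_id columns and bus_reg is indexed by bus_id as above; the data is a toy stand-in for the real CSV and scenario:

```python
import pandas as pd


def find_affected_branches(branch, bus_reg):
    """Return the branches whose from- or to-bus maps to more than one region.

    Hypothetical helper, not part of PowerSimData.
    """
    # keep=False flags every copy of a duplicated bus_id, not just the later ones.
    dup_buses = bus_reg.index[bus_reg.index.duplicated(keep=False)].unique()
    touches_dup = branch.from_bus_id.isin(dup_buses) | branch.to_bus_id.isin(dup_buses)
    return branch.loc[touches_dup]


bus_reg = pd.DataFrame(
    {
        "bus_id": [12359, 12359, 36055, 36055, 100],
        "name_abbr": ["PJM ROM", "PJM E", "NonRTO Midwest", "PJM ROR", "NE"],
    }
).set_index("bus_id")
branch = pd.DataFrame(
    {"from_bus_id": [100, 36055, 100], "to_bus_id": [12359, 100, 100]},
    index=[10, 11, 12],
)

affected = find_affected_branches(branch, bus_reg)
print(affected.index.tolist())  # [10, 11]
```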
This was not a problem before #450, since the code would grab however many entries existed for both buses and naively average them.
Suggested solution: fix powersimdata.design.investment.create_mapping_files.write_bus_neem_map so that it never produces duplicate entries (either fix the bug that causes one bus to map to multiple regions, or select one region somehow and discard the other), then re-generate the buses_NEEMregion.csv file.
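For the "select one and discard the other" option, here is a sketch of the deduplication step using a hypothetical helper, drop_duplicate_buses, that keeps the first occurrence of each bus_id. This is only an assumption about how it could be done: which row survives depends on file order, so it removes the ValueError without deciding which NEEM region is actually correct for these boundary buses.

```python
import pandas as pd


def drop_duplicate_buses(bus_reg, keep="first"):
    """Keep a single region row per bus_id.

    Hypothetical helper, not part of PowerSimData. The surviving row is
    chosen by position in the file, not by any geographic rule.
    """
    return bus_reg.loc[~bus_reg.index.duplicated(keep=keep)]


bus_reg = pd.DataFrame(
    {"bus_id": [12359, 12359, 100], "name_abbr": ["PJM ROM", "PJM E", "NE"]}
).set_index("bus_id")
deduped = drop_duplicate_buses(bus_reg)

# The same lookup as in _calculate_ac_inv_costs now yields one region per branch.
to_bus_id = pd.Series([12359, 100])
print(deduped.loc[to_bus_id, "name_abbr"].tolist())  # ['PJM ROM', 'NE']
```

Whichever fix is chosen, checking bus_reg.index.is_unique after regenerating the file would catch a regression before it reaches calculate_ac_inv_costs.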