Further improvement of the supplychain module
Opened this issue · 3 comments
The module provides an excellent basis for indirect impact computation, but there are several aspects in the code structure that could largely benefit from a rework.
- Most methods do too many things at once, more granularity would notably ease unit tests
- The methods
calc_shock_to_sectors
andcalc_impact
can quickly confuse the user. Notably, as they both accept exposure and impact as arguments, thus the entry point for exposure and impact is unclear. - In particular, there is a concerning block in
calc_impact
where exposure and impact arguments are ignored if a shock was already computed. (note that #81 will add a warning) - Multiple attributes of a
SupplyChain
object are initiallyNone
and are set by methods thereafter. This can create confusion on when attributes are correctly computed, and may lead to incoherent objects.
I propose to improve the implementation with the following (and could work on that once #81 is merged, or later, this is mostly to track ideas). I see at least two ways to do this:
- Keep a single SupplyChain class, and improve its initialisation and the distinction between
calc_shock_to_sectors
andcalc_impact
. The core idea would be that instantiating a SupplyChain object would by default require both MRIOT, Exposure and Impact, and create an object which holds direct impact translated into MRIOT format, from which indirect impact can then be computed.
Pros:
- Not that much work
- Keeps things mostly similar
Cons: - Might still lack clarity
- SupplyChain would still hold empty attributes for indirect impact results when the computation was not run
- Separate things more by creating two new classes, say
IO_Shock
andIndirectImpact
(Other suggestions welcome!), such that,IO_Shock
does the translation of a CLIMADA impact into a MRIOT formatted one, and IndirectImpact does the computation.
Pros:
- Probably clearer
- Segments things better
- Would probably be easier to extend in the future
Cons: - Will take more time and work
- Will break probably all current usage of the module
05/06/24: Approach chosen is 2.
Other sub-issues that should be addressed:
- for the
boario
approach, the simulation length is currentlylast_event_date + 365
, the 365 should not be hardcoded - the
io_model
variable in the calc_matrices() method should probably be a class attribute?
Features that could be added / or should be thought of while doing the rework:
- Being able to generate
IO_Shocks
from other inputs like summary statistics instead ofImpact
objects. - Integration of the Inoperability model
- Integration of the MRIA model
- Feedback loop between indirect shocks and direct shocks for multiple event case (better account for the fact that what is destroyed cannot be destroyed again)
Features that could be added / or should be thought of while doing the rework:
Integration of the Inoperability model
Integration of the MRIA model
Feedback loop between indirect shocks and direct shocks for multiple event case (better account for the fact that what is destroyed cannot be destroyed again)
Thanks a lot @spjuhel - this is quite alinged with what I had in mind!
I like these ideas, and having a granular, pythonic refactor would be lovely.
- I don't know how many users depend on this yet: as a fairly recent creation it's hopefully not many. I'm definitely happy to work with breaking changes and refactor my code
- I'm in favour of renaming things and splitting functionality. (But not
DirectImpact
/IndirectImpact
: it's confusing when a class with Impact in the name should inherit from theImpact
class). MaybeDirectShock
andIndirectShock
. (See also my next comment/suggestion/request which would love this split).
I also have an additional functionality request! Maybe this should be a separate Issue. But I'll start here for the purpose of discussion since it seems at least a little relevant.
One particular bit of granularity that I'd like is flexibility with the calc_shock_to_sectors
method. Currently It takes an Impact
and an Exposures
object as input, aggregates them by region_id
, and translates that as a direct shock to the MRIOT tables based on the requested sectors. The flexibility I need is to work with larger numbers of Impacts and Exposures. Namely:
- shocking different sectors with different impacts. Currently the method calculates a single % shock for each country and applies it to all selected MRIOT sectors in that country, whereas I need to shock, e.g. agriculture and manufacturing sectors differently
- the ability to provide an
Impact
-Exposures
pair for eachregion_id
, rather than needing them all to be combined into single huge objects. For the global analyses I'm running these combined objects are wayyy too large - ideally, not requiring the
Impact
object to have animp_mat
. For large event sets and many sector-country combinations the matrices can be huge and I'd prefer not to store them. The calculation only needs the total impact by event and country, it doesn't care about individual exposures. My suggestions below (crudely) work around this.
My proposed solution here is to separate calc_shock_to_sectors
functionality into two bits of functionality: namely calculating shocks and combining shocks, letting you aggregate across multiple Impacts/Exposures/sectors. I don't know the best way to implement them because the order can be flexible: aggregating can be done before or after the main part of calc_shock_to_sectors
. That is,
- either: (1) aggregate_impacts_to_shock
, which takes an Impact
-Exposure
pair and returns a data frame with the % loss for each event in each region_id. Then (2) calc_shock_to_sectors
which takes a list of these data frames and calculates the input shocks for the MRIOT calculations from them, creating the final SupplyChain.secs_shock
object. Ideally, the methods also have a way to encode which sectors are shocked by each dataframe so we can shock different sectors differently
- or: (1) keep calc_shock_to_sectors
as it is, and then (2) provide a combine_shocks
method that takes multiple SupplyChain.secs_shock
objects and sums them, allowing you to build the final shock from one country-sector at a time. Note: if the suggested DirectShock
and IndirectShock
classes are created, then this can be implemented as a DirectShock.combine(data: List[DirectShock])
class method.
Both of these solutions allow me to work with large numbers of Impact
objects and Exposures
more elegantly. To deal with the impact matrix question, I would also add a check in calc_shock_to_sectors
that spots when an Impact
is only for a single country/region_id
and then uses the Impact.at_event
property instead of the marginal sums of the Impact.imp_mat
, since these are equivalent in this case.
For now I have a local workaround with the second method. I'd be happy to turn it into a pull request!