mggg/GerryChain

Updaters should return pandas.Series instead of dictionaries

maxhully opened this issue · 2 comments

Right now we keep track of the part IDs in the assignment, so that if you started with nodes assigned to "DISTRICT_1", "DISTRICT_2", etc., that's what you get for every step in the chain. One consequence is that updaters generally return dictionaries with those district labels as keys.

We should instead assign districts integer indices 0, ..., n - 1 (for n districts) and have our updaters return tuples.

99% of the time we're only interested in the values of the dictionary anyway, so this would remove a step in the output collection process. It is also more appropriate to use an immutable data structure here. And on the theoretical side, the labelings of districts aren't really canonical, especially when using ReCom, which completely redraws two districts at each step. So deemphasizing the labels seems like a good move. I think it might also make it cleaner to save outputs in DataFrames or xarray data structures.
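
To make the proposal concrete, here is a rough sketch in plain Python of what the two output shapes might look like. It does not use any GerryChain API; the names `population_by_label`, `labels`, `index_of`, and `population_tuple` and the numbers are made up for illustration.

```python
# Hypothetical updater output for one step of the chain, keyed by the
# original district labels (illustrative values, not real data).
population_by_label = {
    "DISTRICT_1": 1200,
    "DISTRICT_2": 1150,
    "DISTRICT_3": 1310,
}

# Fix an ordering of the labels once, so each district gets an integer index.
labels = sorted(population_by_label)
index_of = {label: i for i, label in enumerate(labels)}

# The same output as an immutable tuple, where position i holds the value
# for the district with integer index i.
population_tuple = tuple(population_by_label[label] for label in labels)

print(index_of)
print(population_tuple)
```

With the tuple form, collecting outputs is just appending `population_tuple` to a list at each step, with no `.values()` call needed.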

This change will probably break everyone's code.

OK, my new perspective on this is that we should try to make things as interoperable with pandas or numpy as possible. Our users will end up learning pandas anyways (most likely), and on top of leveraging pandas's functionality, I think it's a good idea for us to minimize the number of new weird gerrychain idioms that a person has to learn.
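
As a rough illustration of the pandas-friendly direction, the sketch below wraps dictionary-style outputs in `pandas.Series` and stacks them into a DataFrame. It is only a sketch under assumptions: `step_outputs` stands in for whatever an updater would return at each step, and none of the names here are part of GerryChain.

```python
import pandas as pd

# Hypothetical updater outputs for two steps of a chain, keyed by district
# label (illustrative values, not real data).
step_outputs = [
    {"DISTRICT_1": 1200, "DISTRICT_2": 1150},
    {"DISTRICT_1": 1180, "DISTRICT_2": 1170},
]

# Wrap each step's dictionary in a Series; the district labels become the index
# and the step number becomes the Series name.
series_per_step = [
    pd.Series(output, name=step) for step, output in enumerate(step_outputs)
]

# Stacking the Series gives one row per step and one column per district.
populations = pd.DataFrame(series_per_step)

print(populations)
print(populations.max(axis=1))  # largest district population at each step
```

The labels are still there as the index, so existing dictionary-style lookups like `series["DISTRICT_1"]` keep working, while the usual pandas operations become available for free.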

I don't think we're ready to move away from dictionaries, so I'm gonna close this.