projectmesa/mesa

Proposal: Adding Cell Properties to Grids in Mesa

EwoutH opened this issue · 22 comments

This post is now slightly out of date, but preserved for historical context. See #1898 for the proposed implementation.

This proposal suggests two potential methods for introducing cell properties to grids in Mesa: a base class extension and a Mixin class. These enhancements will enable users to assign and manage custom properties (such as color or type) to grid cells, similar to patches in NetLogo. This functionality can be beneficial for a wide range of simulations where the environment's characteristics are as crucial as the agents' behaviors.

Motivation

Introducing properties to grid cells in agent-based models adds significant depth and realism to simulations. These properties can represent various environmental factors like terrain type, resources, or pollution levels, impacting how agents interact and behave. For example, in ecological simulations, cell properties might dictate animal movement based on land fertility or water availability. In urban models, properties could represent different neighborhood characteristics, influencing decisions made by agents representing residents or developers. This enhancement not only allows for more detailed and realistic modeling scenarios but also makes simulations more versatile and applicable to a wide range of fields, from ecology and urban planning to social sciences.

Why not just a layer of agents (in a MultiGrid)?

  • From a technical standpoint, cell properties can reduce computational overhead, as they avoid the need to create and manage numerous agent objects for static or slow-changing environmental features. They don't need to be added to a schedule, since they're passive entities.
  • Conceptually, cell properties provide a more direct and intuitive representation of the environment, enhancing model clarity and interpretability. They allow for a straightforward distinction between dynamic agents and the static or semi-static environment, reducing unnecessary complexity and improving the ease of maintenance and analysis of the model.

Implementation

Here are two high-level options for integrating grid cell properties into the Mesa space module.

Option 1: Base Class Extension

Implementation:

Extend the existing grid classes (SingleGrid, MultiGrid, HexSingleGrid, HexMultiGrid) to include a dictionary for cell properties.

Advantages:

  • Direct and straightforward implementation.
  • Clear inheritance structure.

Disadvantages:

  • Modifies the original Mesa grid classes.
  • Less flexible in terms of mixing with other potential extensions.

Usage Example:

from mesa import Model
from mesa.space import MultiGrid

class MultiGridWithProperties(MultiGrid):
    def __init__(self, width, height, torus):
        super().__init__(width, height, torus)
        self.cell_properties = {(x, y): {} for x in range(width) for y in range(height)}

    def set_cell_property(self, pos, property_name, value):
        self.cell_properties[pos][property_name] = value

    def get_cell_property(self, pos, property_name):
        return self.cell_properties[pos].get(property_name)

    def get_cells_with_property(self, property_name, value):
        return [pos for pos, properties in self.cell_properties.items() if properties.get(property_name) == value]

# Usage Example
class MyModel(Model):
    def __init__(self, N, width, height):
        super().__init__()
        self.grid = MultiGridWithProperties(width, height, True)
        
        # Setting properties
        for x in range(width):
            for y in range(height):
                color = 'red' if (x + y) % 2 == 0 else 'blue'
                self.grid.set_cell_property((x, y), 'color', color)

        # Getting a property
        specific_cell_color = self.grid.get_cell_property((2, 2), 'color')
        
        # Getting cells with a specific property
        red_cells = self.grid.get_cells_with_property('color', 'red')

Option 2: Mixin Class

Implementation:

Create a Mixin class that can be combined with any existing grid class to add cell properties functionality.

Advantages:

  • Offers greater flexibility and modularity.
  • Adheres to the Open/Closed Principle of software design.
  • The original Mesa grid classes remain unmodified.

Disadvantages:

  • Slightly more complex implementation.
  • Requires understanding of Python Mixins and multiple inheritance.

Usage Example:

from mesa import Model
from mesa.space import MultiGrid

class CellPropertiesMixin:
    def __init__(self, width, height, *args, **kwargs):
        super().__init__(width, height, *args, **kwargs)
        self.cell_properties = {(x, y): {} for x in range(width) for y in range(height)}

    def set_cell_property(self, pos, property_name, value):
        self.cell_properties[pos][property_name] = value

    def get_cell_property(self, pos, property_name):
        return self.cell_properties[pos].get(property_name)

    def get_cells_with_property(self, property_name, value):
        return [pos for pos, properties in self.cell_properties.items() if properties.get(property_name) == value]

class MultiGridWithProperties(CellPropertiesMixin, MultiGrid):
    pass

# Usage Example
class MyModel(Model):
    def __init__(self, N, width, height):
        super().__init__()
        self.grid = MultiGridWithProperties(width, height, True)
        
        # Setting properties
        for x in range(width):
            for y in range(height):
                color = 'red' if (x + y) % 2 == 0 else 'blue'
                self.grid.set_cell_property((x, y), 'color', color)

        # Getting a property
        specific_cell_color = self.grid.get_cell_property((2, 2), 'color')
        
        # Getting cells with a specific property
        red_cells = self.grid.get_cells_with_property('color', 'red')

Discussion

I would like feedback on this proposal! I have a few main discussion points:

  1. Do you agree that it would be useful to have Grid classes with cells that can have certain properties, aside from only their location?
  2. In NetLogo patches are actual agents, just stationary. You can do ask patches. This proposal doesn't go this far, as they only have properties and no functions. Should we consider allowing cells to have functions or behaviors, similar to NetLogo's approach?
  3. SingleGrids already have the concept that a cell can be empty. Should this just become a cell property, just like that a cell can be a color or have a name?
  4. Could we eliminate the distinction between SingleGrids and MultiGrids by introducing a 'capacity' property for cells, defining the number of agents a cell can accommodate? This change would imply that an 'empty' cell has zero agents, and 'places left' could be calculated as capacity - len(agents).
    (This approach aligns with Python's interpretation of numeric values in boolean contexts, where 0 is False (equivalent to 'empty') and any non-zero value is True (indicating 'places left'). So you can just do if places_left:)
  5. Grids are mostly used for spatial relationships and movement. Ideally, I would like to extend all functions to be able to filter or select based on those properties. So for example:
  • Move to a cell that's a certain property (like color == "red")
  • Get empty cells with a certain property (like outside == True)
  • get_neighborhood of cells with a certain property (empty == False)
    Do you think all functions should be able to filter on a property value?
  6. For now I have excluded the ContinuousSpace class, because there are no discrete grid cells. In NetLogo however, patches have a discrete (square) size, but space is still continuous for agents. Should we offer some hybrid concept?
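
Points 4 and 5 can be sketched without Mesa at all. The following is only an illustration under assumptions: `CapacityGrid`, `places_left`, `place_agent` and `select_cells` are hypothetical names, not existing Mesa API. Each cell has a capacity and a list of agents, and cell selection takes an arbitrary predicate.

```python
class CapacityGrid:
    """Hypothetical grid where every cell has a capacity (not Mesa API)."""

    def __init__(self, width, height, capacity=1):
        self.width = width
        self.height = height
        # Per-cell capacity and per-cell agent list, keyed by (x, y).
        self.capacity = {(x, y): capacity for x in range(width) for y in range(height)}
        self.agents = {pos: [] for pos in self.capacity}

    def places_left(self, pos):
        """Remaining room in a cell; 0 is falsy, so `if grid.places_left(pos):` works."""
        return self.capacity[pos] - len(self.agents[pos])

    def place_agent(self, agent, pos):
        if not self.places_left(pos):
            raise ValueError(f"Cell {pos} is full")
        self.agents[pos].append(agent)

    def select_cells(self, predicate):
        """Return all positions for which predicate(pos) is true."""
        return [pos for pos in self.capacity if predicate(pos)]


grid = CapacityGrid(3, 3, capacity=2)
grid.place_agent("a", (0, 0))
grid.place_agent("b", (0, 0))  # cell (0, 0) is now full

# Filter on a cell-level condition without writing the loop at the call site.
free_cells = grid.select_cells(lambda pos: grid.places_left(pos) > 0)
```

With `capacity=1` this behaves like a SingleGrid, and with a very large capacity like a MultiGrid, which is the unification point 4 asks about.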

@jackiekazil @tpike3 @Corvince @rht very curious what you think!

To provide my take on these questions:

  1. Yes, it would be useful for grid cells to have properties
  2. No, I think cells should be passive, and thus just have properties that can be modified by the model or agents. They should have properties and utility functions. So there we diverge from NetLogo.
  3. Yes, empty can just become a property
  4. Maybe. We will break backwards compatibility, but it would clean up a lot.
  5. Definitely, this will be one of the many big advantages of this proposal.
  6. This is a complicated question, and I don't know. In NetLogo, patches would have a certain size, and agents can be anywhere on that patch (their location is a float). We could maybe make a separate grid from that, or implement it in another way, but maybe keep it out of scope for this proposal.

rht commented

The current method is to use patch agents, e.g. in the Sugarscape example, https://github.com/projectmesa/mesa-examples/blob/10985d44091b9ba1ecebd013d2d2252e2116649b/examples/sugarscape_g1mt/sugarscape_g1mt/model.py#L92-L105. Patch agents are more general than cell properties in that they are FSM, and they have simple abstraction over how they work, without the need of additional documentation. However, they don't scale to a cell containing lots of patch types. To get the sugar agent in a cell, a loop over the cell content is needed: https://github.com/projectmesa/mesa-examples/blob/10985d44091b9ba1ecebd013d2d2252e2116649b/examples/sugarscape_g1mt/sugarscape_g1mt/trader_agents.py#L53-L62. But then again, this is solvable by combining the Sugar, Spice, ... objects into 1 patch object that may have the amount of water, plant nutrients, temperature, sunlight, etc, and to place them in a separate SingleGrid layer.

Thanks for your reply, and the additional context.

I think one nut NetLogo cracked very successfully is how easy and intuitive working with a spatial environment is. One of those aspects is how easily patches can be used and modified. In Mesa I haven't seen that replicated. Especially just being able to ask something like move-to one-of patches with [empty? and pcolor = red] is of tremendous value. In Mesa this is now non-trivial.

rht commented

I can see this implemented by combining the existing agent_space.move_agent(agent, pos) with a new method, find_a_position: pos = patch_space.find_a_position(lambda pos: agent_space.is_cell_empty(pos) and patch_space[pos].pcolor == "red"). This is problematic:

  • it is more verbose than NetLogo syntax where you can specify a variable in the local context at any given time (pcolor), but the drawback with the NetLogo syntax is that it is too implicit.
  • you have to consciously specify which space to operate on (agent_space or patch_space)
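
A minimal sketch of that two-space idea, assuming stand-in classes rather than Mesa's own grids: `AgentSpace` and `PatchSpace` are hypothetical, and `find_a_position` is the proposed method, not an existing one.

```python
import random
from types import SimpleNamespace


class AgentSpace:
    """Stand-in for an agent grid; only tracks which cells are occupied."""

    def __init__(self):
        self.occupied = {}

    def is_cell_empty(self, pos):
        return pos not in self.occupied

    def move_agent(self, agent, pos):
        # Remove the agent from its old cell (if any), then place it.
        self.occupied = {p: a for p, a in self.occupied.items() if a is not agent}
        self.occupied[pos] = agent


class PatchSpace:
    """Stand-in for a patch layer: maps each position to a patch object."""

    def __init__(self, width, height, rng=None):
        self.patches = {
            (x, y): SimpleNamespace(pcolor="blue")
            for x in range(width)
            for y in range(height)
        }
        self.rng = rng or random.Random()

    def __getitem__(self, pos):
        return self.patches[pos]

    def find_a_position(self, predicate):
        """Return a random position satisfying predicate, or None if there is none."""
        candidates = [pos for pos in self.patches if predicate(pos)]
        return self.rng.choice(candidates) if candidates else None


agent_space = AgentSpace()
patch_space = PatchSpace(3, 3, rng=random.Random(42))
patch_space[(1, 1)].pcolor = "red"
patch_space[(2, 2)].pcolor = "red"
agent_space.move_agent("resident", (1, 1))  # occupy one of the two red cells

pos = patch_space.find_a_position(
    lambda p: agent_space.is_cell_empty(p) and patch_space[p].pcolor == "red"
)
agent_space.move_agent("newcomer", pos)
```

The call site shows both drawbacks listed above: the predicate is longer than the NetLogo one-liner, and it has to name both spaces explicitly.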

One option to be in line with NetLogo would be to define a class GridWithPatch, where SingleGrid is a GridWithPatch with a container (the "unit" of the space) of type Agent | None, MultiGrid is a GridWithPatch with a container type of list[Agent]. Then you can define a Patch class

class Patch:
    def __init__(self, ...):
        self.agents = []
        self.pcolor = ...
        self.sugar = ...

    def step(self):
        ...

With this, you can reuse the Patch class for a GridWithPatch that has a coordinate structure of 1D, 2D, triangular, hexagonal, and so on.
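
As a rough sketch of how such a container could be reused across coordinate systems: the `GridWithPatch` constructor below, its `max_agents` parameter, and the sugar regrowth rule in `step()` are assumptions made for illustration, not part of the proposal.

```python
class Patch:
    """Container unit of a space: holds agents plus environment state."""

    def __init__(self, pcolor="black", sugar=0):
        self.agents = []
        self.pcolor = pcolor
        self.sugar = sugar

    def step(self):
        # Optional behaviour; here sugar simply grows back by one unit.
        self.sugar += 1


class GridWithPatch:
    """Hypothetical space that maps any hashable coordinate to a Patch."""

    def __init__(self, coordinates, max_agents=None):
        self.patches = {coord: Patch() for coord in coordinates}
        self.max_agents = max_agents  # None means unlimited (MultiGrid-like)

    def place_agent(self, agent, coord):
        patch = self.patches[coord]
        if self.max_agents is not None and len(patch.agents) >= self.max_agents:
            raise ValueError(f"Patch at {coord} is full")
        patch.agents.append(agent)

    def step(self):
        for patch in self.patches.values():
            patch.step()


# The same Patch class serves a 1D line and a 2D square grid.
line = GridWithPatch(range(5), max_agents=1)  # SingleGrid-like
square = GridWithPatch([(x, y) for x in range(3) for y in range(3)])  # MultiGrid-like

square.place_agent("a", (0, 0))
square.place_agent("b", (0, 0))
square.step()
```

Only the coordinate iterable changes between topologies; a hexagonal or triangular layout would pass a different set of coordinates and supply its own neighbourhood logic.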

edit1: rename GridWithContainer to GridWithPatch
edit2: add an optional step() method to Patch, so that it still counts as a FSM

The main point I'm thinking about is that it would be nice to be able to add as much built-in functionality to patches as possible. It should be really intuitive to ask things of patches, move to an empty one, get their neighbours, see how many neighbours have some characteristics, etc.

Don't know yet how we get there though. How would a Patch as an object fit in that?

It should also still be somewhat performant, especially searching for one/all empties or with some characteristic.

rht commented

I agree that NetLogo is very expressive, with code that reads like a VSO natural language, while still being OOP (but OOP in the Smalltalk message-passing sense, which is different than most OOP languages). Implementing the Patch object can be a first step to prepare for the functions. I'd still need to do some reading to find out whether you can have such expression in Python. At the very least, you can chain functions without having to write loops.
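
One way the chaining could look, as a sketch only: `CellSet`, `where` and `one_of` are hypothetical names chosen to echo NetLogo's `with` and `one-of`, and cells are plain dicts here for brevity.

```python
import random


class CellSet:
    """Hypothetical chainable collection of cells (each cell is a dict)."""

    def __init__(self, cells):
        self.cells = list(cells)

    def where(self, predicate):
        # Return a new CellSet so that calls can be chained.
        return CellSet(cell for cell in self.cells if predicate(cell))

    def one_of(self, rng=random):
        """Pick one cell at random, or None if the set is empty."""
        return rng.choice(self.cells) if self.cells else None

    def __len__(self):
        return len(self.cells)


patches = CellSet(
    {"pos": (x, y), "pcolor": "red" if (x + y) % 2 == 0 else "blue", "empty": x != 0}
    for x in range(3)
    for y in range(3)
)

# Roughly: one-of patches with [empty? and pcolor = red]
target = patches.where(lambda c: c["empty"]).where(lambda c: c["pcolor"] == "red").one_of()
```

The loops still exist inside `where`, but the model code reads as a single expression.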

It should also still be somewhat performant, especially searching for one/all empties or with some characteristic.

NetLogo is actually often faster than Mesa, in this benchmark.

Edit: Replace SVO with VSO

Thanks for this interesting proposal! Having cell properties would be greatly useful for gis models. Our rainfall and urban_growth models are two examples using cell properties.

Currently mesa-geo has classes for raster layers, which contains cells: https://github.com/projectmesa/mesa-geo/blob/e10761e30bef509ea270a96e39decc0d97fc1318/mesa_geo/raster_layers.py#L165. Here cells are essentially just agents. So it is a bit different from your proposed implementation.

If Mesa has support for cell properties, then Mesa-Geo could probably have a simpler implementation by inheriting directly from Mesa.

Thanks for taking an interest @wang-boyu! Nice to hear that more people would find something like this useful.

I just had an idea: Why not just add a NumPy 2D array as a property layer?

Could be something like this:

from mesa.space import MultiGrid
import numpy as np

class MultiGridWithArrayProperties(MultiGrid):
    def __init__(self, width, height, torus):
        super().__init__(width, height, torus)
        self.properties = {}

    def add_new_property(self, property_name, default_value=None):
        """Add a new property with a default value."""
        self.properties[property_name] = np.full((self.width, self.height), default_value)

    def remove_property(self, property_name):
        """Remove a property."""
        del self.properties[property_name]

    def set_property_cell(self, x, y, property_name, value):
        """Set the value of a specific property for a cell."""
        self.properties[property_name][x, y] = value

    def get_property_cell(self, x, y, property_name):
        """Get the value of a specific property for a cell."""
        return self.properties[property_name][x, y]

    def set_property_all_cells(self, property_name, value):
        """Set a property for all cells to a new value."""
        self.properties[property_name][:] = value

    def modify_property_all_cells(self, property_name, operation):
        """Modify a property for all cells with a Python or Numpy operation."""
        self.properties[property_name] = operation(self.properties[property_name])

    def set_property_conditional(self, property_name, value, condition):
        """Set a property for all cells that meet a certain condition."""
        self.properties[property_name][condition(self.properties)] = value

    def modify_property_conditional(self, property_name, operation, condition):
        """Modify a property for all cells that meet a certain condition with a Python or Numpy operation."""
        condition_met = condition(self.properties)
        self.properties[property_name][condition_met] = operation(self.properties[property_name][condition_met])

    def get_property_value_all_cells(self, property_name):
        """Get all values for a property."""
        return self.properties[property_name]

    def get_cells_with_multiple_properties(self, conditions):
        """Get positions of cells that meet multiple property conditions."""
        # Initialize with the condition of the first property
        first_property, first_value = next(iter(conditions.items()))
        combined_condition = self.properties[first_property] == first_value

        # Apply logical AND for each subsequent condition
        for property_name, value in conditions.items():
            if property_name != first_property:
                combined_condition &= self.properties[property_name] == value

        return list(zip(*np.where(combined_condition)))

    def aggregate_property(self, property_name, operation):
        """Perform an aggregate operation (e.g., sum, mean) on a property across all cells."""
        return operation(self.properties[property_name])

Then you can do:

conditions = {
    "color": "red",
    "empty": True
}
red_and_empty_cells = grid.get_cells_with_multiple_properties(conditions)

and a lot more.

Advantage is that everything is NumPy, nothing is looped, and everything is vectorized. Should work fast for all size grids.

Disadvantage would be that it only works on rectangular grids, so not on the Hex grids.
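
To make the "nothing is looped" claim concrete, here is a standalone NumPy sketch that does not depend on the class above. The property names and values are made up for illustration; arrays are indexed as [x, y].

```python
import numpy as np

width, height = 4, 3

# One array per property, indexed as [x, y].
color = np.full((width, height), "blue", dtype="U10")
empty = np.ones((width, height), dtype=bool)
sugar = np.zeros((width, height), dtype=float)

color[0, :] = "red"   # all cells with x == 0 are red
empty[0, 0] = False   # one of the red cells is occupied

# Combine conditions as boolean masks; no Python-level loop over cells.
red_and_empty = (color == "red") & empty

# Conditional in-place update: add sugar only where the mask holds.
sugar[red_and_empty] += 2.5

# Convert the mask back to a list of (x, y) tuples when positions are needed.
positions = [tuple(int(i) for i in idx) for idx in np.argwhere(red_and_empty)]
```

The mask can be reused for both selection and modification, which is what makes whole-grid operations cheap compared with per-cell dictionary lookups.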

Yes, having values as NumPy arrays would be helpful. In fact raster layers have functions to extract cell values as a NumPy array, and to apply a NumPy array to cell values: https://github.com/projectmesa/mesa-geo/blob/e10761e30bef509ea270a96e39decc0d97fc1318/mesa_geo/raster_layers.py#L324-L372. The function name apply_raster comes from NetLogo's GIS extension: https://ccl.northwestern.edu/netlogo/docs/gis.html#gis:apply-raster.

But these NumPy arrays are constructed only when the functions are called. It would be more efficient to store everything as NumPy arrays as you mentioned above.

I'm wondering how this links to the Cell class, with each cell having its own states (attributes). Cells can be viewed as agents, with their step() functions, and can be added and managed by data collectors and schedulers. This is essentially about building cellular automata (CA) models. Not sure whether this is related to your proposal though, and maybe it's not. I think you are trying to make agents (e.g., animals, people) behave according to cell properties, whereas I'm thinking about cells themselves becoming agents.

Thanks for your take! I thought about it some more, and I think we need two types of patches:

  1. A layer of patches that's passive and only has some property that can be read and modified. It should be incredibly fast to allow operations executed from the agent step (which is executed so many times). We could call that a PassiveGrid or GridWithProperties. For completeness, I think we should integrate agent movement / count / emptiness as one of the layers (so you can move to empty, get neighbour agents, etc.)
  2. A proper Patch (or Cell) Agent. This is a grid that has an actual, active agent on each patch, which can have a step function and could modify other patches. I think we can just use regular agents for this, and make a helper method initialize_with_patch_agents, which places a simple Patch agent on each grid cell. It might be slower but will be more powerful. And it would work with Hexgrids.
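
The second type could start from a helper along these lines. This is a sketch under assumptions: `PatchAgent`, its `sugar` and `max_sugar` attributes, and the signature of `initialize_with_patch_agents` are all hypothetical, and a plain dict stands in for the grid.

```python
class PatchAgent:
    """Hypothetical active patch: one per cell, with its own step()."""

    def __init__(self, pos, sugar=0, max_sugar=4):
        self.pos = pos
        self.sugar = sugar
        self.max_sugar = max_sugar

    def step(self):
        # Example behaviour: regrow one unit of sugar per step, up to a cap.
        self.sugar = min(self.max_sugar, self.sugar + 1)


def initialize_with_patch_agents(width, height, patch_factory=PatchAgent):
    """Create one patch agent per cell and return them keyed by position."""
    return {(x, y): patch_factory((x, y)) for x in range(width) for y in range(height)}


patches = initialize_with_patch_agents(3, 2)
for _ in range(5):
    for patch in patches.values():
        patch.step()
```

Every step here touches every patch in a Python loop, which is the performance cost mentioned above compared with a passive array-backed layer.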

One thing I'm struggling with a bit is a strict definition of how agents fill a space. Earlier, I imagined grid cells having a capacity, of how many agents it would hold.

With one agent type, that's a very easy and straightforward model: a capacity of n holds n agents. However, as you get multiple agent types, that gets complicated. Is there a capacity for each agent type, or one total capacity? If the latter, does each agent take up the same space?

The conceptual model I now have in my head says you have two components, of which you need either one or both:

  • A capacity per agent type (this can be a dict with agent types as keys and capacity as values)
  • A shared capacity, of which each agent takes up some amount
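
The two components could be combined in a single check like the one below. It is a sketch, not a proposed API: `CellCapacity`, `can_enter`, and the representation of occupants as (agent_type, size) tuples are assumptions.

```python
class CellCapacity:
    """Hypothetical capacity rule for one cell (not Mesa API)."""

    def __init__(self, per_type=None, shared=None):
        self.per_type = per_type or {}  # e.g. {"Wolf": 1}; types not listed are unlimited
        self.shared = shared            # total space available, or None for no limit

    def can_enter(self, occupants, agent_type, size=1):
        """occupants: list of (agent_type, size) tuples already in the cell."""
        # Component 1: capacity per agent type.
        if agent_type in self.per_type:
            same_type = sum(1 for t, _ in occupants if t == agent_type)
            if same_type >= self.per_type[agent_type]:
                return False
        # Component 2: shared capacity, of which each agent takes up `size` units.
        if self.shared is not None:
            used = sum(s for _, s in occupants)
            if used + size > self.shared:
                return False
        return True


cell = CellCapacity(per_type={"Wolf": 1}, shared=3)
occupants = [("Wolf", 2)]

second_wolf_ok = cell.can_enter(occupants, "Wolf")          # blocked by per-type limit
small_sheep_ok = cell.can_enter(occupants, "Sheep", size=1)  # fits in the shared space
big_sheep_ok = cell.can_enter(occupants, "Sheep", size=2)    # exceeds the shared space
```

Both checks run on every candidate cell, which illustrates the extra cost and complexity this comment worries about.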

But maybe there might even be an interaction effect between agents (of different types): If some agent type is there, I might want to really be there, but if there is another type, no way I'm going there. Especially with biology and social models that could be the case.

(to be fair, I think this is definitely too complicated for a native Mesa implementation)

So why is this capacity important? Primarily, because it is needed to get the options to which agents can move. However, if you have to check a total capacity and a capacity for that agent type every time, it could complicate both the code (and thus maintenance, scalability, etc.) and could reduce performance.

I'm trying some solutions and implementations, but might be overthinking this. If anyone can help me simplify this or narrow it down, that would be very helpful.

Edit: To formulate the concrete situation:

  1. It's useful to have properties for grids
  2. Capacity feels like a logical native property, because you then know if another agent can move there
  3. In a SingleGrid capacity=1, in a MultiGrid capacity=n (which can be infinite)
  4. So for a MultiGrid, it's more useful to know if there is still place than if it's completely empty or not
  5. However, capacity becomes complicated with multiple agent types.

Practically there are now three problems, that can probably be best solved in this order:

  1. Supporting multiple agent types natively
  2. Updating empty / capacity constructs
  3. Adding cell properties to them

I did some benchmarks. For empty and capacity cells, it isn't faster most of the time to use a 2D array, because many calls are individual writes anyway. For the properties this will be different, since you could ask all cells to increase their values. So these two problems can be split.

Awesome discussion here. Cells and properties and how to implement them is something I have been thinking about a lot over the last couple of years. It really is a hard problem and I haven't found a good solution yet.

There has been a discussion about a layered architecture here: #643 (comment) , which I think is another interesting way to look at it. That could be a way to combine a fast numpy properties layer with a traditional agents layer.

RE capacity: I wouldn't overthink things. In my mind, MultiGrid (with unlimited capacity) is always the "default mode" and SingleGrid is a special (but common) case. Conceptually, that is; I know it's implemented differently.
Everything else I would consider application (in contrast to library) logic that should be handled on a per-use basis. That is, I think it is sufficient to provide the building blocks; then it's not that hard to implement capacity at the model level, where you have full control over how it is modelled.

Something that's vaguely related to this thread and the one about data collection:

We currently expose and make strong use of the "unique_id" of agents. However, we have no control over the actual uniqueness of ids and how they are used (integers, strings or something completely different). We could maybe circumvent this by using id(agent), which gives us a simple integer as unique id. I am writing this here because there might be some computational benefits to, for example, storing only the ids in a NumPy array. But honestly this is some half-knowledge, so maybe it's not helpful at all.

Thanks for your insights Corvince!

There has been a discussion about a layered architecture here: #643 (comment) , which I think is another interesting way to look at it.

It’s insane how many insightful discussions are hidden all over this repository. This is another treasure.

I have been thinking about layers, height/elevation and 2.5D spaces as well. One practical example would be a soil layer, a plant layer and an air layer.

To solve this problem properly we might need a good conceptual model of what layers there can or should be in an ABM.

I wouldn't overthink things.

Thanks. capacity=n would probably be the most logical approach, where 1 and inf are special cases.

I think one agent of each type per cell is also still a common case. But maybe we can use layers for that.

We currently expose and make strong use of the "unique_id" of agents.

Was indeed tinkering with this. Using ints is indeed faster and more memory efficient for manipulating NumPy arrays, but you need a translation step to get the agent again (dict, func, whatever) which makes it slower again in most cases.
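
The trade-off can be seen in a small sketch. It uses small consecutive integers as ids rather than id(agent), and the `Agent` class, the `EMPTY` sentinel and the `agents_by_id` dict are assumptions made for illustration.

```python
import numpy as np


class Agent:
    def __init__(self, unique_id):
        self.unique_id = unique_id


width, height = 3, 3
EMPTY = -1  # sentinel for "no agent in this cell"

# Integer ids live in the array; the dict is the translation step back to objects.
id_grid = np.full((width, height), EMPTY, dtype=np.int64)
agents_by_id = {}

for unique_id, pos in enumerate([(0, 0), (1, 2)]):
    agent = Agent(unique_id)
    agents_by_id[unique_id] = agent
    id_grid[pos] = unique_id

# Whole-grid queries on ids are vectorized...
empty_mask = id_grid == EMPTY

# ...but getting the agent object back needs a dict lookup per cell.
agent_at_1_2 = agents_by_id[int(id_grid[1, 2])]
```

Queries that only need the ids stay in NumPy, while anything that needs the agent itself pays for the extra lookup.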

Thanks for your insights, especially multiple layers of grids could help with the capacity problem!

Okay, the next issue I’m contemplating is how properties and cell layers should link together. Should they be linked one-to-one (each layer has its own properties)? If so, how do those layers communicate through them?

Think I'm getting there:

  • Agents move over one grid. That grid has a capacity. That capacity will be shared between all agents of that grid.
  • Each grid can have one or more property layers linked to them. A property layer can be linked to one, multiple or all grids.
  • Since agents on different grids can be influenced by and modify the same property layer, information and emergent behaviour can pass through those layers.
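
A minimal sketch of that linkage, assuming hypothetical classes: `PropertyLayerSketch` and `GridSketch` are illustrative stand-ins and are not the PropertyLayer or _PropertyGrid classes from #1898.

```python
import numpy as np


class PropertyLayerSketch:
    """Hypothetical named 2D array of cell values."""

    def __init__(self, name, width, height, default=0.0):
        self.name = name
        self.data = np.full((width, height), default, dtype=float)


class GridSketch:
    """Hypothetical grid that only stores links to property layers."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.layers = {}

    def link_layer(self, layer):
        if layer.data.shape != (self.width, self.height):
            raise ValueError("Layer shape does not match grid")
        self.layers[layer.name] = layer


pollution = PropertyLayerSketch("pollution", 4, 4)
road_grid, housing_grid = GridSketch(4, 4), GridSketch(4, 4)

# The same layer object is linked to both grids.
road_grid.link_layer(pollution)
housing_grid.link_layer(pollution)

# A car on the road grid pollutes a cell...
road_grid.layers["pollution"].data[2, 1] += 5.0
# ...and a resident on the housing grid reads the same value.
seen_by_resident = housing_grid.layers["pollution"].data[2, 1]
```

Because both grids hold a reference to one array, no explicit synchronisation is needed for agents on different grids to influence each other.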

Edit: Nice thing is that this approach also completely separates the three problems (multiple-agents, flexible capacity and cell properties).

I have an initial implementation of a PropertyLayer class with all the functionality and a _PropertyGrid class which SingleGrid and MultiGrid will use.

Still work in progress, but check it out in: #1898

+1 to this being a great discussion. A few thoughts...

  1. I think we need to formalize the way we do major changes like this to make sure we have alignment. I think one of the issues we have had in the past is when people get excited, do work, and then a change isn't accepted. E.g., something like Python PEPs. With the length of this discussion, it becomes more and more difficult for folks not following along to track what is happening and why X and not Y.

  2. I like where this is headed. ... I am a fan of the following: 1) Ease of use, either in simplicity or mentally; agentpy seems to have done a few good things here, and we should also focus on the standards that NetLogo has established. 2) Speed: I will always want to improve this. ... The real conflict occurs when 1 and 2 come into conflict with each other in some significant way.

Agreed on the PEP. We should make some template / guidelines for that.

Edit: Also, discussion/issue --> MEP --> implementation is probably the right order. I think sometimes you need an implementation to see how it works, but in many cases something like a formal MEP does help with that.

I like that order.

I'm going to close this issue as completed!

  • #1898 was merged
  • Feedback is collected and discussed in #1932
  • "active" grid cells are discussed in #1900.

Thanks everyone!