Factorio (homepage, Steam) is a fantastic top-down open-world RTS/factory game with an emphasis on automation and technical depth. There's no game I can't ruin by applying math!
The purpose of this project is to get some analytical data about the game's economics, and probably do some constrained linear programming optimization to try out some "computed economies" at scale.
The scripts don't interact with the game itself. Instead, they pull and scrape data from the wiki, an instance of MediaWiki. As luck would have it, MediaWiki has a pretty good REST API, and the API endpoint for the Factorio wiki is open. This means that we can:
- Pull structured infobox content for each item in the game, yielding data like production recipes, power consumption, etc. This is done by pulling content for the most recent revision of every page in the Infobox category.
- Pull titles for each page in the Archived category, so that we know which items to ignore.
The wiki has two instances - one documenting the experimental version of Factorio (0.17 as of this writing), and the other documenting the stable version (0.16). This project uses the stable version.
To download and save the game's item database, issue
./pull-recipes.py
To look at the resulting (large) JSON file, use xzless
.
The data from the wiki are a colourful mix of inaccurate, incomplete, inconsistently presented, and in a format not conducive to analysis. The preprocessing step attempts to fix that:
./preprocess.py
To look at the resulting (>5MiB) CSV, use xzless -S
, unwrapped lines being
crucial here. As you'll notice, the data are extremely sparse, but let's not
overcomplicate things by switching to a sparse format when xz
already has a
0.3% (!) compression ratio.
The CSV is for consumption by R. In addition, a NumPy zipped matrix (.npz) and
metadata JSON file are created, for consumption by analyse.py
.
analyse.r
is a stub; you can hack on it if you want to manipulate the recipe
matrix. Currently, though, I'm moving toward using analyse.py
and SciPy.
SciPy has a lot of options for numerical optimization. We're interested in linear programming via linprog :
- Minimize c * x
- Subject to
- A_ub * x <= b_ub
- A_eq * x == b_eq
- lb <= x <= ub
Variables:
c
is a 1*n row vector of coefficients of the linear objective function.x
is an n*1 column vector of linear objective function variables to be minimized.A_ub
is a p*n matrix of upper bound constraints.b_ub
is a p*1 column vector of upper bound constraints.A_eq
is a q*n matrix of equality constraints.b_eq
is a q*1 column vector of equality constraints.bounds
is either one (min, max) tuple applying to all variables inx
, or a sequence ofn
(min, max) variables for each variable inx
.
In our case:
n
is equal to 681, the number of recipes- Each entry in
x
is a floating-point number representing the (potentially partial-use) quantity of that recipe in use; i.e. "five assembler machine 3 producing copper coil" - The
min
portion ofbounds
must be zero; i.e., we cannot have a negative number of recipes c
must be calculated based on the requested objective function. This can be influenced by pollution, area, etc.A_ub
must take into account the total number of whole or partial manual tasks summing to 1. It must also take into account the fact that no resource rate can go below 0 for a sustainable process.
To calculate c
, the objective function coefficient 1n row vector, multiply an
expense r1 column vector by the recipe r*n matrix, r
being the number of
resources. The expense vector has one entry per resource.