Build a graph database of nixpkgs.
This project aims to build a graph database of nixpkgs.
Read more on our blog post: "Construction and analysis of the build and runtime dependency graph of Nixpkgs".
./build.sh 481f9b246d200205d8bafab48f3bd1aeb62d775b 0n6a4a439md42dqzzbk49rfxfrf3lx3438i2w262pnwbi3dws72g
where
- the first argument is the revision (the 40-character SHA-1 hash) of a commit
- the second is the SHA256 hash of its content (same as the output of nix-prefetch-url --unpack).
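As a quick sanity check before launching a long evaluation, the two arguments can be validated by shape. The sketch below is not part of the project: `check_build_args` is a hypothetical helper, and it assumes the hash is given in the base-32 form that nix-prefetch-url prints by default (52 characters from Nix's base-32 alphabet).

```python
import re

# Nix's base-32 alphabet omits the letters e, o, t and u.
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"

REV_RE = re.compile(r"[0-9a-f]{40}")              # 40-character SHA-1 commit hash
SHA256_RE = re.compile(f"[{NIX_BASE32}]{{52}}")   # SHA256 rendered in Nix base-32


def check_build_args(rev: str, sha256: str) -> None:
    """Hypothetical helper: raise ValueError if either argument looks malformed."""
    if not REV_RE.fullmatch(rev):
        raise ValueError(f"revision is not a 40-character hex SHA-1: {rev!r}")
    if not SHA256_RE.fullmatch(sha256):
        raise ValueError(f"hash is not a 52-character Nix base-32 SHA256: {sha256!r}")


# The values from the command above pass the check.
check_build_args(
    "481f9b246d200205d8bafab48f3bd1aeb62d775b",
    "0n6a4a439md42dqzzbk49rfxfrf3lx3438i2w262pnwbi3dws72g",
)
```

This only checks the format; it does not verify that the hash actually matches the content of that revision.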
After running this script, you will find in the ./rawdata/ folder:
- nodes.json: raw data extracted with the Nix evaluation
- nodes.csv: structured data which can be loaded by most tools
- first_graph.png: image drawn with networkx
- first_graph.gexf: data which can be loaded by Gephi
- first_graph.graphml: data which can be loaded by Neo4j
- general_info.json: some basic information (number of nodes, number of edges)
If you want to query the graph with Neo4j using Cypher Shell, a shell.nix is provided:
$ nix-shell
[nix-shell]$ cypher-shell -a bolt://localhost:7687 "MATCH (n) RETURN COUNT(n) as number_of_nodes;"
- The provided Nix shell also creates a Python virtual environment:
  nix-shell --command "exit"
  source .venv/bin/activate
- Run nixpkgs_graph from the command line:
  python3 -m nixpkgs_graph --help
- To get the nixpkgs database in JSON format, run:
  python3 -m nixpkgs_graph build --rev 481f9b246d200205d8bafab48f3bd1aeb62d775b --sha256 0n6a4a439md42dqzzbk49rfxfrf3lx3438i2w262pnwbi3dws72g
  The --rev flag is the revision, i.e. the 40-character SHA-1 hash of a commit, and --sha256 is the SHA256 hash of its content.
- Generate the graph and do some basic analysis:
  python3 -m nixpkgs_graph generate-graph --input-file INPUT_FILE --output-folder OUTPUT_FOLDER
  The input file should be the path to the data extracted in the previous step.
- To use Neo4j to query the graph:
  - Find the .graphml file in the output folder.
  - Copy it to the import folder of Neo4j, $NEO4J_HOME/share/neo4j/import/.
  - Clear the original graph to avoid duplication:
    cypher-shell -a bolt://localhost:7687 "MATCH (n) DETACH DELETE n;"
  - Use APOC to import it:
    cypher-shell -a bolt://localhost:7687 "call apoc.import.graphml('<filename>.graphml', {})"
    Or, in Neo4j Browser if you use the desktop version:
    call apoc.import.graphml('<filename>.graphml', {})
- Use some simple commands to test whether the graph was imported successfully:
  cypher-shell -a bolt://localhost:7687 "MATCH (n) RETURN n LIMIT 10;"
Distributed under the MIT License. See LICENSE.txt for more information.
Eloi Xuan WANG - @GearlessJohn - xuan.wang@polytechnique.edu
Guillaume Desforges - @GuillaumeDesforges - guillaume.desforges@tweag.io
Project Link: https://github.com/tweag/nixpkgs-graph
The following are details about the methods used.
Each name/value pair in the JSON file represents a package under nixpkgs, and it contains the following information:
- id: full name with version of the package under nixpkgs
- pname
- version
- package: path to which the package belongs (like [ nixpkgs python3Packages ])
- buildInputs of the package, in which each buildInput has the /nix/store/hash-name(-dev) structure, so we can identify the node by name
- propagatedBuildInputs of the package, in which each propagatedBuildInput also has the /nix/store/hash-name(-dev) structure, so we can still identify the node by name
- type = "node", which is used as an identification marker for lib.collect
Example:
{
  "buildInputs": "/nix/store/c1pzk30ksbff1x3krxnqzrzzfjazsy3l-gsettings-desktop-schemas-42.0 /nix/store/mmwc0xqwxz2s4j35w7wd329hajzfy2f1-glib-2.72.3-dev /nix/store/64mp60apx1klb14l0205562qsk1nlk39-gtk+3-3.24.34-dev /nix/store/6hdwxlycxjgh8y55gb77i8yqglmfaxkp-adwaita-icon-theme-42.0 ",
  "id": "chromium-103.0.5060.134",
  "package": [
    "nixpkgs",
    "chromium"
  ],
  "pname": "chromium",
  "propagatedBuildInputs": "",
  "type": "node",
  "version": "103.0.5060.134"
}
And another example, at depth 1 under python3Packages:
{
  "buildInputs": "/nix/store/vakcc74vp08y1rb1rb1cla6885ayklk3-zstd-1.5.2-dev ",
  "id": "python3.9-zstd-1.5.1.0",
  "package": [
    "nixpkgs",
    "python3Packages",
    "zstd"
  ],
  "pname": "zstd",
  "propagatedBuildInputs": "/nix/store/xpwwghl72bb7f48m51amvqiv1l25pa01-python3-3.9.13 ",
  "type": "node",
  "version": "1.5.1.0"
}
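To illustrate how a dependency can be identified by name from the /nix/store/hash-name(-dev) structure, here is a minimal sketch. `input_names` is a hypothetical helper, not the project's actual code. It assumes the inputs are whitespace-separated store paths, as in the examples above, and that the only output suffix to strip is -dev.

```python
from pathlib import PurePosixPath


def input_names(inputs: str) -> list[str]:
    """Hypothetical helper: turn a buildInputs string into package names."""
    names = []
    for path in inputs.split():                  # paths are separated by spaces
        base = PurePosixPath(path).name          # "hash-name" or "hash-name-dev"
        _hash, _, name = base.partition("-")     # the hash itself contains no "-"
        if name.endswith("-dev"):                # drop the "-dev" output suffix
            name = name[: -len("-dev")]
        names.append(name)
    return names


print(input_names(
    "/nix/store/mmwc0xqwxz2s4j35w7wd329hajzfy2f1-glib-2.72.3-dev "
    "/nix/store/6hdwxlycxjgh8y55gb77i8yqglmfaxkp-adwaita-icon-theme-42.0 "
))
# prints ['glib-2.72.3', 'adwaita-icon-theme-42.0']
```

The resulting names have the same name-version shape as the id field, which is what makes it possible to link an input back to a node.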
To get this data, we evaluate a Nix expression designed to yield all the data we want.
Note that we use --json --strict when calling nix-instantiate.
The Nix expression iterates over the key/value pairs of the root attribute set of nixpkgs (and some other selected attribute sets) using mapAttrs. Afterwards, we retrieve the desired sets via lib.collect.
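For readers less familiar with Nix, the collection step can be pictured with a rough Python analogue. This is not the project's Nix code: `collect` is a simplified re-implementation of the idea behind lib.collect (walk a nested structure, keep every set that satisfies a predicate, and do not descend into matches), and the `tree` value is made-up illustration data.

```python
def collect(pred, value):
    """Simplified Python analogue of lib.collect for nested dicts."""
    if not isinstance(value, dict):
        return []                      # leaves (strings, numbers, ...) never match
    if pred(value):
        return [value]                 # a match is kept whole, not searched further
    found = []
    for child in value.values():
        found.extend(collect(pred, child))
    return found


def is_node(attrs):
    # The marker described above: type = "node".
    return attrs.get("type") == "node"


# Made-up nested data mimicking the shape of the evaluated attribute sets.
tree = {
    "chromium": {"type": "node", "id": "chromium-103.0.5060.134"},
    "python3Packages": {
        "zstd": {"type": "node", "id": "python3.9-zstd-1.5.1.0"},
        "somethingElse": "not an attribute set",
    },
}

print([node["id"] for node in collect(is_node, tree)])
# prints ['chromium-103.0.5060.134', 'python3.9-zstd-1.5.1.0']
```

The type marker is what lets one predicate pick out package entries at any depth while ignoring the intermediate sets that merely group them.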
For the first version of the graph, we used pandas to process the raw JSON data and networkx to process the graph data.
See nixpkgs_graph.py
Use the networkx.read_gexf() function to read the .gexf file.
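A minimal round trip shows what reading a .gexf file looks like. The graph here is a toy stand-in written to a temporary file, not the project's real output, and the edge direction (package pointing to its dependency) is an assumption made for illustration. To load the real file, pass the path to rawdata/first_graph.gexf instead.

```python
import tempfile
from pathlib import Path

import networkx as nx

# Toy graph; edge direction "package -> dependency" is assumed for illustration.
graph = nx.DiGraph()
graph.add_edge("chromium-103.0.5060.134", "glib-2.72.3")
graph.add_edge("chromium-103.0.5060.134", "gtk+3-3.24.34")

with tempfile.TemporaryDirectory() as tmp:
    gexf_path = str(Path(tmp) / "example.gexf")  # stand-in for the generated file
    nx.write_gexf(graph, gexf_path)
    loaded = nx.read_gexf(gexf_path)             # node ids come back as strings

print(loaded.number_of_nodes(), loaded.number_of_edges(), loaded.is_directed())
```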
This project provides some basic information:
- number of nodes
- number of edges
- top 10 nodes which have the largest number of dependencies
- top 10 most cited nodes
- average number of dependencies of a derivation
- cycles in the nixpkgs graph
- length of the longest path in the graph
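The statistics listed above can be sketched with networkx on a toy graph. This is an illustration under stated assumptions, not the code in nixpkgs_graph.py: `graph_stats` is a hypothetical function, edges are assumed to point from a package to its dependency, and the longest path is measured on the condensation of the graph (each cycle collapsed into one node), since a longest simple path is only cheap to compute on an acyclic graph.

```python
import networkx as nx


def graph_stats(graph: nx.DiGraph) -> dict:
    """Hypothetical helper computing the basic statistics on a dependency graph."""
    by_degree = lambda item: item[1]
    # Collapse strongly connected components so the longest path is well defined.
    condensed = nx.condensation(graph)
    return {
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
        # out-degree = number of dependencies (assumed edge direction)
        "most_dependencies": sorted(graph.out_degree, key=by_degree, reverse=True)[:10],
        # in-degree = number of packages depending on this node
        "most_cited": sorted(graph.in_degree, key=by_degree, reverse=True)[:10],
        "average_dependencies": graph.number_of_edges() / graph.number_of_nodes(),
        "cycles": list(nx.simple_cycles(graph)),
        "longest_path_length": nx.dag_longest_path_length(condensed),
    }


# Toy data: libC and libD depend on each other, forming one cycle.
toy = nx.DiGraph([
    ("app", "libA"), ("app", "libB"),
    ("libA", "libC"), ("libB", "libC"),
    ("libC", "libD"), ("libD", "libC"),
])
stats = graph_stats(toy)
print(stats["nodes"], stats["edges"], stats["average_dependencies"])
# prints 5 6 1.2
```

Enumerating every simple cycle can be slow on a graph the size of nixpkgs, so on real data it may be preferable to look only at strongly connected components with more than one node.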
Use Gephi to read and process the generated .gexf file for visualization.