/git-commit-graph-ext

Extensions to Git's commit-graph feature, including reachability indexes / labels for the DAG of revisions

Primary LanguageJupyter NotebookGNU General Public License v2.0GPL-2.0

Binder Gitpod ready-to-code

Commit graph labeling for speeding up Git commands

Explore possible extensions to the serialized commit-graph format in Git, including adding reachability indexes for the graph of revisions

Git uses various clever methods for making operations on very large repositories faster, from bitmap indices for git fetch[1], to generation numbers (also known as topological levels) in the commit-graph file for commit graph traversal operations like git log --graph[2].

The goal of this project is to examine various possible improvements that can make Git even faster, other than just using generation numbers. For example there are many methods to make reachability queries in very large graphs faster; it remains to be seen if they would work on large commit graphs (the graph of project history) as well as they work on other real-life graphs.

Ultimately, this project is about examining extension to Git's commit-graph feature, including mainly adding reachability indexes / labels for the DAG (Directed Acyclic Graph) of revisions.


This project, while mainly exploratory in nature, is using nbdev library for literate programming in Python using Jupyter Notebooks -- not to create a Python module (to publish in PyPi and/or Conda), but to allow for splitting it up.

The original notebook at Google Colaboratory: "Reachability labels for version control graphs.ipynb" got quite unwieldy; it takes too much time to run it. By splitting it into smaller notebooks, and turning the code into helper modules, the hope is that it should be possible to quickly go to the interesting parts of exploration.

Graph operations in Git

This project focuses on the Git operations that involve examining and walking the commit graph, i.e. the project history. Such operations include:

  • git merge-base --is-ancestor
  • git branch --contains
  • git tag --contains
  • git branch --merged
  • git tag --merged
  • git merge-base --all
  • git log --topo-sort

Only the first command performs straight reachability query.

Table of contents

Warning: this is not generated automatically

  1. Graphs in Python
  2. Related works: various reachability labellings
  3. Example directed graphs
  4. Drawing graphs
  5. Reachability index
  6. Topological levels
  7. DFS intervals labelling
  8. Reachability queries
  9. Extracting commit graphs from Git repositories
  10. Checkpointing
  11. Graph datasets
  12. Large Git repositories
  13. Graph stats
  14. Reachability evaluation

Slides

This topic is covered in much more details in slides for the presentation "Graph operations in Git version control system" by Jakub Narebski (2019).

Those slides are also available to read on SlideShare and on Speaker Deck:

TODO: Notebooks to split