Git history on trees and blobs is slow.
redrabbit opened this issue · 1 comments
When showing the history of a given tree or blob (see #31), things can get pretty slow. Commit 81a75c0 helped marginally but the computation of such an action remains very intensive.
Testing with the current HEAD
of elixir-lang/elixir (~15.000 commits):
- Rendering the paginated history of
master
take ~140ms. - Rendering the paginated history of
master
with a pathspec takes ~6.000ms.
Fetching the history for a given revision is done relatively fast. In contrast, having to filter for a given pathspec takes forever.
Both methods could be greatly improved if pagination does not require to compute the entire revwalk (we currently need to Enum.count/1
the entire revwalk to show the last available page).
Instead we should provide a pagination similar to one specified by Relay Cursor Connections. We loose the possibility to show the number of pages, in return we can start the revwalk from a given commit and Stream.take/2
a small number of commits.
Commit a20d335 adds a significant boost. With a big repository we can go from >6s down to 150ms.
But this also depends a lot on which blob/tree is used.
Explanation
The history of a file being changed frequently will render much faster than a file changed more rarely.
For a Git commit history of a tree or blob the cost intensive part is that we have to filter each commit to ensure that the given tree/blob has changed. This means
- Loading the tree of the commit and its parent.
- Diffing both trees.
- Matching the given pathspec.
Having to process the entire 15.000 commits takes relatively long. The new paginate_cursor/5
function helps by skipping as much commits as possible...
Implementation
Here's a basic usage example for paginating commit history:
page = paginate_cursor(@conn, @commits, &(oid_fmt(&1) == &2), &oid_fmt(&1.oid))
The performance improvement is due to the fact that
- We don't go through the entire list of items. Instead we skip (
Enum.drop_while/2
) items until we match (third param) the cursor. Having the stream's:enum
trimmed until cursor. - We
Stream.take/2
until we aggregatelimit
+1 items (or reach the end of the stream). - We create a new cursor (fourth param) pointing at the previous/next page item.
They are also a few caveats:
- We use the
:enum
field ofStream{}
to drop items untilfn_filter
returnstrue
.- This is a bit of a hack because it relies on
GitRekt.Git.enumerate/1
to resolve the first layer (NIF) of the stream. We use this to make sure that libgit2 iterators behave more Erlangish.
- This is a bit of a hack because it relies on
- If we paginate the history of a file that is modified relatively often, we will aggregate the
limit
items quiet fast. But if we have a file that is rarely modified, we might have to process a very big chunk of the full commit list in order to create that page oflimit
items.