Collaborative object stress test

This repository is a tool for exploring how Radicle collaborative objects perform when there are large numbers of objects. There were two questions I was initially interested in:

  • How much space overhead does the encoding of automerge changes into git merkle trees introduce?

  • How long does it take to retrieve all the issues of a given type?

To answer these questions I needed a test dataset. To keep this data somewhat realistic, I download issues from a selected GitHub repository and then import them into a simplified monorepo.

This is all reified as a command line tool which implements two separate steps: one to download issues from a GitHub repo and one to import those issues into a monorepo. You can read more about the specific commands below.

First results

I downloaded 10700 issues from the facebook/react repository (some of them having several hundred comments) and imported them into a local monorepo. The import process itself took around 2.5 hours. This isn’t enormously interesting in this example, as it’s unlikely any project is going to generate thousands of issues in a short amount of time, but it’s worth noting that creating objects is not cheap: faster moving data may need to wait on performance improvements.

Note that the raw JSON of the issues was 67M.

After importing the issues into an otherwise empty git repository, the repository consumed around 1GB. After running git gc this shrank to around 90M.
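The overhead implied by these figures can be worked out directly. The following is only a sketch of the arithmetic: the helper `overhead_ratio` and the constants are made up for illustration and are not part of the tool, and it assumes that "M" means megabytes and that "around one gigabyte" can be approximated as 1024 of them.

```rust
// Approximate sizes quoted in this README, in megabytes.
// Assumption: "M" means megabytes and one gigabyte is treated as 1024 of them.
const RAW_JSON_MB: f64 = 67.0;
const REPO_BEFORE_GC_MB: f64 = 1024.0;
const REPO_AFTER_GC_MB: f64 = 90.0;

/// Ratio of the on-disk repository size to the size of the raw data.
/// Hypothetical helper, not part of collab-stress-test.
fn overhead_ratio(repo_mb: f64, raw_mb: f64) -> f64 {
    repo_mb / raw_mb
}
```

With these numbers, `overhead_ratio(REPO_BEFORE_GC_MB, RAW_JSON_MB)` comes out at roughly 15 and `overhead_ratio(REPO_AFTER_GC_MB, RAW_JSON_MB)` at roughly 1.3, so most of the space overhead goes away once git has packed the objects.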

However, retrieving all the issues in the repository took around 6 minutes.

Caching

After implementing caching in the cob library, the retrieval operation drops to around 3 seconds (once the cache has been built the first time around). There’s no obvious single culprit here; it’s a lot of different small allocations and IO operations. Undoubtedly it could be improved, but it’s probably not worth spending time on just yet as most projects will not be this large for quite some time.

CLI

Download issues

First we need to download issues from GitHub.

> collab-stress-test --token-file ./PERSONAL_TOKEN download-issues automerge/automerge-rs (1)
> tree ./data/
└── automerge
    └── automerge-rs
        ├── download
        │   ├── issues
        │   │   ├── 618182998.json
        │   │   ├── 627192368.json
        ...
  1. --token-file is the path to a file containing a GitHub personal access token

By default this tool uses a data directory in $CWD/data. For each GitHub repository there is a directory in the data directory under owner/name. Downloaded issues are saved in $data/owner/name/download. Above you can see there is one JSON file per issue.
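As an illustration of this layout, the path of a single downloaded issue can be built as in the sketch below. The function `issue_json_path` is hypothetical and only mirrors the `tree` output above; it assumes the number in the file name is some identifier of the issue, and the tool's own code may construct these paths differently.

```rust
use std::path::{Path, PathBuf};

/// Build <data_dir>/<owner>/<name>/download/issues/<issue_id>.json.
/// Hypothetical helper: the layout is read off the `tree` output in this
/// README, not taken from the tool's source.
fn issue_json_path(data_dir: &Path, owner: &str, name: &str, issue_id: u64) -> PathBuf {
    data_dir
        .join(owner)
        .join(name)
        .join("download")
        .join("issues")
        .join(format!("{}.json", issue_id))
}
```

For example, `issue_json_path(Path::new("./data"), "automerge", "automerge-rs", 618182998)` points at the first file shown in the listing above.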

Import issues

> collab-stress-test import-issues automerge/automerge-rs
> tree ./data/
./data/automerge/automerge-rs/monorepo
├── git
│   ├── config
│   ├── description
│   ├── HEAD
│   ├── hooks
│   │   ...
│   ├── info
│   │   └── exclude
│   ├── objects
│   │   ...
│   └── refs
│       ├── heads
│       ├── namespaces
│       │   └── hnrkxb6o83t9b8ukwboxh5yoxijgbfjww3y6y
│       │       └── refs
│       │           └── remotes
│       │               ├── hyb1jukxajb5k1nf8mna4jpz1rdqsazybr3pm6tt5qacr66r64m9un
│       │               │   └── cob
│       │               │       └── xyz.radicle.githubissue
│       │               │           ├── 43b0d8816cd863b739f65363b54893efbede83b2
│       │               │           └── 8456bb5baadb8066478a97190ab15b8478813f4a
│       │               ├── hybbnun8qz6znu71yfesn77tnjxggw1bgjc6x71fny9r1kofqykrja
│       │               │   └── cob
│       │               │       └── xyz.radicle.githubissue
│       │               │           ├── 02569611763dda0ae9493078cab0b332b32d0f53
│       │               │           ├── 2e3e308006b082dc01618b9700ae8b75448d6513
│       │               ...
│       └── tags
├── peer_identities
├── peer_map
├── peers
│   ├── hyb1jukxajb5k1nf8mna4jpz1rdqsazybr3pm6tt5qacr66r64m9un
│   ├── hybbnun8qz6znu71yfesn77tnjxggw1bgjc6x71fny9r1kofqykrja
│   ...
└── project_oid

This creates a monorepo under $data/owner/name/monorepo. This is not the same monorepo as the monorepo used by librad, but it’s similar enough that the performance characteristics are the same. For more information see src/lite_monorepo.rs.
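The refs in the listing above all appear to follow the shape refs/namespaces/<namespace>/refs/remotes/<peer>/cob/<type name>/<object id>. A minimal sketch of pulling those components out of a ref name follows. The type `CobRef`, the function `parse_cob_ref`, and the field names are assumptions made for illustration, inferred only from the tree output and not from src/lite_monorepo.rs.

```rust
/// Components of a collaborative object ref, as seen in the tree above.
/// Hypothetical type; the field names are guesses at what each segment means.
#[derive(Debug, PartialEq)]
struct CobRef<'a> {
    namespace: &'a str,
    peer: &'a str,
    type_name: &'a str,
    object_id: &'a str,
}

/// Parse `refs/namespaces/<namespace>/refs/remotes/<peer>/cob/<type>/<object id>`.
/// Returns None if the ref does not have exactly that shape.
fn parse_cob_ref(path: &str) -> Option<CobRef<'_>> {
    let parts: Vec<&str> = path.split('/').collect();
    if parts.len() != 9 {
        return None;
    }
    // Check the fixed segments of the ref name.
    let fixed_ok = parts[0] == "refs"
        && parts[1] == "namespaces"
        && parts[3] == "refs"
        && parts[4] == "remotes"
        && parts[6] == "cob";
    if !fixed_ok {
        return None;
    }
    Some(CobRef {
        namespace: parts[2],
        peer: parts[5],
        type_name: parts[7],
        object_id: parts[8],
    })
}
```

Applied to the first ref in the listing, this would yield the type name xyz.radicle.githubissue and the object ID 43b0d8816cd863b739f65363b54893efbede83b2.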

Count imported issues

collab-stress-test count-imported-issues facebook/react

This tests the time to load the entire data set. You can pass --no-cache to perform this operation without using the cache.

Show a particular issue

If you know the object ID of an issue (which you can get by looking at the refs in the monorepo or the directory names in the cache) you can run

collab-stress-test retrieve-issue facebook/react <object ID>

Get change graph info

As above, if you know the object ID you can get additional information on the change graph of an issue.

collab-stress-test issue-change-graph-info facebook/react <object ID>

The --just-graphviz flag for this command can be used to output a graphviz representation of the change graph to standard output.