This repository is a tool for exploring how Radicle collaborative objects perform when there are large numbers of objects. There were two questions I was initially interested in:
-
How much space overhead does the encoding of automerge changes into git merkle trees introduce?
-
How long does it take to retrieve all the issues of a given type?
To answer these questions I needed a test dataset. In order that this data be somewhat realistic I download issues from a selected repository in github and then import these issues into a simplified monorepo.
This is all reified as a command line tool which implemented two separate steps, one to download issues from a Github repo and one to import those issues into a monorepo. You can read more about the specific commands below.
I downloaded 10700 issues from the facebook/react
repository (some of them
having several hundred comments) and imported them into a local monorepo. The
import process itself took around 2.5 hours, this isn’t enormously interesting
in this example as it’s unlikely any project is going to generate thousands of
issues in a short amount of time, but it’s worth noting that creating objects is
not cheap - faster moving data may need to wait on performance improvements.
Note that the raw JSON of the issues was 67M
.
After importing issues into an otherwise empty git repository the repository
consumed around 1Gb
. After running git gc
this shrinks to around 90M
.
However, retrieving all the issues in the repository took around 6 minutes.
After implementing caching in the cob
library the retrieval operation shrinks
to around 3 seconds (after building the cache the first time around). There’s no
obvious single culprit here, it’s a lot of different small allocations and IO
operations. Undoubtedly it could be improved but it’s probably not worth
spending time on just yet as most projects will not be this large for quite some
time.
First we need to download issues from Github.
> collab-stress-test --token-file ./PERSONAL_TOKEN download-issues automerge/automerge-rs (1)
> tree ./data/
└── automerge
└── automerge-rs
├── download
│ ├── issues
│ │ ├── 618182998.json
│ │ ├── 627192368.json
...
-
--token-file
is the path to a file containing a github personal access token
By default this tool uses a data directory in $CWD/data
. For each github
repository there is a directory in the data directory under owner/name
.
Downloaded issues are saved in $data/owner/name/download
. Above you can see
there is one json file per issue.
> collab-stress-test import-issues automerge/automerge-rs
> tree ./data/
./data/automerge/automerge-rs/monorepo
├── git
│ ├── config
│ ├── description
│ ├── HEAD
│ ├── hooks
| | ...
│ ├── info
│ │ └── exclude
│ ├── objects
│ │ ...
│ └── refs
│ ├── heads
│ ├── namespaces
│ │ └── hnrkxb6o83t9b8ukwboxh5yoxijgbfjww3y6y
│ │ └── refs
│ │ └── remotes
│ │ ├── hyb1jukxajb5k1nf8mna4jpz1rdqsazybr3pm6tt5qacr66r64m9un
│ │ │ └── cob
│ │ │ └── xyz.radicle.githubissue
│ │ │ ├── 43b0d8816cd863b739f65363b54893efbede83b2
│ │ │ └── 8456bb5baadb8066478a97190ab15b8478813f4a
│ │ ├── hybbnun8qz6znu71yfesn77tnjxggw1bgjc6x71fny9r1kofqykrja
│ │ │ └── cob
│ │ │ └── xyz.radicle.githubissue
│ │ │ ├── 02569611763dda0ae9493078cab0b332b32d0f53
│ │ │ ├── 2e3e308006b082dc01618b9700ae8b75448d6513
| | ...
│ └── tags
├── peer_identities
├── peer_map
├── peers
│ ├── hyb1jukxajb5k1nf8mna4jpz1rdqsazybr3pm6tt5qacr66r64m9un
│ ├── hybbnun8qz6znu71yfesn77tnjxggw1bgjc6x71fny9r1kofqykrja
| ...
└── project_oid
This creates a monorepo under $data/owner/name/monorepo
. This is not the same
monorepo as the monorepo used by librad
, but it’s similar enough that the
performance characteristics are the same. For more information see
src/lite_monorepo.rs
.
collab-stress-test count-imported-issues facebook/react
This tests the time to load the entire data set. You can pass --no-cache
to
perform this operation without using the cache.
If you know the object ID Of an issue (which you can get by looking at the refs in the monorepo or the directory names in the cache) you can running
collab-stress-test retrieve-issue facebook/react <object id>
As above, if you know the object ID you can get additional information on the change graph of an issue.
collab-stress-test issue-change-graph-info facebook/react <object ID>
The --just-graphviz
flag for this command can be used to output a graphviz
representation of the change graph to standard output.