Irmin.merge_head: conflict, retrying (223684) (100% CPU)
talex5 opened this issue · 10 comments
After migrating to a new server and doing a git clone --mirror
, I ran DataKit with the GitHub bridge as usual.
DataKit then got into a loop, using 100% of one CPU, trying to commit the update from the bridge. Running with debug logging, it looks like this (repeated over and over):
2018-04-06 14:06 irmin-io [DEBUG] lock /data/.git/lock/refs/heads/github-metadata 1
2018-04-06 14:06 irmin-io [DEBUG] Reading /data/.git/refs/heads/github-metadata
2018-04-06 14:06 irmin [DEBUG] Irmin.merge_head: conflict, retrying (22932).
2018-04-06 14:06 irmin-io [DEBUG] Reading /data/.git/refs/heads/github-metadata
2018-04-06 14:06 irmin-io [DEBUG] Reading /data/.git/packed-refs
2018-04-06 14:06 git.fs [DEBUG] read d892ed2635d9c883d1cebbae9af236b6fc930294
2018-04-06 14:06 irmin [DEBUG] Head.find -> d892ed2635d9c883d1cebbae9af236b6fc930294
2018-04-06 14:06 git.fs [DEBUG] read d892ed2635d9c883d1cebbae9af236b6fc930294
2018-04-06 14:06 git.fs [DEBUG] read e35b82f5526bfa5f164376a28d32d9326d84aa25
2018-04-06 14:06 irmin.merge [DEBUG] some Some
"e35b82f5526bfa5f164376a28d32d9326d84aa25" |
Some
"d892ed2635d9c883d1cebbae9af236b6fc930294"
2018-04-06 14:06 irmin.merge [DEBUG] default Some
"e35b82f5526bfa5f164376a28d32d9326d84aa25" |
Some
"d892ed2635d9c883d1cebbae9af236b6fc930294"
2018-04-06 14:06 irmin.merge [DEBUG] default old=Some
"e35b82f5526bfa5f164376a28d32d9326d84aa25"
2018-04-06 14:06 git.fs [DEBUG] read e35b82f5526bfa5f164376a28d32d9326d84aa25
2018-04-06 14:06 irmin.git [DEBUG] test_and_set
d892ed2635d9c883d1cebbae9af236b6fc930294 was the previous head of the branch. e35b82f5526bfa5f164376a28d32d9326d84aa25 is the new commit, which has d892ed2635d9c883d1cebbae9af236b6fc930294 as its only parent.
(the default old=
value looks a bit suspicious to me... shouldn't that be d89...?)
Looks like Irmin is:
- reading the current head
- doing a
test_and_set
with that as the test value - that is failing for some reason
- Irmin assumes the head changed and retries
This happens with the latest
image on hub, and also with the last stable release (datakit/db:0.11.0
).
To try to unblock it, I did:
git branch -f github-metadata e35b82f5526bfa5f164376a28d32d9326d84aa25
Now keeps doing:
2018-04-06 14:43 irmin-io [DEBUG] lock /data/.git/lock/refs/heads/github-metadata 1
2018-04-06 14:43 irmin-io [DEBUG] Reading /data/.git/refs/heads/github-metadata
2018-04-06 14:43 irmin [DEBUG] Irmin.merge_head: conflict, retrying (223683).
2018-04-06 14:43 irmin-io [DEBUG] Reading /data/.git/refs/heads/github-metadata
2018-04-06 14:43 git.fs [DEBUG] read e35b82f5526bfa5f164376a28d32d9326d84aa25
2018-04-06 14:43 irmin [DEBUG] Head.find -> e35b82f5526bfa5f164376a28d32d9326d84aa25
2018-04-06 14:43 git.fs [DEBUG] read e35b82f5526bfa5f164376a28d32d9326d84aa25
2018-04-06 14:43 irmin.git [DEBUG] test_and_set
2018-04-06 14:43 irmin-io [DEBUG] lock /data/.git/lock/refs/heads/github-metadata 1
2018-04-06 14:43 irmin-io [DEBUG] Reading /data/.git/refs/heads/github-metadata
2018-04-06 14:43 irmin [DEBUG] Irmin.merge_head: conflict, retrying (223684).
2018-04-06 14:43 irmin-io [DEBUG] Reading /data/.git/refs/heads/github-metadata
2018-04-06 14:43 git.fs [DEBUG] read e35b82f5526bfa5f164376a28d32d9326d84aa25
2018-04-06 14:43 irmin [DEBUG] Head.find -> e35b82f5526bfa5f164376a28d32d9326d84aa25
2018-04-06 14:43 git.fs [DEBUG] read e35b82f5526bfa5f164376a28d32d9326d84aa25
2018-04-06 14:43 irmin.git [DEBUG] test_and_set
@samoht : have you seen anything like this before?
Which version of datakit are you using?
I have fixed something similar in 8e9d41b where ocaml-git
and git
didn't agree on how to serialise hashes in the reference files (e.g. one was adding a \n
while the other was not). Can you try to edit .git/refs/heads/github-metadata
manually and trying to add/remove a newline at the end of the file and see if things are better?
Ah, could be. I've just rebuilt using the master version of DataKit and that seems to have solved it. I should probably push some new binaries. I guess I must have used a newer version of git than before to do the clone.
Spoke too soon. Still hanging :-( It's using Irmin 1.3.2 and ocaml-git 1.11.3.
That's very strange :-/ Can you try to add some debug messages there: https://github.com/moby/datakit/blob/master/src/datakit-io/datakit_io.ml#L415 ?
With some debug:
let test_and_set_file ?temp_dir ~lock file ~test ~set =
Log.info (fun l -> l "test_and_set_file %s" file);
Lock.with_lock (Some lock) (fun () ->
read_file file >>= fun v ->
let equal = match test, v with
| None , None -> true
| Some x, Some y -> Cstruct.equal (trim x) (trim y)
| None, Some y -> Log.info (fun l -> l "test_and_set_file: None, Some %a" Cstruct.hexdump_pp y); false
| Some x, None -> Log.info (fun l -> l "test_and_set_file: Some %a, None" Cstruct.hexdump_pp x); false
in
if not equal then (
Log.info (fun l -> l "test_and_set_file: false");
Lwt.return false
) else
(match set with
| None -> remove_file file
| Some v -> write_file ?temp_dir file v)
>|= fun () ->
true
)
2018-04-06 18:48 irmin [DEBUG] Irmin.merge_head: conflict, retrying (67).
2018-04-06 18:48 irmin [DEBUG] Head.find -> ccb2dd9f9d2ba39894783a9130b71d98aaa97ccf
2018-04-06 18:48 irmin.commit [DEBUG] 3-way merge between 4d17 and ccb2
2018-04-06 18:48 irmin.commit [DEBUG] fast-forward
2018-04-06 18:48 irmin.commit [DEBUG] lcas 70: depth=0 time=0.0000s
2018-04-06 18:48 irmin.git [DEBUG] test_and_set Some
ccb2dd9f9d2ba39894783a9130b71d98aaa97ccf
Some
4d17e65a6e4c45eb1de9cd0d5ac96ec2677f0285
2018-04-06 18:48 irmin-io [INFO] test_and_set_file /data/.git/refs/heads/commit-index
2018-04-06 18:48 irmin-io [INFO] test_and_set_file: Some 63 63 62 32 64 64
39 66 39 64 32
62 61 33 39 38
39 34 37 38 33 61
39 31 33 30 62
37 31 64 39 38
61 61 61 39 37 63
63 66 , None
2018-04-06 18:48 irmin-io [INFO] test_and_set_file: false
2018-04-06 18:48 irmin [DEBUG] Irmin.merge_head: conflict, retrying (68).
Indeed, /data/.git/refs/heads/commit-index
doesn't exist, but:
grep commit-index packed-refs
ccb2dd9f9d2ba39894783a9130b71d98aaa97ccf refs/heads/commit-index
The write
functions assume that the references have all been unpacked (which should be the case as read
always unpacks in ocaml-git
). I am not sure what is happening. Maybe the reader of packed references is buggy (which I doubt because that format is pretty simple). Can you try to unpack that reference manually and see if that works better?
Where does ocaml-git do that unpacking on read? https://github.com/mirage/ocaml-git/blob/1.11/src/git/fs.ml#L578 doesn't seem to write anything, and doesn't indicate to its caller whether it got it from a loose or packed ref. And it would be strange if it did, since then you wouldn't be able to access a read-only repository.
BTW, I've also managed to reproduce the bug using just the irmin cli: mirage/irmin#489
You are right, the packed references are actually never unpacked; but ocaml-git will first read unpacked references before looking at the packed-ref files, so unpacked references will only comes first (so a reference can be at the same time packed and unpacked with different values ...).
So yes, there's definitely an issue when doing test_and_set
on a packed references astest
will try to read a non-existing file. Not sure why we didn't catch this before.
Thanks for investigating and I will try to come up with a fix (which will probably be a temporary workaround, given that ocaml-git 2.0 is on its way).
That would be great - thanks! I also don't know how this worked before...
Should be fixed by using the freshly released git.1.11.5