JGit is very slow in marking the remote refs as advertised
Version
5.13
Operating System
MacOS
Bug description
When cloning a remote repository that advertises a large number of refs (e.g. on the order of millions), the JGit client spends a lot of time marking the received refs as locally advertised.
See the full stack-trace below:
at java.lang.Throwable.fillInStackTrace(Native Method)
at java.lang.Throwable.fillInStackTrace(Throwable.java:784)
- locked <0x00000007a69a1728> (a java.io.FileNotFoundException)
at java.lang.Throwable.<init>(Throwable.java:266)
at java.lang.Exception.<init>(Exception.java:66)
at java.io.IOException.<init>(IOException.java:58)
at java.io.FileNotFoundException.<init>(FileNotFoundException.java:77)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.eclipse.jgit.internal.storage.file.LooseObjects.getObjectLoader(LooseObjects.java:186)
at org.eclipse.jgit.internal.storage.file.LooseObjects.open(LooseObjects.java:149)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openLooseObject(ObjectDirectory.java:396)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openLooseFromSelfOrAlternate(ObjectDirectory.java:373)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObjectWithoutRestoring(ObjectDirectory.java:349)
at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:330)
at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:132)
at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:212)
at org.eclipse.jgit.revwalk.RevWalk.parseAny(RevWalk.java:1075)
at org.eclipse.jgit.transport.BasePackFetchConnection.markAdvertised(BasePackFetchConnection.java:987)
at org.eclipse.jgit.transport.BasePackFetchConnection.markRefsAdvertised(BasePackFetchConnection.java:979)
at org.eclipse.jgit.transport.BasePackFetchConnection.doFetch(BasePackFetchConnection.java:363)
at org.eclipse.jgit.transport.TransportHttp$SmartHttpFetchConnection.doFetch(TransportHttp.java:1550)
at org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:302)
at org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:293)
at org.eclipse.jgit.transport.FetchProcess.fetchObjects(FetchProcess.java:274)
at org.eclipse.jgit.transport.FetchProcess.executeImp(FetchProcess.java:171)
at org.eclipse.jgit.transport.FetchProcess.execute(FetchProcess.java:94)
at org.eclipse.jgit.transport.Transport.fetch(Transport.java:1309)
at org.eclipse.jgit.api.FetchCommand.call(FetchCommand.java:213)
at org.eclipse.jgit.api.CloneCommand.fetch(CloneCommand.java:311)
at org.eclipse.jgit.api.CloneCommand.call(CloneCommand.java:182)
Actual behavior
Clone operation is slow
Expected behavior
Speed improvement in clone operation
Relevant log output
No response
Other information
No response
Did you check whether this also happens when using JGit from the current master branch?
Not yet. For now I was just browsing the code, and although the stack trace doesn't fully match, it looks like master should behave similarly. I will test it on master now.
Yes, it is still the case. Here's the stack trace from master:
Thread [main] (Suspended (breakpoint at line 212 in LooseObjects))
LooseObjects.getObjectLoader(WindowCursor, File, AnyObjectId) line: 212
LooseObjects.open(WindowCursor, AnyObjectId) line: 171
ObjectDirectory.openLooseObject(WindowCursor, AnyObjectId) line: 418
ObjectDirectory.openLooseFromSelfOrAlternate(WindowCursor, AnyObjectId, Set<Id>) line: 394
ObjectDirectory.openObjectWithoutRestoring(WindowCursor, AnyObjectId) line: 369
ObjectDirectory.openObject(WindowCursor, AnyObjectId) line: 350
WindowCursor.open(AnyObjectId, int) line: 133
WindowCursor(ObjectReader).open(AnyObjectId) line: 216
RevWalk.parseAny(AnyObjectId) line: 1119
TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).markAdvertised(AnyObjectId) line: 1101
TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).markRefsAdvertised() line: 1093
TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).doFetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>, OutputStream) line: 408
TransportHttp$SmartHttpFetchConnection.doFetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>, OutputStream) line: 1565
TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).fetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>, OutputStream) line: 351
TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).fetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>) line: 343
FetchProcess.fetchObjects(ProgressMonitor) line: 290
FetchProcess.executeImp(ProgressMonitor, FetchResult, String) line: 182
FetchProcess.execute(ProgressMonitor, FetchResult, String) line: 105
TransportHttp(Transport).fetch(ProgressMonitor, Collection<RefSpec>, String) line: 1482
FetchCommand.call() line: 238
CloneCommand.fetch(Repository, URIish) line: 319
CloneCommand.call() line: 189
Clone.run() line: 131
Clone(TextBuiltin).execute(String[]) line: 239
Main.execute(String[]) line: 247
Main.run(String[]) line: 135
Main.main(String[]) line: 106
For each advertised object, we do a try-catch and re-throw the FileNotFoundException, as you can tell from the LooseObjects.getObjectLoader() source code.
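To make the per-object cost concrete, here is a minimal, standalone sketch of the pattern visible in the stack traces: probing for a file by opening it and catching the FileNotFoundException on every miss. It does not use JGit. The class name LooseLookupCost, the method countMissesByOpening, and the "obj-N" file names are hypothetical and only serve the illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical illustration, not JGit code.
class LooseLookupCost {

    /**
     * Tries to open n files under dir that are assumed not to exist.
     * Every miss makes FileInputStream construct and throw a
     * FileNotFoundException, including filling in its stack trace.
     */
    static int countMissesByOpening(File dir, int n) {
        int misses = 0;
        for (int i = 0; i < n; i++) {
            File f = new File(dir, "obj-" + i);
            try (FileInputStream in = new FileInputStream(f)) {
                in.read();
            } catch (FileNotFoundException e) {
                misses++; // one exception per missing object
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return misses;
    }

    public static void main(String[] args) throws IOException {
        // An empty directory stands in for the object store of a fresh clone.
        File dir = Files.createTempDirectory("empty-objects").toFile();
        int n = 100_000;
        long start = System.nanoTime();
        int misses = countMissesByOpening(dir, n);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(misses + " misses in " + elapsedMs + " ms");
        dir.delete();
    }
}
```

With millions of advertised refs, the loop body above would run once per ref, so the exception construction and the failed open both scale linearly with the number of refs.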
Would it be possible to have a different implementation of ObjectDirectory only for the clone operation? During a clone we can safely assume that we don't have any of the objects. WDYT?
I think using a different ObjectDirectory implementation would be overkill. It looks like this could be fixed by skipping the call to #openLooseFromSelfOrAlternate(WindowCursor, AnyObjectId, Set<AlternateHandle.Id>) in #openObjectWithoutRestoring(WindowCursor, AnyObjectId) [1] if we know we are executing a clone. We already have a few other such optimizations for clone.
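The idea of skipping the loose-object lookup could look roughly like the toy model below. This is only a sketch, not a patch against JGit: ObjectStoreSketch, skipLooseLookup, addPacked and looseProbes are all hypothetical names, the two maps stand in for packed and loose storage, and it assumes the packed lookup runs before the loose one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy model of an object store; not the JGit ObjectDirectory API.
class ObjectStoreSketch {
    private final Map<String, byte[]> packed = new HashMap<>();
    private final Map<String, byte[]> loose = new HashMap<>();

    // Assumed flag: set to true when the caller knows it is running a clone.
    private final boolean skipLooseLookup;

    // Counts how often the (expensive) loose lookup was attempted.
    int looseProbes = 0;

    ObjectStoreSketch(boolean skipLooseLookup) {
        this.skipLooseLookup = skipLooseLookup;
    }

    void addPacked(String id, byte[] data) {
        packed.put(id, data);
    }

    Optional<byte[]> open(String id) {
        byte[] data = packed.get(id);
        if (data != null) {
            return Optional.of(data);
        }
        if (skipLooseLookup) {
            // During a clone the store is assumed to hold no loose objects,
            // so the probe that would fail for every ref is skipped.
            return Optional.empty();
        }
        looseProbes++;
        return Optional.ofNullable(loose.get(id));
    }

    public static void main(String[] args) {
        ObjectStoreSketch normal = new ObjectStoreSketch(false);
        ObjectStoreSketch cloning = new ObjectStoreSketch(true);
        for (int i = 0; i < 1000; i++) {
            normal.open("ref-" + i);
            cloning.open("ref-" + i);
        }
        System.out.println("loose probes without flag: " + normal.looseProbes);
        System.out.println("loose probes with flag: " + cloning.looseProbes);
    }
}
```

The trade-off of a flag like this is that it is only safe while the assumption holds: once the repository can contain loose objects, the lookup has to be enabled again.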