eclipse-jgit/jgit

JGit is very slow in marking the remote refs as advertised

Opened this issue · 4 comments

Version

5.13

Operating System

MacOS

Bug description

Bugzilla bug

When clone a remote repository that advertised a large number of refs (e.g. in the order of millions) the JGit client spend a lot of time marking the received refs as locally advertised.

See the full stack-trace below:
	at java.lang.Throwable.fillInStackTrace(Native Method)
	at java.lang.Throwable.fillInStackTrace(Throwable.java:784)
	- locked <0x00000007a69a1728> (a java.io.FileNotFoundException)
	at java.lang.Throwable.<init>(Throwable.java:266)
	at java.lang.Exception.<init>(Exception.java:66)
	at java.io.IOException.<init>(IOException.java:58)
	at java.io.FileNotFoundException.<init>(FileNotFoundException.java:77)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at org.eclipse.jgit.internal.storage.file.LooseObjects.getObjectLoader(LooseObjects.java:186)
	at org.eclipse.jgit.internal.storage.file.LooseObjects.open(LooseObjects.java:149)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openLooseObject(ObjectDirectory.java:396)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openLooseFromSelfOrAlternate(ObjectDirectory.java:373)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObjectWithoutRestoring(ObjectDirectory.java:349)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:330)
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:132)
	at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:212)
	at org.eclipse.jgit.revwalk.RevWalk.parseAny(RevWalk.java:1075)
	at org.eclipse.jgit.transport.BasePackFetchConnection.markAdvertised(BasePackFetchConnection.java:987)
	at org.eclipse.jgit.transport.BasePackFetchConnection.markRefsAdvertised(BasePackFetchConnection.java:979)
	at org.eclipse.jgit.transport.BasePackFetchConnection.doFetch(BasePackFetchConnection.java:363)
	at org.eclipse.jgit.transport.TransportHttp$SmartHttpFetchConnection.doFetch(TransportHttp.java:1550)
	at org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:302)
	at org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:293)
	at org.eclipse.jgit.transport.FetchProcess.fetchObjects(FetchProcess.java:274)
	at org.eclipse.jgit.transport.FetchProcess.executeImp(FetchProcess.java:171)
	at org.eclipse.jgit.transport.FetchProcess.execute(FetchProcess.java:94)
	at org.eclipse.jgit.transport.Transport.fetch(Transport.java:1309)
	at org.eclipse.jgit.api.FetchCommand.call(FetchCommand.java:213)
	at org.eclipse.jgit.api.CloneCommand.fetch(CloneCommand.java:311)
	at org.eclipse.jgit.api.CloneCommand.call(CloneCommand.java:182)

Actual behavior

Clone operation is slow

Expected behavior

Speed improvement in clone operation

Relevant log output

No response

Other information

No response

Did you try if this also happens when using JGit from current master branch ?

Not yet, for now, was just browsing the code and it looks like although the stacktrace doesn't match fully it should behave similarly. Will test it on the master now.

Yes, it is still the case, here's the stacktrace from master:

Thread [main] (Suspended (breakpoint at line 212 in LooseObjects))	
	LooseObjects.getObjectLoader(WindowCursor, File, AnyObjectId) line: 212	
	LooseObjects.open(WindowCursor, AnyObjectId) line: 171	
	ObjectDirectory.openLooseObject(WindowCursor, AnyObjectId) line: 418	
	ObjectDirectory.openLooseFromSelfOrAlternate(WindowCursor, AnyObjectId, Set<Id>) line: 394	
	ObjectDirectory.openObjectWithoutRestoring(WindowCursor, AnyObjectId) line: 369	
	ObjectDirectory.openObject(WindowCursor, AnyObjectId) line: 350	
	WindowCursor.open(AnyObjectId, int) line: 133	
	WindowCursor(ObjectReader).open(AnyObjectId) line: 216	
	RevWalk.parseAny(AnyObjectId) line: 1119	
	TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).markAdvertised(AnyObjectId) line: 1101	
	TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).markRefsAdvertised() line: 1093	
	TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).doFetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>, OutputStream) line: 408	
	TransportHttp$SmartHttpFetchConnection.doFetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>, OutputStream) line: 1565	
	TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).fetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>, OutputStream) line: 351	
	TransportHttp$SmartHttpFetchConnection(BasePackFetchConnection).fetch(ProgressMonitor, Collection<Ref>, Set<ObjectId>) line: 343	
	FetchProcess.fetchObjects(ProgressMonitor) line: 290	
	FetchProcess.executeImp(ProgressMonitor, FetchResult, String) line: 182	
	FetchProcess.execute(ProgressMonitor, FetchResult, String) line: 105	
	TransportHttp(Transport).fetch(ProgressMonitor, Collection<RefSpec>, String) line: 1482	
	FetchCommand.call() line: 238	
	CloneCommand.fetch(Repository, URIish) line: 319	
	CloneCommand.call() line: 189	
	Clone.run() line: 131	
	Clone(TextBuiltin).execute(String[]) line: 239	
	Main.execute(String[]) line: 247	
	Main.run(String[]) line: 135	
	Main.main(String[]) line: 106	

For each advertised object we'll do a try-catch and re-throw the FileNotFoundException as you can tell from the LooseObjects.getObjectLoader() source code.

Would it be possible to have a different implementation of ObjectDirectory only for the clone operation? We can safely assume during the clone that we don't have any of the objects. WDYT?

I think using a different ObjectDirectory implementation would be overkill, it looks like this could be fixed by skipping the call to #openLooseFromSelfOrAlternate(WindowCursor, AnyObjectId, Set<AlternateHandle.Id>) in #openObjectWithoutRestoring(WindowCursor, AnyObjectId) [1] if we know we are executing a clone. We already have a few other such optimizations for clone.

[1] https://github.com/eclipse-jgit/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/ObjectDirectory.java#L357