terhechte/SourceKittenDaemon

Performance Issue?

Closed this issue · 10 comments

I'm unsure if this is a real regression or not:

$ cat /Users/sean/Desktop/demo/demo/main.swift
import AVFoundation

let r = CGR

This yields almost 400k lines of output:

$ curl http://localhost:8888/complete --header "X-Offset: 28" --header "X-Path: /Users/sean/Desktop/demo/demo/main.swift" | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15.8M  100 15.8M    0     0  5904k      0  0:00:02  0:00:02 --:--:-- 5909k
396632

This results in each text editor needing to somehow stream / cache the result. Perhaps SourceKittenDaemon could implement a pager (where the last element gives the offset and page number)? e.g. something like,

$ curl http://localhost:8888/complete --header "X-Offset: 28" --header "X-Path: /Users/sean/Desktop/demo/demo/main.swift" --header "X-Page: 5"

would just return the fifth page of the result (and maybe another argument for page length).

This is a known issue, see nathankot/company-sourcekit#16

Still not 100% sure why this is happening but presumably SourceKit is sending this stuff to XCode to be used in a cache layer in front of SourceKit.

Pagination is definitely a valid solution, probably better than the streaming parser that I suggested in the issue above since it won't have to be done per-implementation.

I guess the only problem is that when SourceKit does the 400k+ lines of output thing, most of it is useless and out-of-context. The first page of results will likely be of no help at all.

It's annoying in cases where SourceKit should be able to return a valid set of completions, yet returns the 400k+ lines instead. I think that is really the crux of the problem - we shouldn't be getting these results from SourceKit at all, especially in contexts where a 'valid' completion set is possible.

One opportunity would be to simply use a custom SourceKit as the sources come with the Swift open source release. However, compiling this as part of SourceKittenDaemon would be a huge pain as SourceKit links against a private Swift Source C++ framework (libIDE). Since libIDE doesn't come with the Swift releases, we'd need to have the full Swift source code as a dependency.

In that case, why would we even use SourceKit. Then, we could theoretically directly use libIDE. However, since Swift doesn't allow us to link against C++, we'd need to add Objective-C bridging.

Another, much simpler, solution would be to fork the Swift source code and add a simple HTTP protocol on top of SourceKit, so that we could talk directly to it.

Everything would be a lot easier if libIDE were part of the official binary Swift releases. Then, at least, with an Objective-C bridging layer, we could call right into it.

Hmmm, I don't see how adding HTTP support to sourcekitd would solve the 400k+ issue (haha I'm just gonna call it that from now on,) assuming it is desired behaviour for Xcode.

I feel like this is just a configuration issue where we are passing in the wrong flags or are missing some settings that disables whatever the 400k+ response feature is. Need to have a consistent test that demonstrates this issue but I haven't been able to isolate anything that reproduces a 400k+ response every time.

If we can't configure sourcekitd to resolve this issue perhaps we can submit a patch to https://github.com/apple/swift that adds a new flag?

This is what I've been looking at, just for reference: https://github.com/apple/swift/tree/master/tools/SourceKit/tools/sourcekitd

Edit: actually @seanfarley can you put the demo project you used to reproduce this on Github? I can use that as a test fixture.

Sure, I just uploaded my demo here:

https://github.com/seanfarley/swift-demo

For me, this generates the 400k+ output each time with:

curl http://localhost:8888/complete --header "X-Offset: 28" --header "X-Path: /Users/sean/Desktop/demo/demo/main.swift" | wc -l

Thanks! I'll take a good look at this over the weekend.

Hey @seanfarley just wanted to double check that your intention with X-Offset: 28 is the same as what I'm seeing:

With X-Offset: 28 the cursor would be here:

import AVFoundation

let r =| CGR
//     ^ here

And that produces the 400k problem every time, however in that scenario it sort of makes sense that SourceKit is returning every possible constructor.

With X-Offset: 32 the cursor is here:

import AVFoundation

let r = CGR|
//         ^ here

This is the position that I had initially imagined when you said it was returning 400k+. I'm getting an empty completion result ([]) for it in my tests.


So yeah, I think we were thinking of two separate issues here:

1. Correct return of 400k+

In the first scenario SourceKit is probably doing the right thing, so either

  • sourcekittend needs to be able to convey these results via paging
  • or the editor implementations need to process the results in a streaming manner
  • or we just ignore these types of results before they get sent to the editor, preventing wasteful deserialization

2. Incorrect return of 400k+

Not sure if you guys have been seeing this but occasionally I get a 400k+ response in situations where I would expect a small set of completion options. This was the issue that I was describing previously that feels like a bug to me, but its not very concrete so we can put this aside for now.


I'm leaning towards an editor-based solution for cases where 400k+ is valid. Perhaps we can filter out these responses on sourcekittend by default and editors that support streaming can pass a flag like X-Allow-400k-Responses?

Here's the branch that has a test using Sean's fixture: https://github.com/nathankot/SourceKittenDaemon/tree/400k-problem

@seanfarley @terhechte comments please :)

Hm, I like the idea of being able to enable the 400k+ while keeping it disabled per default. Paging might be something nice to have in general. Bigger projects, where the list of results tends to be quite long, might benefit from this. However, paging is also more difficult to implement as it requires us to keep state in SourceKittenDaemon, and then we have to think about cache invalidation, and so on. The alternate solution would be to do another SourceKit request for each page, but that feels like a slow and cumbersome solution.

So, yeah, I'd vote for disabling 400k, and allowing them via a header flag. I actually don't think that this is something Xcode uses for caching, but rather a bug somewhere in SourceKit which Xcode just handles somehow. It feels like, if there were a need for a specific caching instruction, there would be a proper API for that.

I can confirm that you're right about the offset. It would seem to be a bug in the emacs interface :-(

@seanfarley The emacs offset starts at 1 as opposed to 0, (min-point) returns 1 :)
@terhechte let's do that! off the top of my head we'll need to:

  • Add header flag
  • Figure how to determine whether a response is 400k+
  • Optimize for streaming the response in the case where 400k+ is desired. Right now it's only really 'streaming' the extremely big json buffer, not sure how many gains can be made by flushing each completion item as it becomes serialized (as opposed to serializing the entire sourcekit response and then sending all of it to Taylor.)

@nathankot We can close this, can we?