Special symbols like © seem to mess with sourcekit results
nathankot opened this issue · 9 comments
As per what @galeo discovered in nathankot/company-sourcekit#16, I'll post my findings here:
Completion without the copyright symbol, offset is at CGRect(|):
$ sourcekitten complete --text '# ; import AVFoundation; CGRect()' --offset 32 | head
[{
"sourcetext" : "origin: <#T##CGPoint#>, size: <#T##CGSize#>"
}, ... ]
Completion with the copyright symbol, offset is at CGRect(|):
$ sourcekitten complete --text '# ©; import AVFoundation; CGRect()' --offset 33 | head
[{
"sourcetext" : "()",
}, ... ]
Completion with the copyright symbol, offset is (seemingly) incorrect at CGRect()|:
$ sourcekitten complete --text '# ©; import AVFoundation; CGRect()' --offset 34 | head
[{
"sourcetext" : "origin: <#T##CGPoint#>, size: <#T##CGSize#>",
}, ... ]
It looks like Xcode isn't considering the © character at all.
I'm not sure if this is desired behavior on SourceKit's part, but it'd be interesting to get your input, guys @terhechte @seanfarley
I wonder if this also applies to other characters or if this is a special case with only the © symbol.
Most likely applies to others as well
Does it only happen if it is in the same line? Or also if it is somewhere before the current cursor? To remedy this, we'd probably need to open the file, jump to the correct offset, and go back to make sure that none of those special characters are in there, right?
I imagine this is due to Unicode, since some Unicode characters take up more than one byte. That's a bit hand-wavy, I realize, but if SourceKit is expecting a byte string (warning: this is just a guess), then the counting will be off for non-ASCII characters. You can see this in Python 2, where len counts the bytes of a plain string:
$ python2.7 -c 'print len("😈")'
4
In the case of "©", we can see why adding 1 seemingly works:
$ python2.7 -c 'print len("©")'
2
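For anyone on Python 3, where len counts code points rather than bytes, the same comparison needs an explicit encode. This is just a small sketch; the `sizes` name is made up for illustration:

```python
# Python 3: len() on a str counts code points, so encode to see the byte length.
sizes = {}
for ch in ("©", "😈"):
    # Store (code points, UTF-8 bytes) for each character.
    sizes[ch] = (len(ch), len(ch.encode("utf-8")))
    print(ch, sizes[ch])
```

Both characters are a single code point, but they take 2 and 4 bytes in UTF-8 respectively, matching the Python 2 numbers.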
Nice :) I propose we fix this in either sourcekittendaemon or sourcekitten:
7> "©".utf8.count
$R2: Distance = 2
8> "©".characters.count
$R3: Distance = 1
Actually, now that I think about it, this really has to be fixed in the editor integrations, doesn't it? Otherwise the top layers would need to do some magic to translate a character offset into a UTF-8 byte offset.
In emacs:
(position-bytes (point))
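For integrations that only have a character offset to work with, the translation could look roughly like the sketch below. It assumes the offset that sourcekitten wants is a UTF-8 byte offset (which is what the results above suggest, not something confirmed here), and `utf8_byte_offset` is a hypothetical helper name, not part of any existing tool:

```python
def utf8_byte_offset(text: str, char_offset: int) -> int:
    """Translate a character (code point) offset into a UTF-8 byte offset.

    Hypothetical helper: encodes everything before the cursor and counts bytes.
    """
    return len(text[:char_offset].encode("utf-8"))


source = "# ©; import AVFoundation; CGRect()"
# Cursor sits between the parentheses: CGRect(|)
cursor = source.index("CGRect(") + len("CGRect(")

print(cursor)                            # character offset: 33
print(utf8_byte_offset(source, cursor))  # byte offset: 34
```

The byte offset comes out as 34, which is the value that produced the expected completion in the third example above, while the character offset 33 is the one that returned "()".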
This has been fixed in company-sourcekit :) I'll add a note to the readme for sourcekittendaemon and close this.
👍