georgewfraser/vscode-tree-sitter

[Language Request] Ruby

connorshea opened this issue · 13 comments

A grammar already exists for Ruby, so hopefully it wouldn't be too hard to implement? :) https://github.com/tree-sitter/tree-sitter-ruby

I set it up in https://github.com/georgewfraser/vscode-tree-sitter/tree/ruby

But I'm not a ruby user, I don't have a great sense of what good ruby syntax coloring should be. Would you clone this repo, run npm install then hit F5 to debug, and take a look? The main thing you'll need to modify is the colorRuby function in extension.ts. You can use the tree-sitter playground to see what the tree-sitter syntax tree looks like.

I'm kind of out of my depth here as to actually contributing to the colorization, I'm not really sure what exactly should be considered a function v. field.

One good example of the colorizer not quite working right is with this code:

def upcase_array(array)
  array.map { |item| item.upcase }
end

array = ['hello', 'lowercase', 'strings']

puts upcase_array(array).join(' ')

If you put that in a Ruby file colorized by this extension, it looks like this:

Screen Shot 2019-05-25 at 12 22 59 PM

Notably, puts upcase_array(array).join(' ') acts a bit weird (puts is a more common equivalent to print). upcase_array is a method call and then join is chained onto it. join is correctly highlighted as a 'function' (or 'field'? Like I said, I don't think I fully grok what a field is vs. a function), but upcase_array is not.

This is that line as represented by the tree sitter:

  method_call [6, 0] - [6, 34])
    identifier [6, 0] - [6, 4])
    argument_list [6, 5] - [6, 34])
      method_call [6, 5] - [6, 34])
        call [6, 5] - [6, 29])
          method_call [6, 5] - [6, 24])
            identifier [6, 5] - [6, 17])
            argument_list [6, 17] - [6, 24])
              identifier [6, 18] - [6, 23])
          identifier [6, 25] - [6, 29])
        argument_list [6, 29] - [6, 34])
          string [6, 30] - [6, 33])

identifier [6, 5] - [6, 17]) is theupcase_array part of the line.

I think this part of the colorRuby() function needs to be updated to catch both the upcase_array and join method call, rather than just the first one:

else if (x.type == 'call' && x.lastChild!.type == 'identifier') {
	fields.push(x.lastChild!)
}

If you add another branch to the code, so the function looks like this, the highlighting improves in this case:

function colorRuby(x: Parser.SyntaxNode, editor: VS.TextEditor) {
	var types: Parser.SyntaxNode[] = []
	var fields: Parser.SyntaxNode[] = []
	var functions: Parser.SyntaxNode[] = []
	function scan(x: Parser.SyntaxNode) {
		if (!isVisible(x, editor)) return
		if (x.type == 'method') {
			console.log(x.children)
			fields.push(x.children[1]!)
		} else if (x.type == 'singleton_method') {
			fields.push(x.children[3])
		} else if (x.type == 'instance_variable') {
			fields.push(x)
		} else if (x.type == 'call' && x.lastChild!.type == 'identifier') {
			fields.push(x.lastChild!)
                // Handle additional method calls
		} else if (x.type == 'method_call' && x.firstChild!.type == 'identifier') {
			fields.push(x.firstChild!)
		}
		for (const child of x.children) {
			scan(child)
		}
	}
	scan(x)

	return {types, fields, functions}
}

vscode-ruby highlighting:
Screen Shot 2019-05-25 at 12 39 20 PM

The tree-sitter with my change:
Screen Shot 2019-05-25 at 12 40 06 PM

Sorry this comment is a bit rambly, I figured out how to fix it near the end of the comment and I don't want to rewrite everything :D

As it happens I was working on Ruby colors this morning, how's this?

Screen Shot 2019-05-25 at 12 06 22 PM

I just published a new version, you should be able to download it now https://marketplace.visualstudio.com/items?itemName=georgewfraser.vscode-tree-sitter

👍 Highlighting it with yellow is also better since it makes things look less noisy.

Also FWIW, this code would typically use interpolation:

# Before
print "A: ", a.get, " ", b.get, "\n";
# After
print "A: #{a.get} #{b.get}\n"

And Ruby doesn't really use semicolons except for separating calls in single-line code :)

I'm currently comparing the vscode-tree-sitter coloring to the Ruby coloring in Atom, I'll give some more feedback in a bit.

Here a few notable bits of syntax that aren't currently highlighted properly (Atom on the left, vscode-tree-sitter on the right).

This seems like a tree-sitter problem since tree-sitter just marks it as an indentifier, but private in Ruby can be used to tell a module/class that all subsequent methods are supposed to be private. Can the extension be made to treat it as a keyword?

module Velma
  # This method is private
  def example_public_method
    'test'
  end

  private

  # This method is private
  def example_private_method
    'test'
  end
end

Screen Shot 2019-05-25 at 1 24 43 PM

It doesn't seem like 'constants' – as they're called by the tree-sitter – are treated specially in terms of highlighting? e.g. with classes and modules:

require 'uri'

begin
  URI.open('https://google.com')
rescue URI::InvalidURIError => e
  puts "Error: #{e}"
end

Client.new('test')

Client::Subclient.method('test')

Screen Shot 2019-05-25 at 1 36 29 PM

It doesn't look like, in a lot of situations, symbols have any special handling either. Interestingly, using the 'hashrocket' syntax it does highlight things.

hash = {
  key1: 'value2',
  key2: 'value2'
}

hash2 = {
  :key1 => 'value1',
  :key2 => 'value2'
}

progress_bar = ProgressBar.create(
  total: 'test',
  format: "\e[0;32m%c/%C |%b>%i| %e\e[0m"
)

Screen Shot 2019-05-25 at 1 49 52 PM

Another oddity I noticed is that do/end blocks don't have the same color, when they probably should:

# def and end are the same color
def x_to_string
  x.to_s
end

# do should use the same color as end in this block of code
10.times do |i|
  puts i
end

Screen Shot 2019-05-25 at 1 57 11 PM

Splat and block parameters aren't highlighted in any special way (I'm not really sure if they should be, but I thought it was worth noting).

# Block parameter
def guests(&block)
  puts 'test'
end

# Splat parameter
def guests_array(*array)
  puts 'test'
end

Screen Shot 2019-05-25 at 2 06 55 PM

It also doesn't highlight class or global variables as variables (@@class_var, $global_var), at least not in the same way it does for instance variables (@instance_var).

class Human
  # A class variable. It is shared by all instances of this class.
  @@species = 'Homo sapiens'
end

$global = 'this is a global'

Screen Shot 2019-05-25 at 2 13 34 PM

defined? is another weird little edge-case that isn't highlighted quite right. It's in the list of control keywords, but maybe the question mark confuses it?

@var = "I'm an instance var"
defined? @var #=> "instance-variable"

Screen Shot 2019-05-25 at 2 24 39 PM

I think I fixed all the things you pointed out, please update and see if everything looks right.

Looks pretty good :)

One thing I noticed is that slashes seem to cause problems.

In Ruby, you can use forward slashes for regex, e.g. /^*.+$/, and I'm not sure if the Textmate language file or the tree sitter is being overly-aggressive with the forward slashes.

I've been using the Learn X in Y Minutes doc for Ruby to test the tree-sitter with Ruby code, and it's a pretty good example of the problem.

Everything in this breaks after the 100.methods.include?(:/) #=> true line:

# This is a comment

# In Ruby, (almost) everything is an object.
# This includes numbers...
3.class #=> Integer

# ...and strings...
"Hello".class #=> String

# ...and even methods!
"Hello".method(:class).class #=> Method

# Some basic arithmetic
1 + 1 #=> 2
8 - 1 #=> 7
10 * 2 #=> 20
35 / 5 #=> 7
2 ** 5 #=> 32
5 % 3 #=> 2

# Bitwise operators
3 & 5 #=> 1
3 | 5 #=> 7
3 ^ 5 #=> 6

# Arithmetic is just syntactic sugar
# for calling a method on an object
1.+(3) #=> 4
10.* 5 #=> 50
100.methods.include?(:/) #=> true

# Special values are objects
nil # equivalent to null in other languages
true # truth
false # falsehood

nil.class #=> NilClass
true.class #=> TrueClass
false.class #=> FalseClass

# Equality
1 == 1 #=> true
2 == 1 #=> false

# Inequality
1 != 1 #=> false
2 != 1 #=> true

# Apart from false itself, nil is the only other 'falsey' value

!!nil   #=> false
!!false #=> false
!!0     #=> true
!!""    #=> true

Screen Shot 2019-05-26 at 2 34 37 PM

It can be fixed by adding a forward slash after the existing forward slash, which is why I suspect that's the source of the problem.

Also, FWIW I think symbols should be constant.language.symbol instead of constant.numeric, though I don't know how widely constant.language.symbol is supported by themes?

Right now the symbol itself is colored as a number but the : is colored as a symbol. Both should probably all be colored as a symbol :)

Screen Shot 2019-05-26 at 3 32 38 PM

EDIT: The blue coloring of the : was apparently caused by the VS Code Ruby extension enabling itself without me noticing, my bad. The symbol color issue still stands, however.

The end isn't caught here:

case level.to_sym
when :notice then "is-info"
when :success then "is-success"
when :error then "is-danger"
when :alert then "is-warning"
end

It also doesn't seem to be caught in an if / else / end or if / elsif / end block.

if user.avatar.attached?
  puts 'test'
else
  puts 'test'
end

EDIT: These instances of end not being highlighted are because the theme I'm using doesn't implement keyword.control, so it's not really a problem with this extension. My bad!

Method parameters also aren't highlighted as far as I can tell (I'm not sure why I failed to notice this before). Should be variable.parameter.function, I think?

Left is VS Code Ruby, right is Tree Sitter:
Screen Shot 2019-05-26 at 4 03 06 PM

def meta_description(description)
  return description.presence
end

And there are a decent number of keywords that aren't currently handled in any way: initialize|new|loop|include|extend|prepend|raise|fail|attr_reader|attr_writer|attr_accessor|attr|catch|throw|private|private_class_method|module_function|public|public_class_method|protected|refine|using

Unfortunately it doesn't look like most of these are handled in any special way by the tree-sitter.

VS Code Ruby handles them in its TextMate file here:

https://github.com/rubyide/vscode-ruby/blob/f7621138a60a9e636b8749c26f7e4596f68afc20/syntaxes/ruby.cson.json#L166-L168

Sorry I keep finding more nitpicks >.<

I tried messing around with the textmate file and added the keywords from the VS Code Ruby extension, but unfortunately the method matcher for tree-sitter overrides the TextMate grammar for a lot of them.

For example, protected works fine, but attr_accessor is caught by the tree-sitter method matcher:

Screen Shot 2019-05-26 at 4 29 09 PM

What'd be the best way to fix that?