greenelab/lab-website-template

Change list component `filters` to Ruby expression?

Closed this issue · 4 comments

See #271 for motivation behind this. Sometimes we need more complex logic for filtering, and trying to make our own syntax for this will be brittle and an anti-pattern.

I think it might be best to simply have filter take any Ruby expression that evaluates to true/false. Then, under the hood, we can use eval and define all fields on an item as local variables in the evaluation.

So, the user could provide something like filter="publisher == 'bioRxiv' and date =~ /^2020/" (all papers published by bioRxiv in 2020) or !alumni ? true : date =~ /^2024/ (keep all papers by current lab members, but only show 2024 papers for alumni team members).

This is way more flexible, and perhaps more intuitive and readable. For example, compare role == 'programmer' and alumni == true (proposed syntax) to role: programmer, alumni: true (current syntax). Ruby syntax in general, imo, is not very intuitive, but for simple things it's fine. A user would probably have to look up that they need =~ for regex unless they happen to know Ruby.
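For readers who don't know Ruby, here is a small sketch of how =~ behaves inside a filter expression. The sample date value is made up for illustration, and the alternatives at the end are only suggestions, not part of the proposal.

```ruby
# Sample field value, made up for illustration.
date = "2020-03-01"

# =~ returns the index of the first match, or nil when nothing matches.
hit = date =~ /^2020/
miss = date =~ /^2024/
puts hit.inspect   # 0
puts miss.inspect  # nil

# In Ruby only nil and false are falsy, so 0 still counts as "keep".
# Double negation turns either result into a plain boolean.
puts((!!hit).inspect)   # true
puts((!!miss).inspect)  # false

# Alternatives that return true/false directly.
puts date.match?(/^2020/)       # true
puts date.start_with?("2020")   # true
```

So a filter such as date =~ /^2020/ works as a condition even though it doesn't literally return true or false.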

Also, I'd want to rename this param filter (since it's only one condition and to force users to notice that the behavior has changed if they don't update their syntax after updating their template version) and deprecate filters (I don't want to have two params that look almost the same with two very different behaviors).

Users, please indicate whether you would like or dislike this change with a 👍 or 👎 .

Here is some under-the-hood code I've been experimenting with:

# test data
data = [
  {"name" => "Jane Smith", "date" => "2015-01-01"},
  {"name" => "John Smith", "date" => "2019-02-01"},
  {"name" => "Ada Lovelace", "date" => "2020-03-01", "alumni"=> true},
  {"name" => "Alan Turing", "date" => "2024-04-01", "alumni"=> true},
  {"name" => "Margaret Hamilton", "date" => "2018-05-01", "alumni"=> true},
]

# test filter
filter = "alumni ? !name.start_with?('A') : true"

###########

def empty_binding
  binding
end

# make arbitrary string into valid ruby variable name
def safe_var_name(name)
  return name.to_s.gsub(/[^a-z]+/i, "_").gsub(/^_|_$/, "")
end

# filter a list of hashes
def data_filter(data, filter)
  if not filter.is_a?(String)
    return data
  end

  # filter data
  return data.clone.select{
    |item|
    # start with empty context of local variables
    b = empty_binding
    # add item as local variable
    b.local_variable_set("item", item)
    # also set each item field as local variable when evaluating filter
    item.each do |var, val|
      b.local_variable_set(safe_var_name(var), val)
    end
    # whether to keep item
    keep = true
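    while true
      begin
        # evaluate expression as true/false
        keep = !!eval(filter, b)
        break
        # if a var in expression isn't a field on item
      rescue NameError => e
        # NoMethodError is a subclass of NameError; a failed method call
        # (e.g. on a nil field) can't be fixed by defining a variable,
        # so re-raise it instead of looping forever
        raise if e.is_a?(NoMethodError)
        # define the missing variable under its exact name and re-evaluate
        b.local_variable_set(e.name, nil)
      end
    end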
    # keep/discard item
    keep
  }
end

filtered = data_filter(data, filter)

filtered.each do |item|
  puts item
end

https://onecompiler.com/ruby

I believe now that this would be a good change, especially since item fields are defined as local variables, so the user can just type field instead of item['field'] (the savings add up).

I believe this implementation covers all the bases and is ready for a PR. It treats any var in the filter that doesn't exist in the item as nil, to prevent an access/name error. It also has to convert field names to safe variable names, i.e. some-field (valid key in YAML) becomes some_field (valid variable name in Ruby). If someone doesn't like that, they can still use item["123 some invalid var name"] in their filter string.
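A minimal sketch of those two behaviours in isolation, assuming the same safe_var_name conversion as the demo code. The item and its keys are hypothetical.

```ruby
# Same conversion as safe_var_name in the demo code.
def safe_var_name(name)
  name.to_s.gsub(/[^a-z]+/i, "_").gsub(/^_|_$/, "")
end

# Returns a binding with no local variables of its own.
def empty_binding
  binding
end

# Hypothetical item; "some-field" is a valid YAML key but not a Ruby variable name.
item = { "some-field" => "x", "name" => "Ada Lovelace" }

b = empty_binding
b.local_variable_set("item", item)
item.each { |key, value| b.local_variable_set(safe_var_name(key), value) }

# Both spellings reach the same value.
by_var = eval("some_field == 'x'", b)
by_key = eval("item['some-field'] == 'x'", b)
puts by_var  # true
puts by_key  # true

# A field missing from the item: define it as nil on the first NameError.
begin
  eval("alumni", b)
rescue NameError => e
  b.local_variable_set(e.name, nil)
end
missing = eval("alumni", b)
puts missing.inspect  # nil
```

One thing to be aware of: the pattern in safe_var_name replaces every non-letter, digits included, so a key like paper2 becomes paper and could collide with another field. The item["..."] form avoids that.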

This looks very flexible and I think it doesn't complicate things from the user point of view. So, it looks good!

Would it be possible to use variables to compose the years inside the pattern to be matched? So if we had

{"author" => "Ada Lovelace", "date-joined" => "2020-03-01", "date-left" => "2022-10-01", "alumni"=> true}

we filter as

filter = "!alumni ? true : (date-joined <= date-publication and date-publication <= date-left)"

instead of hard writing the date interval?

I am asking this just because if someone tries to scale this solution to two or more members who left the lab, they would need to hard-code the years of each of them in the filter pattern, and that would get very complicated.

But for sure, for one single member, this solution works fine. And also, it is more general than that, so it can be used to filter things in other scenarios, so I like it anyway.

By the way, thank you very much for the effort!

With this solution, you could use any valid Ruby expression. It's literally just evaluating the string you pass to filter as Ruby code, seeing if it evaluates to true/false, and keeping/discarding the item accordingly. So you could go as wild as you want.

I'm very inexperienced with Ruby, but I think you could just do date.between?("2019", "2023") or date.between?(date_joined, date_left).

Note the last few sentences of this comment above, though. You won't be able to do date-joined < date-publication, because date-joined isn't a valid variable name in Ruby (or most languages). So my demo code above converts it to date_joined automatically. Or you could use item["date-joined"].
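A quick sketch of how between? behaves on date strings, assuming dates are stored as zero-padded YYYY-MM-DD strings as in the examples in this thread. The item below is hypothetical. Because the values are plain strings, the comparison is character by character, which matters for year-only bounds.

```ruby
# Hypothetical item using the hyphenated keys from the question above.
item = { "date-joined" => "2020-03-01", "date-left" => "2022-10-01" }

# Hyphenated keys are still reachable through item[...].
joined = item["date-joined"]
left = item["date-left"]

# Zero-padded ISO dates sort the same way as strings and as dates.
inside = "2021-06-15".between?(joined, left)
outside = "2023-01-20".between?(joined, left)
puts inside   # true
puts outside  # false

# A year-only upper bound excludes dates within that year, because
# "2023-05-01" is longer than "2023" and therefore sorts after it.
year_only = "2023-05-01".between?("2019", "2023")
full_bound = "2023-05-01".between?("2019", "2023-12-31")
puts year_only   # false
puts full_bound  # true
```

So for a range that should include all of 2023, an upper bound like "2023-12-31" (or "2024") is the safer choice.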

I played a little with date_publication.between?(date_joined, date_left) and it seems to work fine (:
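For the multi-member case raised earlier, a rough sketch of one filter string handling several alumni without hard-coded years. It assumes each paper item already carries its author's alumni, date_joined and date_left fields. That merging step and all the data below are hypothetical, and this is a stripped-down evaluator, not the demo implementation.

```ruby
# Hypothetical papers, each assumed to be merged with its author's membership info.
papers = [
  { "title" => "Paper A", "date_publication" => "2021-06-15",
    "alumni" => true, "date_joined" => "2020-03-01", "date_left" => "2022-10-01" },
  { "title" => "Paper B", "date_publication" => "2023-01-20",
    "alumni" => true, "date_joined" => "2020-03-01", "date_left" => "2022-10-01" },
  { "title" => "Paper C", "date_publication" => "2019-05-05",
    "alumni" => true, "date_joined" => "2018-01-01", "date_left" => "2019-12-31" },
  # Current member: no membership dates needed, the ternary never reads them.
  { "title" => "Paper D", "date_publication" => "2024-02-02", "alumni" => false },
]

# One filter for everyone; the dates come from each item, not from the string.
filter = "!alumni ? true : date_publication.between?(date_joined, date_left)"

# Returns a binding with no local variables of its own.
def empty_binding
  binding
end

kept = papers.select do |paper|
  b = empty_binding
  # Keys here are already valid variable names, so no conversion is needed.
  paper.each { |key, value| b.local_variable_set(key, value) }
  eval(filter, b)
end

kept.each { |paper| puts paper["title"] }
# Paper A
# Paper C
# Paper D
```

Paper B is dropped because it was published after its author left; adding more alumni only means adding more items, not changing the filter.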