talis/tripod-php

Tablespecs should allow row values to be set dynamically based on compiled fields

Closed this issue · 14 comments

When the table row is being generated, it would be useful to be able to apply some logic before it's saved, sort of like:

if fieldName 'x' is greater than fieldName 'y':
    fieldName 'z' should be set to 'x'

if fieldName 'a' exists:
    fieldName 'wibble' should be set to 'fooBar'
else if fieldName 'b' is greater than or equal to 'c':
    fieldName 'wibble' should be set to 'baz'
else:
    fieldName 'wibble' should be set to 'coelacanth'

etc.

This way we can have indexes on these values, etc. It would also be helpful to have a flag on fieldNames so that their values are not saved to the table but are still kept in the impact index.

I'm a bit confused by the syntax; it seems to suggest computing fieldNames from other table row fields. Do you instead mean calculating from the graph values, e.g.:

  if value(x,some:predicate) > value(y,some:predicate)
     set field z to 'some value' 

Also would the values set be fixed literals?

The idea was more of 'post-processing' before saving the row.

So, "based on the data you've collected to create this row, apply the following logic to generate the value of fieldName 'foo', then save the row".

Realistically, isn't every value in a table row a literal?

Would you remove the fields you're working over? i.e. not save them? If not I don't really see the benefit, and if so, surely it is more natural working the condition from the graph data rather than computing fields and then throwing them away?

On the literal, what I meant is "is the derived value a fixed literal derived in the spec, or selected from the working data"?

Soz, closed by accident, hit wrong button :-/

Well, not preserving the fields is probably another ticket, but that is what I'm proposing ultimately, yes. Basically, there's no point in storing the data you don't care about as long as the resources influencing the decision making are in the impact index.

I think the derived value should be either sourced from the working data or supplied literals.

Hand-wavy proposal for this to move forward:

{
  "fieldName" : "fooBar",
  "value" : {"_condition_" : {
    "if" : "$foo >= $bar",
    "then" : "$foo",
    "else" : {"_condition_" : { ... }}
    }}
}

_condition_ (kind of) follows the convention introduced by "value" : "_link_". else is optional. Conditions can be nested (as shown above).

To keep this simple, it only works on the pre-stored row/view data (maybe not view, TBD), and in "$foo", foo references a "fieldName" value.
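To make the nesting concrete, here is a rough sketch of how a "value" holding `_condition_` blocks could be resolved against an already-compiled row. It is written in Python purely for illustration and is not taken from the Tripod code base; `OPS`, `resolve` and `evaluate` are hypothetical names. It assumes the string form of "if" is exactly "lhs op rhs" separated by spaces, and that every referenced field is present in the row.

```python
import operator

# Hypothetical operator table; only plain comparisons are covered here.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

def resolve(token, row):
    """Return row[name] for a "$name" reference, otherwise the token as a literal."""
    if isinstance(token, str) and token.startswith("$"):
        return row.get(token[1:])
    return token

def evaluate(value, row):
    """Resolve a "value" entry that may hold (nested) "_condition_" blocks."""
    if isinstance(value, dict) and "_condition_" in value:
        cond = value["_condition_"]
        # Assumption: the expression is "lhs op rhs", separated by spaces.
        lhs, op, rhs = cond["if"].split()
        if OPS[op](resolve(lhs, row), resolve(rhs, row)):
            return evaluate(cond["then"], row)
        # "else" is optional; a missing branch yields None in this sketch.
        return evaluate(cond.get("else"), row)
    return resolve(value, row)

spec = {
    "fieldName": "fooBar",
    "value": {"_condition_": {
        "if": "$foo >= $bar",
        "then": "$foo",
        "else": {"_condition_": {"if": "$bar > $baz", "then": "$bar", "else": "none"}},
    }},
}
row = {"foo": 1, "bar": 5, "baz": 3}
row[spec["fieldName"]] = evaluate(spec["value"], row)  # fooBar becomes 5
```

Literals inside the string expression (e.g. "$foo >= 10") would need type coercion, which this sketch deliberately leaves out.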

Optionally, we could also consider a "priority" or "order" property so other fieldNames can be set based on other dynamically sourced fields.

Perhaps s/_condition_/_conditional_/ so as not to confuse things regarding the "filter" property.

Proposed conditional operators:

  • <
  • >
  • ==
  • !=
  • >=
  • <=
  • =~ (regex match)
  • !~

To make this a lot more versatile and easy to parse, the value of the "if" key should perhaps be an array that requires 1 or 3 values:

"if" : ["a",">=","b"]
...
"if" : [[1,2,3], "contains", 3]
...
"if" : ["$foo"]

etc.
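A minimal sketch of why the array form is easier to handle: no expression parsing is needed, only a length check. This is illustrative Python, not the proof of concept; `check_condition`, `resolve` and `COMPARATORS` are made-up names. Two assumptions are baked in: a one-element "if" means "this field exists" (resolves to something other than null), and only the plain comparison operators are handled.

```python
import operator

# Hypothetical lookup table for the comparison operators.
COMPARATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

def resolve(operand, row):
    """Return row[name] for a "$name" reference, otherwise the operand as a literal."""
    if isinstance(operand, str) and operand.startswith("$"):
        return row.get(operand[1:])
    return operand

def check_condition(condition, row):
    """Evaluate an array-form "if" with either 1 or 3 entries."""
    if len(condition) == 1:
        # Assumption: a lone operand is an existence check.
        return resolve(condition[0], row) is not None
    if len(condition) == 3:
        left, op, right = condition
        return COMPARATORS[op](resolve(left, row), resolve(right, row))
    raise ValueError('"if" must contain 1 or 3 values')

row = {"foo": 10, "bar": 3}
check_condition(["$foo", ">=", "$bar"], row)  # True
check_condition(["$missing"], row)            # False
```

Anything that is not a "$name" reference is treated as a literal, so ["a", ">=", "b"] compares the two strings themselves.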

contains and not contains may be more versatile than =~ and !~ since you could use them for regexes and array membership. The only weird thing here is that the condition syntax is then reversed, depending on what is being compared:

"if" : ["$foo", "contains", "/quick brown fox/"]
...
"if" : [[1,2,3], "contains", "$bar"]
...
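One way "contains" could cover both cases, sketched in Python for illustration only. The function name and the "/.../" delimiter check are assumptions, not part of any existing implementation: a list on the left means membership, a slash-delimited string on the right means regex search, and anything else falls back to a substring test.

```python
import re

def contains(haystack, needle):
    """Hypothetical dual-purpose "contains" operator."""
    if isinstance(haystack, (list, tuple)):
        # Array membership: [[1,2,3], "contains", "$bar"]
        return needle in haystack
    if (isinstance(needle, str) and len(needle) >= 2
            and needle.startswith("/") and needle.endswith("/")):
        # Regex search: ["$foo", "contains", "/quick brown fox/"]
        return re.search(needle[1:-1], str(haystack)) is not None
    # Fallback: plain substring test.
    return str(needle) in str(haystack)

contains("the quick brown fox jumps", "/quick brown fox/")  # True
contains([1, 2, 3], 3)                                       # True
```

Note how the field reference sits on the left in the regex case and on the right in the membership case, which is the reversal mentioned above.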

Some further thoughts on this after spiking out a proof of concept:
"contains" and "=~" serve somewhat different purposes, so I propose keeping both for now.

It would be far easier to add a "computed_fields" property to the top-level tablespec and process these there than at run time within the spec itself: the former ensures that we have all of the values that we need to work with and that we don't run into an issue where a missed join causes the logic to fall to pieces or something.

Also, rather than specifying the 'temporary' fields (i.e. the fields that we only want to use to determine a computed value, but not save in the table document itself) by a property (e.g. "store": false or something), I propose that we use a naming convention for the properties that are to be thrown away before persisting. I'm going with !propName since that would be a fairly awkward Mongo property name, anyway.

This way we can do an array_filter or array_map or whatever and just remove those key/value pairs before storing it.
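For illustration, the filtering step amounts to something like the following (a Python sketch of the idea rather than the PHP in the proof of concept; `strip_temporary_fields` is a hypothetical name and the sample row is made up):

```python
def strip_temporary_fields(row):
    """Drop every key that uses the "!propName" convention before the row is stored."""
    return {key: value for key, value in row.items() if not key.startswith("!")}

row = {"title": "Moby Dick", "!pageCount": 635, "!threshold": 500, "isLong": True}
stored = strip_temporary_fields(row)
# stored == {"title": "Moby Dick", "isLong": True}
```

The temporary fields stay available while computed values are worked out and only disappear at the point of persisting.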

There is a proof of concept for this at #40.