manfred-kaiser/business-rule-engine

How to bypass missing parameters?

ferulisses opened this issue · 11 comments

Hi,

I have a situation where I get messages to process with incomplete arguments, for example:

t1 = { "a": "foo", "b": 10 }

t2 = { "a": "bar", "c": 20 }

And then I created the rules:

r = """
rule "print foo"
when
   AND(a = "foo",
   b = 10)
then
   print("foo")
end

rule "print bar"
when
   AND(a = "bar",
   c = 20)
then
    print("bar")
end
"""

And run with:

from business_rule_engine import RuleParser

p = RuleParser()
p.register_function(print)
p.parsestr(r)
p.execute(t1)
p.execute(t1, False)
p.execute(t2)

With t1, the rules run successfully.
With t1 and stop_on_first_trigger = False, execution fails with a ValueError exception.
With t2, it also fails with a ValueError exception indicating the missing arguments.

I could fill the missing arguments with None just to bypass the conditions, but I don't think that is right; also, in the real world the data is much more complex than this.

I was thinking of implementing a parameter that bypasses the ValueError exception in the _get_params function: every condition with missing arguments would be treated as false instead of raising the exception. This would allow t2 to run.

Maybe execute could take a parameter ignore_missing_arguments. The default, False, would raise the exception; when set to True, conditions whose arguments are not all present in the params would be skipped.

Is this the best solution for this situation? What do you think?
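The proposed behaviour can be sketched without the library. This is only an illustration under stated assumptions: RULES and triggered_rules are hypothetical stand-ins written for this example, not part of business-rule-engine, and the real engine parses its conditions from the rule text instead of using lambdas.

```python
# Hypothetical stand-in for the rule engine: each rule lists the
# argument names its condition needs, plus the condition itself.
RULES = [
    ("print foo", ("a", "b"), lambda p: p["a"] == "foo" and p["b"] == 10),
    ("print bar", ("a", "c"), lambda p: p["a"] == "bar" and p["c"] == 20),
]


def triggered_rules(params, ignore_missing_arguments=False):
    """Return the names of the rules whose condition holds for params."""
    hits = []
    for name, needed, condition in RULES:
        missing = [key for key in needed if key not in params]
        if missing:
            if ignore_missing_arguments:
                continue  # treat the condition as false and move on
            raise ValueError(
                "missing arguments for {!r}: {}".format(name, missing)
            )
        if condition(params):
            hits.append(name)
    return hits


t1 = {"a": "foo", "b": 10}
t2 = {"a": "bar", "c": 20}
print(triggered_rules(t1, ignore_missing_arguments=True))  # ['print foo']
print(triggered_rules(t2, ignore_missing_arguments=True))  # ['print bar']
```

With the default ignore_missing_arguments=False, the same call with t2 raises ValueError as soon as it reaches the first rule, which mirrors the failure described above.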

Are the datasets from the same data source with only some values missing, or do you use different data sources?

If you have to process incomplete datasets, you have to define how to handle unknown values.
Your suggestion to use None as a placeholder for missing values seems the best approach to me. It indicates that the data can be processed with this ruleset, but the value is missing/unknown.

If you have different datasets that share some values, processing them with the same ruleset is not recommended. In such cases you should either create different rulesets or normalize your data and handle it as an incomplete dataset.

Ignoring rules when an argument is missing does not seem to be the right approach, because it only hides errors in your datasets.

Do you have an example data set or can you describe it? Why do you need different arguments for a ruleset?

I'm processing some whois data and applying rules to determine whether the domain is about to expire, whether the owner is anonymous, whether any sensitive data is available, whether there are DNSSEC records, plus some other validations.

But the data differs depending on the registrar: .biz domains return different fields than .com, .io or .com.br, although some fields are common.

I'm testing some Python libraries that normalize most of the data, but sometimes a field is missing for some country (normally because its name is translated).

I intend to correct this information when I get a sample, but it would be useful to process the remaining fields that exist.

For example:

  • google.com returns the field "Registrar Registration Expiration Date" with a timestamp
  • google.io returns the field "Registry Expiry Date" with a timestamp
  • google.com.br returns the field "expires" with the date formatted as YYYYMMDD

I was thinking of using rules because, when I discover a new TLD field, it will be easier to write a new rule than to write code to process the field.

In your example, it's best to map those values to the same name. This is not possible with the rule engine.

You have to create a script to normalize fields like "Registrar Registration Expiration Date" and "Registry Expiry Date".
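A minimal sketch of such a normalization script, using only the three field names quoted earlier in this thread. The canonical name expiration_date, the FIELD_ALIASES table, both helper functions, and the sample record (including its date) are made up for illustration.

```python
from datetime import datetime

# Hypothetical alias table: canonical name -> field names seen per TLD.
FIELD_ALIASES = {
    "expiration_date": (
        "Registrar Registration Expiration Date",
        "Registry Expiry Date",
        "expires",
    ),
}

# Reverse lookup built once: alias -> canonical name.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in FIELD_ALIASES.items()
    for alias in aliases
}


def normalize(record):
    """Return a copy of record with known aliases renamed."""
    return {
        ALIAS_TO_CANONICAL.get(key, key): value
        for key, value in record.items()
    }


def parse_yyyymmdd(text):
    """Parse a date written as YYYYMMDD, e.g. the .com.br 'expires' field."""
    return datetime.strptime(text, "%Y%m%d").date()


# Illustrative record only; the values are not real whois output.
br_record = normalize({"expires": "20300914", "owner": "example"})
print(br_record)
print(parse_yyyymmdd(br_record["expiration_date"]))
```

After this step every record exposes the same expiration_date key, so a single rule can refer to it regardless of the TLD. Unknown fields pass through unchanged.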

Perhaps it's easier to write your script in plain Python, unless you need to provide an interface for other people or need to change the rules while the program is running.

There are hundreds of TLDs and each one has its particularities. There are some Python libraries for processing whois data that already normalize some of it, but there are still variants depending on the domain registrar.

I created an interface so that non-programmers can create the rules.

I'm proposing this change because when a single condition doesn't get all its arguments, none of the other conditions are processed.

I made an implementation that is working for me:
https://github.com/ferulisses/business-rule-engine/pull/3/files

If you think it is appropriate, I will create a pull request. If you think a different name for the option would be better, just say so.

Your solution ignores rules with missing arguments. I think it is better to provide default values.

This should work for your use case, because missing arguments will be set to None. Setting arguments to None also reflects that some values are missing, and you can handle those values in your rules.

If you want to test it, you can check out the development branch:
https://github.com/manfred-kaiser/business-rule-engine/tree/develop

Funny, I implemented something like this at first; the main difference was that I set the default to None, with no option to define the default_arg.

I was just afraid of matching some condition that compares something with None when it shouldn't.

Thinking more about this solution now, it's very interesting with default_arg: I can set a special string as default_arg and check for missing arguments without breaking anything. It's more than what I asked for :-)
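The sentinel idea can be sketched in plain Python. The thread does not show the exact signature of the library's default_arg option, so this example applies the same idea to the params dict before it reaches the engine. MISSING, EXPECTED_KEYS and with_defaults are hypothetical names.

```python
# Hypothetical sentinel: a string that no real field value should equal.
MISSING = "__missing__"

# Hypothetical list of every argument name the ruleset refers to.
EXPECTED_KEYS = ("a", "b", "c")


def with_defaults(params, expected_keys=EXPECTED_KEYS, default=MISSING):
    """Return a copy of params where absent keys hold the default value."""
    filled = {key: default for key in expected_keys}
    filled.update(params)  # real values override the defaults
    return filled


t2 = {"a": "bar", "c": 20}
filled = with_defaults(t2)
print(filled)  # {'a': 'bar', 'b': '__missing__', 'c': 20}

# The sentinel never satisfies b = 10, and it is distinguishable from a
# field that is genuinely None.
print(filled["b"] == 10, filled["b"] is None)  # False False
```

A rule can then test for the sentinel string explicitly to detect a missing field, while conditions on real values simply evaluate to false.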

I have created a new release, 0.0.5, and published it on pypi.org.

You're welcome! :-)

Thank you!

I have made some changes to the rule parser. Now you can iterate over the RuleParser in your script and execute each rule on its own.

This gives you more control over which rules are executed, and you can handle rules with unknown arguments in your script.

Now you can also catch business_rule_engine.exceptions.MissingArgumentError in your code.

from business_rule_engine import RuleParser
from business_rule_engine.exceptions import MissingArgumentError

def order_more(items_to_order):
    return "you ordered {} new items".format(items_to_order)

rules = """
rule "order new items"
when
    products_in_stock < 20
then
    order_more(50)
end
"""

params = {
    'products_in_stock': 10
}

parser = RuleParser()
parser.register_function(order_more)
parser.parsestr(rules)
for rule in parser:
    try:
        rvalue_condition, rvalue_action = rule.execute(params)
        if rule.status:
            print(rvalue_action)
            break
    except MissingArgumentError:
        pass

This script will print "you ordered 50 new items".