hyperjumptech/grule-rule-engine

Panic on workingMemory when remove a rule

vitbaq opened this issue · 18 comments

Describe the bug
Hey guys, I'm working with the grule engine and I came across the following situation.

I created a microservice that uses grule to analyze data over rules. In this microservice I have an API that allows adding and deleting a rule. But the service is crashing when I remove an existing rule and add a new rule. So, I created a test application only using grule to analyze the problem and I came across a panic:
DEBU[0000] Cloning working memory POCRules:0.0.1 lib=grule-rule-engine package=ast DEBU[0000] Cloning 7 expressionSnapshotMap entries lib=grule-rule-engine package=ast panic: expression BMF.Value1 is not on the clone table

Apparently the problem is when removing the rule just on the ruleEntries but the workingMemory remains the same.

I will index the sample code that reproduce this error.

So my question is,
Am I using rule removal correctly? Or is this a bug in the removal?

To Reproduce
Code: https://gist.github.com/vitbaq/35f52f30bdfb442eb7a7a892ddb1d5d3
Log: https://gist.github.com/vitbaq/88a6f00edcadbb8724218a17551d113f

Testing environment:

  • Operating System/Platform: Ubuntu 18.04.5 LTS (Bionic Beaver)
  • Go version: go1.13.5 linux/amd64
  • Grule-rule-engine version: 1.9.0

@vitbaq

By simply reading your issue, I can immediately confirm this bug.
The working memory are tightly related to the AST to ensure there are no duplicated expression and expression atoms.
Removing a rule means that some expression and expression atoms in the working memory should be restructured.
Its part of the RETE logic built into Grule.

Removing a RuleEntry from KnowledgeBase sounds trivial but its quite complex underneath to do it properly.

A work around is to rebuild knowledge base each time there's a rule entry removal. I need to think about this and carefully
make it work without Breaking the working memory mechanism on cloning KnowledgeBase.

Thank you for pointing this out.

Thanks for the answer. I will do what was recommended. I will also study more of the library code to be able to help with the development and propose a solution for the bug.

Please keep me in touch if you find a solution to this challenge. Thanks.

Hi @newm4n, did you get a chance to look into this ? we have similar situation in my application like we need to remove a rule entry from the existing knowledgeBase because rebuilding knowledgebase with all the rules from scratch is expensive.

Let me know if you have any algorithm or idea in your mind, i can contribute or help you on this

Hello @vitbaq, I have a workaround(may be you can call it as a hack), reset the working memory and index it again while removing the rule entry, I have modified your example for better understanding.

// Remove the added rule (Rule1)
	knowledgeBase.RemoveRuleEntry("Rule1")
	//Workaround starts
	delete(knowledgeLibrary.Library["POCRules:0.0.1"].RuleEntries, "Rule1")
	knowledgeBase = knowledgeLibrary.GetKnowledgeBase("POCRules", "0.0.1")
	knowledgeBase.WorkingMemory = ast.NewWorkingMemory("POCRules", "0.0.1")
	knowledgeBase.WorkingMemory.IndexVariables()
	//Workaround ends

Reason for that panic is even after removing ruleentry the working memory still contains the old data (expressions, variables etc) and this will create a problem while cloning, so i am recreating the working memory again to reflect the new changes. @newm4n, @niallnsec correct me if this leads to any problem

Full Example:

package examples

import (
	"fmt"
	"github.com/stretchr/testify/assert"
	"testing"

	"github.com/hyperjumptech/grule-rule-engine/ast"
	"github.com/hyperjumptech/grule-rule-engine/builder"
	"github.com/hyperjumptech/grule-rule-engine/engine"
	"github.com/hyperjumptech/grule-rule-engine/pkg"
)

type POCfact struct {
	Value1 int64
	Value2 int64
	Value3 int64
	Value4 int64
}

const rule1 = `
		rule Rule1 "Check the value1 value" {
        when 
           BMF.Value1 > 100
		then
        Retract("Rule1");
}`

const ruleToAdd = `
rule Rule2 "Check the value2 value" {
        when 
            BMF.Value2 > 100
		then
        Retract("Rule2");
}
`


func TestRuleAdd(t *testing.T) {
	bmFact := &POCfact{
		Value1: 1350,
		Value2: 140,
	}
	// Create New KnowledgeLibrary
	knowledgeLibrary := ast.NewKnowledgeLibrary()
	ruleBuilder := builder.NewRuleBuilder(knowledgeLibrary)

	dataCtx := ast.NewDataContext()
	err := dataCtx.Add("BMF", bmFact)
	if err != nil {
		panic(err)
	}

	//Added a new rule
	err = ruleBuilder.BuildRuleFromResource("POCRules", "0.0.1", pkg.NewBytesResource([]byte(rule1)))
	// Get knowledgeBase and Engine to run FetchMatchingRules
	knowledgeBase := knowledgeLibrary.NewKnowledgeBaseInstance("POCRules", "0.0.1")
	assert.Equal(t, len(knowledgeBase.RuleEntries), 1)

	// Remove the added rule (Rule1)
	knowledgeBase.RemoveRuleEntry("Rule1")
	//Workaround starts
	delete(knowledgeLibrary.Library["POCRules:0.0.1"].RuleEntries, "Rule1")
	knowledgeBase = knowledgeLibrary.GetKnowledgeBase("POCRules", "0.0.1")
	knowledgeBase.WorkingMemory = ast.NewWorkingMemory("POCRules", "0.0.1")
	knowledgeBase.WorkingMemory.IndexVariables()
	//Workaround ends

	assert.Equal(t, len(knowledgeBase.RuleEntries), 0)



	// Adding a new rule (Rule2)
	err = ruleBuilder.BuildRuleFromResource("POCRules", "0.0.1", pkg.NewBytesResource([]byte(ruleToAdd)))
	knowledgeBase = knowledgeLibrary.NewKnowledgeBaseInstance("POCRules", "0.0.1")
	assert.Equal(t, len(knowledgeBase.RuleEntries), 1)



	engine := engine.NewGruleEngine()
	rulesEntries, err := engine.FetchMatchingRules(dataCtx, knowledgeBase)
	if err != nil {
		panic(err)
	}

	for _, rl := range rulesEntries {
		fmt.Printf("rules entries: %s\n", rl.RuleName)
	}

}


let me know if this doesn't work you

@newm4n can you please check, is this a right way to handle this ? Sorry for asking, its a bit priority thing for me

@jinagamvasubabu, I'm very well aware of this issue.
AFAIK, due to the way RETE is implemented in Grule, where no AST node get duplicated (for speed reason).
Once a Rule get constructed into the Working Memory, its hard to remove any of its expression component from the working memory. At least to my understanding on how it works.

You should read my explanation here https://github.com/hyperjumptech/grule-rule-engine/blob/master/docs/RETE_en.md#what-is-inside-grules-working-memory

"First, it will try its best to make sure that none of the AST (Abstract Syntax Tree) nodes are duplicated."

The working memory stores the linkage of every rule entry's Expression/Expression Atoms the the actual expression stored. ITs to avoid duplication. Removing a rule entry means that we should remove the corresponding Expression/ExpressionAtom from the working memory without causing side effect in case of that Expression/Expression Atom got removed.

Imagine that we have 2 similar rule

rule One {
   when A==1
   then ...
}

rule Two {
   when A== 2
   then ...
}

You see the expression atom of variable "A" used in two rule. One and Two.
The working memory who hold the expression atom "A" holds the actual value.
And variable "A" exist in the two rule.

If we remove the rule One, all expression and expression atom should be removed as long as its not used by other rule.
This is the problem, How can we remove the expression atom "A" and expression "A==1" in rule One, without loosing reference of expression atom "A" and expression "A==2" in rule Two.

Its only a simple expression. Imagine if a rule like

Rule One
   when 
      A.b.c == C.d.e.ToUpper() + (( E.f.DoSomething(G.h + J.k) * L.m.n / 0x232.32)

Rule Two
   when 
      A.b.d == C.d.e.ToUpper() + 23

Rule Three
   when 
      E.f.DoSomething(2 + Gh) < ParseInt(C.d.e)

So many expression and expression atom we need to track. Few of them being used on multiple rule entry.
I don't know if removing the RuleEntry from the RuleEntry array would be just fine without side effect.

ok, thanks for explaining, totally makes sense. what if reconstruct the working memory again after removing the ruleentry.

// Remove the added rule (Rule1)
	knowledgeBase.RemoveRuleEntry("Rule1")
	//Workaround starts
	delete(knowledgeLibrary.Library["POCRules:0.0.1"].RuleEntries, "Rule1")
	knowledgeBase = knowledgeLibrary.GetKnowledgeBase("POCRules", "0.0.1")
	knowledgeBase.WorkingMemory = ast.NewWorkingMemory("POCRules", "0.0.1")
	knowledgeBase.WorkingMemory.IndexVariables()
	//Workaround ends

@newm4n , if the above doesn't work, can you tell me is there any way i can recreate the working memory from scratch by any chance ?

@jinagamvasubabu The more I'm thinking about it. The worst complexity developed on my mind. However. I might have a work around.

Instead of removing the RuleEntry from RuleEntries, We just flag the RuleEntry as "Deleted". Add a member variable Deleted bool in RuleEntry with default of false. As long as this flag is false this RuleEntry will aways get evaluated. If we decide to remove this rule, we rename the RuleEntry into something bogus... like Deleted_j2h3k24h2kj3h4k2j432hk . and change the Deleted flag to True.

This requires minimum amount of change.

  1. Change the GruleEngine.go to always ignore a RuleEntry if the flag Deleted is true.
  2. Modify the RuleEntry.go, Add a member variable Deleted bool, it will automatically be default to false.
  3. Modify the KnowledgeBase.go , change the function RemoveRuleEntry(name) not to remove the RuleEntry from RuleEntries, but instead, change the Deleted flag of the corresponding RuleEntry to True. And rename the rule entries map key to some bogus name.

This could work and sounds like a plan. 🤞

Oh and one more....

  1. Make sure the Clone function in RuleEntry.go to also clone this Deleted flag.

Thanks for this, i will quickly implement this and raise a pr.. fingers crossed

@newm4n , one more thing can i add a check to not to add already deleted rule entries in AddRuleEntry when we are loading from GRB file (binary snapshot)

// AddRuleEntry add ruleentry into this knowledge base.
// return an error if a rule entry with the same name already exist in this knowledge base.
func (e *KnowledgeBase) AddRuleEntry(entry *RuleEntry) error {
	e.lock.Lock()
	defer e.lock.Unlock()
        if entry.Deleted {
		return nil
	}
	if e.ContainsRuleEntry(entry.RuleName) {
		return fmt.Errorf("rule entry %s already exist", entry.RuleName)
	}
	e.RuleEntries[entry.RuleName] = entry
	return nil
}

@newm4n , one more thing can i add a check to not to add already deleted rule entries in AddRuleEntry when we are loading from GRB file (binary snapshot)

// AddRuleEntry add ruleentry into this knowledge base.
// return an error if a rule entry with the same name already exist in this knowledge base.
func (e *KnowledgeBase) AddRuleEntry(entry *RuleEntry) error {
	e.lock.Lock()
	defer e.lock.Unlock()
        if entry.Deleted {
		return nil
	}
	if e.ContainsRuleEntry(entry.RuleName) {
		return fmt.Errorf("rule entry %s already exist", entry.RuleName)
	}
	e.RuleEntries[entry.RuleName] = entry
	return nil
}

No I don't think that would be necessary. This method only called by the ANTLR Listener once done parsing a GRL.

@newm4n I think i am missing something or not,

I have made the changes but looks like i am getting the deleted rule when i try to load again, I have made a draft PR for this change #252. let me know if i am missing anything

I suspect we marked delete in knowledgebase but knowledgeLibrary still pointing to the old rules

I have added a similar method RemoveRuleEntry for KnowledgeLibrary as well, Not sure this is correct. Test is passing.

func (lib *KnowledgeLibrary) RemoveRuleEntry(ruleName, name string, version string) {
	_, ok := lib.Library[fmt.Sprintf("%s:%s", name, version)]
	if ok {
		ruleEntry, ok := lib.Library[fmt.Sprintf("%s:%s", name, version)].RuleEntries[ruleName]
		if ok {
			lib.Library[fmt.Sprintf("%s:%s", name, version)].RuleEntries[ruleName].RuleName = fmt.Sprintf("Deleted_%s", ruleEntry.RuleName)
			lib.Library[fmt.Sprintf("%s:%s", name, version)].RuleEntries[ruleName].Deleted = true
			delete(lib.Library[fmt.Sprintf("%s:%s", name, version)].RuleEntries, ruleName)
			lib.Library[fmt.Sprintf("%s:%s", name, version)].RuleEntries[ruleEntry.RuleName] = ruleEntry
		}
	}
}

Ah, yes. The KnowledgeBase holds by the KnowledgeLibrary is the actual one.
KnowledgeBase in the Library acts like a blue print, get copied every time we get a new KnowledgeBase instance from the Library.

Changes has been merged and its in tag v1.10.2. @vitbaq can you try and let us know if its not working