sjwhitworth/golearn

Is the same order of categorical values necessary for the Equals function?

SummerCedrus opened this issue · 1 comment

When I run a LinearRegression prediction involving a CategoricalAttribute, I get an error.

Here is the error I got:
Predict error Couldn't resolve CategoricalAttribute("SaleCondition", [Normal Abnorml Partial AdjLand Alloca Family])dense.go:275

I added some logging at dense.go:269, which printed:
a CategoricalAttribute("SaleCondition", [Normal Partial Abnorml Family Alloca AdjLand]) get CategoricalAttribute("SaleCondition", [Normal Abnorml Partial AdjLand Alloca Family])
It looks like the categorical values must be in the same order.
But the comment on the Equals function says:

// * If applicable, they have the same categorical values (though not
//   necessarily in the same order).
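The two value lists in the log line hold the same six categories in a different order, which is why a position-by-position comparison rejects them. The sketch below reproduces that with plain string slices; `positionalEqual` and `sortedEqual` are hypothetical helpers for illustration only, with `sortedEqual` ignoring order by comparing sorted copies.

```go
package main

import (
	"fmt"
	"sort"
)

// positionalEqual mirrors a position-by-position check: element i of one
// list must match element i of the other.
func positionalEqual(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// sortedEqual ignores order by sorting copies of both lists and then
// comparing them position by position. The inputs are left untouched.
func sortedEqual(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	return positionalEqual(x, y)
}

func main() {
	// Value lists copied from the dense.go:269 log line.
	have := []string{"Normal", "Partial", "Abnorml", "Family", "Alloca", "AdjLand"}
	want := []string{"Normal", "Abnorml", "Partial", "AdjLand", "Alloca", "Family"}
	fmt.Println(positionalEqual(have, want)) // false
	fmt.Println(sortedEqual(have, want))     // true
}
```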

How can I solve this problem? Thank you!

```go
func (Attr *CategoricalAttribute) Equals(other Attribute) bool {
	attribute, ok := other.(*CategoricalAttribute)
	if !ok {
		// Not the same type, so can't be equal
		return false
	}
	if Attr.GetName() != attribute.GetName() {
		return false
	}

	// Check that this CategoricalAttribute has the same
	// values as the other, in the same order
	if len(attribute.values) != len(Attr.values) {
		return false
	}

	for i, a := range Attr.values {
		if a != attribute.values[i] {
			return false
		}
	}

	return true
}
```
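The positional check may matter beyond Equals itself. If a categorical attribute stores each value as its index in the value list (an assumption about the storage format, not something this thread establishes), then the same category gets different indices under the two orderings in the log, so order-insensitive equality would also need a translation between positions. A minimal sketch of such a translation table, using a hypothetical `remapIndices` helper and assuming category names are unique within each list:

```go
package main

import "fmt"

// remapIndices builds a table that translates a value's position in `from`
// into its position in `to`, matching values by name. It returns false if
// the lists differ in length or `from` holds a value missing from `to`.
// Assumes each list contains no duplicate values.
func remapIndices(from, to []string) ([]int, bool) {
	if len(from) != len(to) {
		return nil, false
	}
	// Position of every value in the target ordering.
	pos := make(map[string]int, len(to))
	for i, v := range to {
		pos[v] = i
	}
	table := make([]int, len(from))
	for i, v := range from {
		j, ok := pos[v]
		if !ok {
			return nil, false
		}
		table[i] = j
	}
	return table, true
}

func main() {
	from := []string{"Normal", "Partial", "Abnorml", "Family", "Alloca", "AdjLand"}
	to := []string{"Normal", "Abnorml", "Partial", "AdjLand", "Alloca", "Family"}
	table, ok := remapIndices(from, to)
	fmt.Println(table, ok) // [0 2 1 5 4 3] true
}
```

Here index 1 under the first ordering ("Partial") maps to index 2 under the second.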

So the current implementation does require the values to be in the same order, despite what the comment says.
A small modification solves this problem:

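```go
func (Attr *CategoricalAttribute) Equals(other Attribute) bool {
	attribute, ok := other.(*CategoricalAttribute)
	if !ok {
		// Not the same type, so can't be equal
		return false
	}
	if Attr.GetName() != attribute.GetName() {
		return false
	}

	// Both attributes must have the same number of values
	if len(attribute.values) != len(Attr.values) {
		return false
	}

	// Every value of this attribute must appear somewhere in the other,
	// regardless of position. Together with the length check, this is
	// only sufficient when each attribute's values are unique.
	for _, a := range Attr.GetValues() {
		hasSameVal := false
		for _, o := range attribute.GetValues() {
			if a == o {
				hasSameVal = true
				break
			}
		}

		if !hasSameVal {
			return false
		}
	}

	return true
}
```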
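The nested loop above is O(n²), and combined with the length check it only proves equality when each attribute's values are unique: it would accept `[a a b]` against `[a b b]`. A minimal alternative sketch counts occurrences in a map instead. `sameValues` is a hypothetical helper, not part of golearn; assuming `GetValues()` returns a `[]string`, the length check and nested loop in Equals could be replaced by `return sameValues(Attr.GetValues(), attribute.GetValues())`.

```go
package main

import "fmt"

// sameValues reports whether a and b contain exactly the same strings in
// any order, counting duplicates. It runs in O(n) using a count map.
func sameValues(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[string]int, len(a))
	for _, v := range a {
		counts[v]++
	}
	for _, v := range b {
		// A value of b that is missing from a, or appears in b more
		// often than in a, means the lists differ.
		if counts[v] == 0 {
			return false
		}
		counts[v]--
	}
	return true
}

func main() {
	fmt.Println(sameValues(
		[]string{"Normal", "Partial", "Abnorml"},
		[]string{"Abnorml", "Normal", "Partial"},
	)) // true
	// The nested-loop check would accept this pair; the count map does not.
	fmt.Println(sameValues(
		[]string{"a", "a", "b"},
		[]string{"a", "b", "b"},
	)) // false
}
```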