tobgu/qframe

Improve Eval

Opened this issue · 2 comments

It would be nice to implement some stronger typing around the Eval system in QFrame. There are currently many empty interfaces passed around which can make it challenging for end users to understand how to use the library.

I've included an example of some sample code below to illustrate this.

package main

import (
	"crypto/md5"
	"fmt"

	"github.com/tobgu/qframe"
	"github.com/tobgu/qframe/config/eval"
	"github.com/tobgu/qframe/types"
)

// Create a new column containing an MD5 hash
// of the data in each preceding column.
func main() {
	qf := qframe.New(map[string]interface{}{
		"COL1": []string{"2001", "2002", "2003"},
		"COL2": []int{1, 2, 3},
		"COL3": []float64{3, 4, 5},
	})
	ctx := eval.NewDefaultCtx()
	// Add a new "hash" function to the default EvalContext.
	ctx.SetFunc("hash", func(data *string) *string {
		hash := fmt.Sprintf("%X", md5.Sum([]byte(*data)))
		return &hash
	})
	// Things get a bit awkward here..
	// Range over each column and create
	// an expression that coercises the
	// underlying datatypes to a string.
	var toStrings []interface{}
	for _, name := range qf.ColumnNames() {
		// We've lost all typing here and it is unclear what "str"
		// does. To find out we need to lookup the logic in config/eval/context
		// and follow those definitions into the function package. Maybe we
		// could pass the functions in directly?
		toStrings = append(toStrings, qframe.Expr("str", types.ColumnName(name)))
	}
	// Concatentate each string together and then
	// pass the result to our custom "hash" function above.
	// Again we've lost strong typing and clarity here,
	// since we just defined "hash", that one is obvious.
	// The unary operators are convenient but a type
	// that I can jump to in my editor would be nicer.
	expr := qframe.Expr("hash", qframe.Expr("+", toStrings...))
	// md5sum = hash(concat(string(T)...))
	qf = qf.Eval("md5sum", expr, eval.EvalContext(ctx))
	fmt.Println(qf)
}

/*
COL1(s) COL2(i) COL3(f) md5sum(s)
------- ------- ------- ---------
   2001       1       3 0E5DDD...
   2002       2       4 E185FD...
   2003       3       5 A3CE02...

Dims = 4 x 3
*/

Zero urgency behind this issue, the code above works just fine!

Thanks for all your work on this awesome library.

tobgu commented

I assume that the above example is just there to illustrate your point and that you are aware of the Distinct method?

That said I'm pro adding stricter type definitions where possible. I need to take some time for thinking and experimenting with this particular case though. If you have ideas please let me know!

You are indeed correct about Distinct, I updated the example code above to just include the hash function to illustrate the typing issues. Will try to come up with some improvements as time permits, thanks!