datavaluepeople/kotsu

Remarks + possible improvements/extensions from v1 PRs

Compiled feedback, remarks, and suggested improvements and extensions gathered from the v1 PR reviews.

Misc

Remove the pandas dependency to make kotsu a pure Python package

"worth considering using the stdlib CSV writer here to avoid the v heavyweight pandas dependency"
#13 (comment)
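
For illustration, a minimal sketch of the store step using only the stdlib csv module; the function name and row format here are assumptions, not kotsu's actual implementation:

```python
import csv
from typing import Dict, Iterable


def write_results_csv(results: Iterable[Dict[str, object]], path: str) -> None:
    """Write one results row per model/validation run to a CSV file, stdlib only."""
    rows = list(results)
    if not rows:
        return
    # Union of keys across rows, so optional metrics still get a column.
    fieldnames = sorted({key for row in rows for key in row})
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```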

Interface

Always pass an extendable run_context object to Validations/Models during run

This will replace the specific artefact_directory functionality; instead, the artefact_directory will be made available (if the user sets it) through the context object.
Agreed to implement this once we come across the next piece of context beyond artefact_directory. I would call it run_context to make clear it is specific to running, not to configuring Validations or Models.
See #17 (comment) and #17 (review)
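
A rough sketch of what such a run_context could look like; the names and fields are illustrative assumptions, not a settled design:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RunContext:
    """Bag of run-specific context passed to each validation/model call."""

    # Only populated if the user configures an artefact directory for the run.
    artefact_directory: Optional[str] = None
    # Room for future pieces of run context without changing call signatures.
    extras: Dict[str, Any] = field(default_factory=dict)


def some_validation(model: Any, run_context: RunContext) -> Dict[str, float]:
    """Example validation reading optional context instead of a dedicated kwarg."""
    if run_context.artefact_directory is not None:
        pass  # e.g. write plots or fitted parameters into the artefact directory
    return {"score": 0.0}
```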

Make store functionality be fully implementable by user

Probably by passing a callback function into run, or by returning the results instead of storing them within run
#16 (comment)
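
A hypothetical sketch of that interface, assuming spec objects that expose make() and id; this is not kotsu's current run signature:

```python
from typing import Callable, Dict, List, Optional

Result = Dict[str, object]


def run(
    model_specs: list,
    validation_specs: list,
    results_callback: Optional[Callable[[Result], None]] = None,
) -> List[Result]:
    """Run every validation against every model, leaving storage to the caller."""
    results: List[Result] = []
    for validation_spec in validation_specs:
        validation = validation_spec.make()  # specs assumed to expose .make() and .id
        for model_spec in model_specs:
            result = validation(model_spec.make())
            result.update(validation_id=validation_spec.id, model_id=model_spec.id)
            if results_callback is not None:
                results_callback(result)  # e.g. stream each row to a user-chosen store
            results.append(result)
    return results  # the caller decides whether and where to persist these
```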

Move validation instantiation outside of the loop so each is only instantiated once per run

"Is there an argument for putting the validation_spec.make() outside of the loop over model specs? I would have thought that a single validation instance could be shared by all models. Also building the validation environment could be an expensive step so we'd want to avoid repeating unnecessarily"
#13 (comment)
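
In sketch form (not kotsu's literal code), the suggested restructure hoists validation_spec.make() out of the loop over model specs:

```python
# Placeholder lists of spec objects exposing .make(); purely for illustration.
model_specs: list = []
validation_specs: list = []

# Before: the validation (and any expensive validation environment) is rebuilt for every model.
for validation_spec in validation_specs:
    for model_spec in model_specs:
        validation = validation_spec.make()
        result = validation(model_spec.make())

# After: validation_spec.make() sits outside the loop over model specs,
# so each validation instance is built once and shared by all models.
for validation_spec in validation_specs:
    validation = validation_spec.make()
    for model_spec in model_specs:
        result = validation(model_spec.make())
```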

Stronger typing and validation

#13 (comment)
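
One possible direction, purely as an illustration of what "stronger typing" could mean here, is to describe the validation interface as a typing.Protocol so registered entry points can be statically checked:

```python
from typing import Any, Dict, Protocol

Results = Dict[str, Any]


class Validation(Protocol):
    """A validation scores a model and returns a flat results dict."""

    def __call__(self, model: Any) -> Results:
        ...
```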

Docs

In the example usage, remove validations as factories in favour of just using bare functions

#18 (comment)
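
For contrast, a made-up example of the two styles (the model interface and data are invented):

```python
# Factory style currently shown in the example usage: an outer function builds
# and returns the validation callable.
def make_validation(n_points: int):
    def validation(model) -> dict:
        xs = list(range(n_points))
        predictions = [model.predict(x) for x in xs]
        return {"mean_prediction": sum(predictions) / n_points}

    return validation


# Bare-function style suggested for the docs: the validation is just a function.
def validation(model) -> dict:
    xs = list(range(100))
    predictions = [model.predict(x) for x in xs]
    return {"mean_prediction": sum(predictions) / 100}
```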

Example project structure

I think this is a worthy addition. We could have a few different "example projects" within the docs. Let's do this once we've got some real-world example usage.
#18 (review)

More detailed docs on model/validation registration

#18 (comment)

Another thought - it might be useful to think about how to 'package' a benchmark built on kotsu. I.e. if you wanted to develop some ML/data science benchmark and make the code public, so that others can run/extend it, and want to use kotsu to build it, is there some standardised project structure that would work for this use case?

@alex-hh yea that would be super useful. Will mull it over. (Was thinking about it already, thought about packaging up a benchmark as a pypi package, but without having all the deps pinned exactly, some (more) anxiety creeps in that the benchmarks wouldn't be reproducible. Will mull some more. Super open to ideas on it so do share if stuff appears to ya)