qri-io/website

Add a "why starlark" page

Closed this issue · 0 comments

b5 commented

We were recently asked a lovely, simple question:

Why did you go with starlark? What are the advantages and trade offs from your perspective?

I think we should get an answer like this, or ideally a shorter one up on a page about Starlark:

We chose Starlark because we think it's the right technology to bet on long-term.

We want to live in a world where anyone you're collaborating with can hop into a data catalog, take an existing automated dataset that is close-but-not-quite what they need, modify a few lines of code, test, and re-publish that automated code back to the catalog to keep it self-updating. We want fly-by users to be able to do that entirely in a web browser, and hard-core data maintainers to be able to run that entire process on their own local machine with fine-grained control and interop with the command line.

Python is taking over the data science space, and we think that's a really good thing. Python syntax is best-in-class as a high-level language and a great fit for data work, but Python isn't portable or predictable enough to operate without containerization (Docker). We've yet to see anyone build a product on top of Docker that works for casual contributions without the high service fees of running Docker for folks, so we think we need "portable Python".
With Starlark we can get the portability of JavaScript with the syntax of Python, and we think those characteristics combined will make it easier to distribute the work of building & maintaining a data catalog.

There are tradeoffs. Specifically, we lose the ability to say "it's just Python!", along with the rich package ecosystem Python brings. We plan to compensate for that by making it really easy to borrow from your past work and the work of colleagues. If, for example, you're hitting the same API in numerous different ways, you'll pretty quickly arrive at a "call_soda()" function that's useful across scripts. We anticipate either you or someone around you will need to do the hard work of defining that function the first time, then everyone else can copy-paste, and that copy-paste will just work. You won't need to worry about whether your friend used the same version of Python, or if you've installed the right packages. Again, we're betting portability will make this the right tool for the job.
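To make the copy-paste idea concrete, here is a rough sketch of what such a shared helper might look like. It is written in plain Python rather than Starlark so it can be run anywhere, and nearly everything in it is an assumption for illustration: that "soda" means a Socrata-style open data API with `/resource/<id>.json` endpoints and `$`-prefixed query parameters, the `soda_url` helper, the injectable `fetch` argument, and the placeholder domain and dataset id.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def soda_url(domain, dataset_id, **params):
    """Build a resource URL; keyword names become $-prefixed query parameters.

    Assumption: the API follows the Socrata-style /resource/<id>.json layout.
    """
    url = "https://{}/resource/{}.json".format(domain, dataset_id)
    # Sort parameters so the same call always produces the same URL.
    query = urlencode(
        [("$" + key, value) for key, value in sorted(params.items())],
        safe="$",
    )
    return url + "?" + query if query else url


def call_soda(domain, dataset_id, fetch=None, **params):
    """Fetch rows from the endpoint and return them as a list of dicts.

    `fetch` is a hypothetical hook (URL in, response text out) so the helper
    can be exercised without network access.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url) as response:
                return response.read().decode("utf-8")
    return json.loads(fetch(soda_url(domain, dataset_id, **params)))
```

A colleague could paste this into their own script and call something like `call_soda("data.example.org", "abcd-1234", limit=5)` without caring how the URL gets built. Both arguments there are placeholders, not real datasets.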

With that said, the end goal here is to produce datasets that can end up in a real-deal, Python-backed Jupyter notebook with a single library call. We consider that an integration, and don't think Starlark is the be-all-end-all.