data-apis/dataframe-api

Roadmap for Dataframe API

MarcoGorelli opened this issue

Let's try to put a roadmap together. I'm a little worried that, at the current pace, it'll take another year before we can publish a non-beta spec. That's too long. So let's zoom out, think about what we'd like to achieve, and consider whether this can help us re-prioritise.

Here are some milestones I'd like to aim for:

  • by the end of the year: merge (or otherwise resolve) the work on the following topics:
    • ✅ have a Scalar class
    • ✅ cross-dataframe column comparisons
  • by February 2024:
    • tag the first non-beta version
  • by April 2024, make sure the spec and dataframe-api-compat are complete enough that it's possible to rewrite most of at least one dataframe-consuming library using the standard
  • by November 2024, have production-ready implementations of the standard for all libraries involved

If we want to achieve the above, we need to turn things around. This may mean not getting lost in details; concretely, I suggest punting on:

  • propagation of persistedness (including whether this should be done at all)
  • whether Scalar.__bool__ is allowed to raise

and leaving these implementation-specific for now. If we do manage to turn things around and make good progress on the roadmap above, we could (and probably should!) bring these up again.