Support the brand new Pandas Dataframe alternatives
juarezr opened this issue · 0 comments
Problem description
It would be nice to support the newer DataFrame libraries in addition to Pandas.
Two interesting candidates would be:
Modin Overview
Scale your pandas workflow by changing a single line of code
Modin uses Ray or Dask to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical.
Polars Overview
Lightning-fast DataFrame library for Rust and Python
Polars is a lightning fast DataFrame library/in-memory query engine. Its embarrassingly parallel execution, cache efficient algorithms and expressive API makes it perfect for efficient data wrangling, data pipelines, snappy APIs and so much more.
Problem Description
Currently petl supports Pandas through the functions petl.io.pandas.fromdataframe and petl.io.pandas.todataframe.
Before evolving this kind of feature, it would be important to research:
- How do they fit in petl use cases.
- What are the most ergonomic APIs that we need to consider, either for adding new functions or for adding support to existing ones.
- What additional burden comes with supporting them properly. Ex:
  - CI: acceptance tests
  - CD: impact on the releases
  - documentation: details on API, caveats, proper setup, FAQ, and troubleshooting
- What happens when the upstream projects break compatibility between versions.
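
To make the API question above a bit more concrete, here is a minimal sketch of one possible approach, not a proposal for the final design. It assumes that a petl table can be treated as an iterable of rows whose first row is the header, and that the target library's DataFrame constructor accepts a dict of column lists (as polars.DataFrame and modin.pandas.DataFrame do). The helper names table_to_columns and columns_to_table are hypothetical and do not exist in petl. The sketch uses only the standard library, so it adds no dependency by itself.

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence


def table_to_columns(table: Iterable[Sequence[Any]]) -> Dict[str, List[Any]]:
    """Pivot a row-oriented table (header row first) into a dict of columns.

    Hypothetical helper: short rows are padded with None and extra values in
    long rows are dropped. Duplicate header names are not handled.
    """
    rows = iter(table)
    try:
        header = [str(name) for name in next(rows)]
    except StopIteration:
        return {}  # an empty table has no columns
    columns: Dict[str, List[Any]] = {name: [] for name in header}
    for row in rows:
        # Pad short rows so every column ends up with the same length.
        padded = list(row) + [None] * (len(header) - len(row))
        for name, value in zip(header, padded):
            columns[name].append(value)
    return columns


def columns_to_table(columns: Dict[str, Sequence[Any]]) -> Iterator[tuple]:
    """Inverse pivot: yield the header row, then one tuple per data row."""
    header = tuple(columns)
    yield header
    yield from zip(*(columns[name] for name in header))


if __name__ == "__main__":
    table = [("id", "name"), (1, "a"), (2, "b")]
    cols = table_to_columns(table)
    # cols could now be handed to a constructor such as polars.DataFrame(cols)
    # or modin.pandas.DataFrame(cols), if those libraries are installed.
    print(cols)
    print(list(columns_to_table(cols)))
```

A column-oriented intermediate like this would keep the optional libraries out of petl's core, but it materializes the whole table in memory, which works against petl's lazy evaluation. That trade-off is part of what would need research.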