Beyond Pandas: A dive into Python's high-performance dataframe library Polars - Presentation for PythonPune meetup dated 10 Feb 2024
- Based on the query optimisation principles of RDBMS engines.
- Written in Rust to provide performance, parallelism and efficient memory management out of the box.
- Can operate on larger-than-RAM data through lazy mode.
- Strict typing (data typing) lets errors be caught before any operation runs on the data, just like Rust and other low-level languages.
- Not just a mere dataframe library, more of a "... Query Engine with a dataframe frontend" - Ritchie Vink, EuroPy 2023
- Readability - `.lower()` can mean n things, but `to_lowercase()` means only what it is supposed to mean.
- No more index 🎉. Seriously, after Pandas, the forced inclusion of index-based operations had become the norm, and I hated every minute of it.
- Expr everywhere. For most operations you are no longer writing Python code, but using the Python bindings of Rust expressions.
- Optimisation - one less thing to think about when I write the code.
- Community support - Even though the library is comparatively new, the community is very active on Discord. You can get an answer within a day, for anything, literally anything.
- Steep learning curve - As it is a one-of-a-kind tool/utility/library, you definitely have to learn the Polars way, which may not be simple for everyone.
Data - https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
- Data preparation
  - Modification of data types
  - Usage of `with_columns`
  - Data selection with `select` and `col`
  - Copy with `col` and `alias`
- Data analysis
  - `groupby` and `agg`