Polars is an alternative to Pandas that I had heard about but never actually used. It describes itself as a "blazingly fast DataFrames" library. Can you believe that?
In this article, I test it in my own everyday environment, and it really is fast.
The bar chart shows that Polars takes a quarter of the time Pandas does, or even less, for common operations. The test environment:
- Apple Silicon M1 (2020, the cheapest one)
- macOS 13
- Jupyter Notebook in VS Code
- Python==3.10.9
- pandas==1.5.3
- polars==0.16.6
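Since benchmark numbers depend heavily on the setup listed above, it can help to record it programmatically next to the results. The following is a minimal sketch using only the standard library; the helper name `describe_environment` is hypothetical and does not come from the benchmark repository.

```python
import platform
from importlib import metadata


def describe_environment(packages=("pandas", "polars")):
    """Collect machine, OS, interpreter, and package versions for a report."""
    info = {
        "machine": platform.machine(),        # CPU architecture string
        "system": platform.platform(),        # OS name and version
        "python": platform.python_version(),  # e.g. "3.10.9"
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep the report usable even if a package is missing.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Printing this at the top of a notebook makes it easier to compare results across machines and library versions.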
Operations tested:
- Import a 10MB CSV file with sep & encoding, which is a very common task
  import pandas as pd
  import polars as pl
  df_pd = pd.read_csv('./data/hotel_train.txt', sep=',', encoding='utf-8')
  df_pl = pl.read_csv('./data/hotel_train.txt', sep=',', encoding='utf-8')
- Concatenate repeated DataFrames into one
  pd.concat([df_pd, df_pd, df_pd])
  pl.concat([df_pl, df_pl, df_pl])
- A simple statistical operation: groupby and sum
  df_pd.groupby('hotel').sum()
  df_pl.groupby('hotel').sum()
- The same groupby and sum, repeated in a loop over each column name
  for col in df_pd.columns:
      df_pd.groupby(col).sum()
  for col in df_pl.columns:
      df_pl.groupby(col).sum()
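To compare operations like these, each one can be wrapped in a callable and timed several times. The sketch below is one possible harness, not the code from the benchmark repository: `time_op` and `relative_time` are hypothetical helper names, and the workloads in the demo are stand-ins so that the block runs without pandas or polars installed.

```python
import statistics
import timeit


def time_op(func, repeats=5, number=1):
    """Return the median wall-clock seconds per call of `func`."""
    # Each entry in `runs` is the total time for `number` calls.
    runs = timeit.repeat(func, repeat=repeats, number=number)
    return statistics.median(runs) / number


def relative_time(candidate_seconds, baseline_seconds):
    """Fraction of the baseline's time the candidate needs (0.25 = a quarter)."""
    return candidate_seconds / baseline_seconds


if __name__ == "__main__":
    # Stand-in workloads (assumption): replace with the real operations.
    def slow():
        return sum(i * i for i in range(200_000))

    def fast():
        return sum(i * i for i in range(50_000))

    t_slow, t_fast = time_op(slow), time_op(fast)
    print(f"fast takes {relative_time(t_fast, t_slow):.2f}x the time of slow")
```

With the DataFrames from the list above, one would pass for example `lambda: df_pd.groupby('hotel').sum()` as the baseline and `lambda: df_pl.groupby('hotel').sum()` as the candidate. Taking the median of several runs reduces the influence of one-off delays such as cold file caches.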
Detailed test data and code: https://github.com/reycn/polars-pandas-bench
- Test dataset: https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand
- A third-party large-scale benchmark: https://h2oai.github.io/db-benchmark/
- Polars open source repository: https://github.com/pola-rs/polars