
Polars


Blazingly fast in-memory DataFrames in Rust

Polars is a DataFrame library implemented in Rust, using Apache Arrow as its backend. Its focus is on being a fast in-memory DataFrame library.

Polars is in rapid development, but it already supports most features needed for a useful DataFrame library. Missing something? Please open an issue and/or send a PR.

First run in Rust

Take a look at the 10 minutes to Polars notebook to get started. Want to run the notebook yourself? Clone the repo and run $ cargo c && docker-compose up. This will spin up a Jupyter notebook on http://localhost:8891. The notebooks are in the /examples directory.

Oh yeah.. and get a cup of coffee, because compilation will take a while on the first run.

First run in Python

A subset of the Polars functionality is also exposed through Python bindings. You can install them on Linux with:

$ pip install py-polars

Next you can check the 10 minutes to py-polars notebook or take a look at the reference.

Documentation

Want to know which features Polars supports? Check the current master docs.

Most features are described on the DataFrame, Series, and ChunkedArray structs, in that order. For ChunkedArray, a lot of functionality is also defined by traits in the ops module.

Performance

Polars is written to be performant. Below are some comparisons with the (also very fast) Pandas DataFrame library.

GroupBy

Joins

Functionality

Read and write CSV | JSON | IPC | Parquet

 use polars::prelude::*;
 use std::fs::File;
 
 fn example() -> Result<DataFrame> {
     let file = File::open("iris.csv")
                     .expect("could not open file");
 
     CsvReader::new(file)
             .infer_schema(None)
             .has_header(true)
             .finish()
 }

Joins

 use polars::prelude::*;

 fn join() -> Result<DataFrame> {
     // Create first df.
     let temp = df!("days" => &[0, 1, 2, 3, 4],
                    "temp" => &[22.1, 19.9, 7., 2., 3.])?;

     // Create second df.
     let rain = df!("days" => &[1, 2],
                    "rain" => &[0.1, 0.2])?;

     // Left join on days column.
     temp.left_join(&rain, "days", "days")
 }

 println!("{:?}", join().unwrap());
 +------+------+------+
 | days | temp | rain |
 | ---  | ---  | ---  |
 | i32  | f64  | f64  |
 +======+======+======+
 | 0    | 22.1 | null |
 +------+------+------+
 | 1    | 19.9 | 0.1  |
 +------+------+------+
 | 2    | 7    | 0.2  |
 +------+------+------+
 | 3    | 2    | null |
 +------+------+------+
 | 4    | 3    | null |
 +------+------+------+

Groupbys | aggregations | pivots | melts

 use polars::prelude::*;
 fn groupby_sum(df: &DataFrame) -> Result<DataFrame> {
     df.groupby(&["a", "b"])?
         .select("agg_column_name")
         .sum()
 }

Arithmetic

 use polars::prelude::*;
 let s = Series::new("foo", [1, 2, 3]);
 let s_squared = &s * &s;

Rust iterators

 use polars::prelude::*;
 
 let s: Series = [1, 2, 3].iter().collect();
 let s_squared: Series = s.i32()
      .expect("datatype mismatch")
      .into_iter()
      .map(|optional_v| {
              match optional_v {
                  Some(v) => Some(v * v),
                  None => None, // null value
              }
          }).collect();

Apply custom closures

 use polars::prelude::*;
 
 let s: Series = Series::new("values", [Some(1.0), None, Some(3.0)]);
 // null values are ignored automatically
 let squared = s.f64()
     .unwrap()
     .apply(|value| value.powf(2.0))
     .into_series();
 
 assert_eq!(Vec::from(squared.f64().unwrap()), &[Some(1.0), None, Some(9.0)]);

Comparisons

 use polars::prelude::*;
 let s = Series::new("dollars", &[1, 2, 3]);
 let mask = s.eq(1);
 
 assert_eq!(Vec::from(mask), &[Some(true), Some(false), Some(false)]);

Temporal data types

 use polars::prelude::*;
 
 let dates = &[
     "2020-08-21",
     "2020-08-21",
     "2020-08-22",
     "2020-08-23",
     "2020-08-22",
 ];
 // date format
 let fmt = "%Y-%m-%d";
 // create date series
 let s0 = Date32Chunked::parse_from_str_slice("date", dates, fmt)
         .into_series();

And more...

Features

Additional cargo features:

  • pretty (default)
    • pretty printing of DataFrames
  • temporal (default)
    • Conversions between Chrono and Polars for temporal data
  • simd
    • SIMD operations
  • parquet
    • Read Apache Parquet format
  • random
    • Generate arrays with randomly sampled values
  • ndarray
    • Convert from DataFrame to ndarray