🔥 Blazingly fast DataFrames for Ruby, powered by Polars
Add this line to your application’s Gemfile:
gem "polars-df"
This library follows the Polars Python API.
Polars.read_csv("iris.csv")
  .lazy
  .filter(Polars.col("sepal_length") > 5)
  .groupby("species")
  .agg(Polars.all.sum)
  .collect
You can follow Polars tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
From a CSV
Polars.read_csv("file.csv")
# or lazily with
Polars.scan_csv("file.csv")
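To give a feel for what "lazily" means here, the sketch below uses Ruby's built-in `Enumerator::Lazy` as an analogy. It is plain Ruby, not polars-df, and it only illustrates deferred execution: a lazy pipeline is described first and does no work until a terminal call asks for results. Polars lazy queries are also optimized before they run, which this analogy does not show.

```ruby
# Plain-Ruby analogy (NOT the polars-df API): Enumerator::Lazy defers work
# until a terminal call forces the pipeline, much like .collect on a lazy frame.
evaluated = []

pipeline = (1..10).lazy
  .map { |n| evaluated << n; n * n }  # record which inputs are actually touched
  .select { |sq| sq > 5 }

before = evaluated.dup       # still empty: building the pipeline ran nothing
result = pipeline.first(2)   # terminal call: only now are elements processed

puts "touched before forcing: #{before.inspect}"
puts "result: #{result.inspect}"
puts "touched after forcing: #{evaluated.inspect}"
```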
From Parquet
Polars.read_parquet("file.parquet")
From Active Record
Polars.read_sql(User.all)
# or
Polars.read_sql("SELECT * FROM users")
From a hash
Polars::DataFrame.new({
  a: [1, 2, 3],
  b: ["one", "two", "three"]
})
From an array of hashes
Polars::DataFrame.new([
  {a: 1, b: "one"},
  {a: 2, b: "two"},
  {a: 3, b: "three"}
])
From an array of series
Polars::DataFrame.new([
  Polars::Series.new("a", [1, 2, 3]),
  Polars::Series.new("b", ["one", "two", "three"])
])
Get number of rows
df.height
Get column names
df.columns
Check if a column exists
df.include?(name)
Select a column
df["a"]
Select multiple columns
df[["a", "b"]]
Select first rows
df.head
Select last rows
df.tail
Filter on a condition
df[Polars.col("a") == 2]
df[Polars.col("a") != 2]
df[Polars.col("a") > 2]
df[Polars.col("a") >= 2]
df[Polars.col("a") < 2]
df[Polars.col("a") <= 2]
And, or, and exclusive or
df[(Polars.col("a") > 1) & (Polars.col("b") == "two")] # and
df[(Polars.col("a") > 1) | (Polars.col("b") == "two")] # or
df[(Polars.col("a") > 1) ^ (Polars.col("b") == "two")] # xor
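As a rough illustration of which rows each combined condition keeps, here is a plain-Ruby sketch over an array of hashes. It does not use polars-df. It applies the same three conditions to the sample data from the "From an array of hashes" example, and the `left` and `right` lambdas are names made up for this sketch.

```ruby
# Plain-Ruby sketch of the and / or / xor row filters (NOT the polars-df API).
rows = [
  {a: 1, b: "one"},
  {a: 2, b: "two"},
  {a: 3, b: "three"}
]

left  = ->(r) { r[:a] > 1 }       # stands in for Polars.col("a") > 1
right = ->(r) { r[:b] == "two" }  # stands in for Polars.col("b") == "two"

and_rows = rows.select { |r| left.(r) & right.(r) }  # both conditions hold
or_rows  = rows.select { |r| left.(r) | right.(r) }  # at least one holds
xor_rows = rows.select { |r| left.(r) ^ right.(r) }  # exactly one holds

puts "and: #{and_rows.map { |r| r[:a] }.inspect}"
puts "or:  #{or_rows.map { |r| r[:a] }.inspect}"
puts "xor: #{xor_rows.map { |r| r[:a] }.inspect}"
```

Note that the row with `a: 2` satisfies both conditions, so it appears under "and" and "or" but is dropped by "xor".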
Basic operations
df["a"] + 5
df["a"] - 5
df["a"] * 5
df["a"] / 5
df["a"] % 5
df["a"] ** 2
df["a"].sqrt
df["a"].abs
Rounding
df["a"].round(2)
df["a"].ceil
df["a"].floor
Logarithm
df["a"].log # natural log
df["a"].log(10)
Exponentiation
df["a"].exp
Trigonometric functions
df["a"].sin
df["a"].cos
df["a"].tan
df["a"].asin
df["a"].acos
df["a"].atan
Hyperbolic functions
df["a"].sinh
df["a"].cosh
df["a"].tanh
df["a"].asinh
df["a"].acosh
df["a"].atanh
Summary statistics
df["a"].sum
df["a"].mean
df["a"].median
df["a"].quantile(0.90)
df["a"].min
df["a"].max
df["a"].std
df["a"].var
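For reference, the plain-Ruby sketch below shows what a few of these statistics compute. It does not call polars-df. One assumption to flag: the variance and standard deviation here use the sample convention (dividing by n - 1), which is the default in upstream Polars. If your results differ, check the library's `ddof` setting.

```ruby
# Plain-Ruby sketch of sum, mean, median, variance and standard deviation.
# ASSUMPTION: sample statistics (divide by n - 1), mirroring the upstream
# Polars default; this is not the gem's implementation.
values = [1, 2, 3, 4]
n = values.size

sum  = values.sum
mean = sum.to_f / n

sorted = values.sort
median =
  if n.odd?
    sorted[n / 2].to_f
  else
    (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0  # average the two middle values
  end

var = values.sum { |v| (v - mean)**2 } / (n - 1)  # sample variance
std = Math.sqrt(var)                              # sample standard deviation

puts "sum=#{sum} mean=#{mean} median=#{median} var=#{var} std=#{std}"
```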
Group
df.groupby("a").count
Works with all summary statistics
df.groupby("a").max
Multiple groups
df.groupby(["a", "b"]).count
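To make the grouping semantics concrete, here is a plain-Ruby sketch that computes the same kinds of results with `Enumerable#group_by`. It is not polars-df code, the sample rows are made up for illustration, and the output is a hash rather than a data frame.

```ruby
# Plain-Ruby sketch of group + aggregate (NOT the polars-df API).
rows = [
  {a: 1, b: "x"},
  {a: 1, b: "y"},
  {a: 2, b: "x"},
  {a: 1, b: "x"}
]

# One group per distinct value of :a, then count the rows in each group.
count_by_a = rows.group_by { |r| r[:a] }.transform_values(&:size)

# Same groups, but aggregate with max over column :b instead of counting.
max_b_by_a = rows.group_by { |r| r[:a] }
                 .transform_values { |group| group.map { |r| r[:b] }.max }

# Multiple groups: the key is the combination of :a and :b.
count_by_a_b = rows.group_by { |r| [r[:a], r[:b]] }.transform_values(&:size)

puts count_by_a.inspect
puts max_b_by_a.inspect
puts count_by_a_b.inspect
```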
Add rows
df.vstack(other_df)
Add columns
df.hstack(other_df)
Inner join
df.join(other_df, on: "a")
Left join
df.join(other_df, on: "a", how: "left")
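The difference between the two join types can be sketched in plain Ruby over arrays of hashes. This is an illustration of the semantics only, not polars-df code, and the sample data and the `c` column are invented for the example. An inner join keeps only rows whose key appears on both sides. A left join keeps every left row and fills missing right-hand values with `nil`.

```ruby
# Plain-Ruby sketch of inner vs. left join on key :a (NOT the polars-df API).
left  = [{a: 1, b: "one"}, {a: 2, b: "two"}, {a: 3, b: "three"}]
right = [{a: 1, c: 10}, {a: 3, c: 30}]

# Index the right side by key so each left row can look up its matches.
right_by_key = right.group_by { |r| r[:a] }

# Inner join: left rows without a match are dropped.
inner = left.flat_map do |l|
  (right_by_key[l[:a]] || []).map { |r| l.merge(r) }
end

# Left join: left rows without a match are kept, with nil for right columns.
left_join = left.flat_map do |l|
  matches = right_by_key[l[:a]]
  matches ? matches.map { |r| l.merge(r) } : [l.merge(c: nil)]
end

puts inner.inspect
puts left_join.inspect
```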
One-hot encoding
df.to_dummies
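For readers new to one-hot encoding, the plain-Ruby sketch below shows the idea for a single column: each distinct value becomes its own 0/1 indicator column. It does not call polars-df. The `column_value` naming with an underscore separator is an assumption based on upstream Polars, and the order of the output columns here may not match the library's.

```ruby
# Plain-Ruby sketch of one-hot encoding a column named "a" (NOT the gem's code).
# ASSUMPTION: indicator columns are named "<column>_<value>".
column = "a"
values = ["x", "y", "x"]

dummies = values.uniq.sort.to_h do |category|
  # 1 where the row holds this category, 0 elsewhere
  ["#{column}_#{category}", values.map { |v| v == category ? 1 : 0 }]
end

puts dummies.inspect
```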
Array of rows
df.rows
Hash of series
df.to_h
CSV
df.to_csv
# or
df.write_csv("file.csv")
Parquet
df.write_parquet("file.parquet")
You can specify column types when creating a data frame
Polars::DataFrame.new(data, columns: {"a" => Polars::Int32, "b" => Polars::Float32})
Supported types are:
- boolean - Boolean
- float - Float64, Float32
- integer - Int64, Int32, Int16, Int8
- unsigned integer - UInt64, UInt32, UInt16, UInt8
- string - Utf8, Categorical
- temporal - Date, Datetime, Time, Duration
Get column types
df.schema
For a specific column
df["a"].dtype
Cast a column
df["a"].cast(Polars::Int32)
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/polars-ruby.git
cd polars-ruby
bundle install
bundle exec rake compile
bundle exec rake test