Tidier.jl is a 100% Julia implementation of the R tidyverse mini-language. Powered by the DataFrames.jl package and Julia’s extensive meta-programming capabilities, Tidier.jl is an R user’s love letter to data analysis in Julia.
Tidier.jl has three goals, which differentiate it from other data analysis meta-packages in Julia:
- Stick as closely to tidyverse syntax as possible: Whereas other meta-packages introduce Julia-centric idioms for working with DataFrames, this package’s goal is to reimplement parts of tidyverse in Julia. This means that Tidier.jl uses tidy expressions as opposed to idiomatic Julia expressions. An example of a tidy expression is a = mean(b).
- Make broadcasting mostly invisible: Broadcasting trips up many R users switching to Julia because R users are used to most functions being vectorized. Tidier.jl currently uses a lookup table to decide which functions not to vectorize; all other functions are automatically vectorized. See the documentation page on "Autovectorization" to learn how this works and how to override the defaults.
- Make scalars and tuples mostly interchangeable: In Julia, the function across(a, mean) is dispatched differently than across((a, b), mean). The first argument in the first instance is treated as a scalar, whereas in the second instance it is treated as a tuple. This can be very confusing to R users because 1 == c(1) is TRUE in R, whereas in Julia 1 == (1,) evaluates to false. The design philosophy in Tidier.jl is that the user should feel free to provide a scalar or a tuple as they see fit anytime multiple values are considered valid for a given argument, such as in across(), and Tidier.jl will figure out how to dispatch it.
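The two Julia behaviors that the second and third goals work around can be reproduced in plain Julia, without Tidier.jl. The following is a minimal sketch for illustration only; the variable names are made up, and it shows base Julia semantics rather than anything about how Tidier.jl is implemented.

```julia
# Plain Julia, no packages required.

# Broadcasting: a scalar function is applied elementwise only with the dot syntax.
xs = [1.0, 4.0, 9.0]
roots = sqrt.(xs)   # explicit broadcast over the vector

# Calling sqrt(xs) without the dot throws a MethodError,
# because sqrt has no method for a Vector.
undotted_fails = try
    sqrt(xs)
    false
catch err
    err isa MethodError
end

# Scalars vs. tuples: a scalar and a one-element tuple are different values.
scalar_equals_tuple = (1 == (1,))
```

Here roots holds the elementwise square roots, undotted_fails is true, and scalar_equals_tuple is false, which is the contrast with R's 1 == c(1) described above.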
For the stable version:
] add Tidier
The ] character starts the Julia package manager. Press the backspace key to return to the Julia prompt.
or
using Pkg
Pkg.add("Tidier")
For the newest version:
] add Tidier#main
or
using Pkg
Pkg.add(url="https://github.com/TidierOrg/Tidier.jl")
To support R-style programming, Tidier.jl is implemented using macros.
Tidier.jl currently supports the following top-level macros:
- @glimpse()
- @select(), @rename(), and @distinct()
- @mutate() and @transmute()
- @summarize() and @summarise()
- @filter() and @slice()
- @group_by() and @ungroup()
- @arrange()
- @pull()
- @count() and @tally()
- @left_join(), @right_join(), @inner_join(), and @full_join()
- @bind_rows() and @bind_cols()
- @pivot_wider() and @pivot_longer()
- @drop_na()
- @clean_names() (as in R's janitor::clean_names() function)
Tidier.jl also supports the following helper functions:
- across()
- desc()
- if_else() and case_when()
- n() and row_number()
- ntile()
- lag() and lead()
- starts_with(), ends_with(), matches(), and contains()
- as_float(), as_integer(), and as_string()
See the documentation Home page for a guide on how to get started, or the Reference page for a detailed guide to each of the macros and functions.
Let's select the first five movies in our dataset whose budget exceeds the mean budget. Unlike in R, where we pass an na.rm = TRUE argument to remove missing values, in Julia we wrap the variable in skipmissing() to remove the missing values before mean() is calculated.
using Tidier
using RDatasets
movies = dataset("ggplot2", "movies");
@chain movies begin
    @mutate(Budget = Budget / 1_000_000)
    @filter(Budget >= mean(skipmissing(Budget)))
    @select(Title, Budget)
    @slice(1:5)
end
5×2 DataFrame
 Row │ Title                       Budget
     │ String                      Float64?
─────┼──────────────────────────────────────
   1 │ 'Til There Was You              23.0
   2 │ 10 Things I Hate About You      16.0
   3 │ 102 Dalmatians                  85.0
   4 │ 13 Going On 30                  37.0
   5 │ 13th Warrior, The               85.0
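The role of skipmissing() in the example above can be seen in isolation with plain Julia. This is a small sketch using made-up values (not data from the movies dataset), and it relies only on the Statistics standard library.

```julia
using Statistics  # standard library; provides mean()

# Hypothetical budget values, one of which is missing.
budget = [20.0, missing, 40.0, 60.0]

# Without skipmissing(), the missing value propagates to the result.
raw_mean = mean(budget)

# With skipmissing(), only the three non-missing values are averaged.
clean_mean = mean(skipmissing(budget))
```

Here raw_mean is missing, while clean_mean is 40.0. This is why the @filter() step wraps Budget in skipmissing() before taking the mean.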
See NEWS.md for the latest updates.
Is there a tidyverse feature missing that you would like to see in Tidier.jl? Please file a GitHub issue. Because Tidier.jl primarily wraps DataFrames.jl, our decision to integrate a new feature will be guided by how well-supported it is within DataFrames.jl and how likely other users are to benefit from it.