This package has merged with the abscorr package by the R Users’ Network for Australian Public Policy. The merged package is called strayr. Find the merged package at runapp’s GitHub.
strayr is a simple tool to wrangle messy Australian state names and/or abbreviations into a consistent format.
Install from GitHub with:
# if you don't have devtools installed, first run:
# install.packages("devtools")
devtools::install_github("mattcowgill/strayr")
Let’s start with a character vector that includes some misspelled state names, some correctly spelled state names, and some abbreviations, both malformed and correctly formed.
x <- c("western Straya", "w. A ", "new soth wailes", "SA", "tazz", "Victoria",
       "northn territy")
To convert this character vector to a vector of state abbreviations, use the strayr() function:
library(strayr)
strayr(x)
#> [1] "WA" "WA" "NSW" "SA" "Tas" "Vic" "NT"
If you want full names for the states rather than abbreviations:
strayr(x, to = "state_name")
#> [1] "Western Australia" "Western Australia" "New South Wales"
#> [4] "South Australia" "Tasmania" "Victoria"
#> [7] "Northern Territory"
By default, strayr() uses fuzzy (approximate) string matching to match the elements in your character vector to state names/abbreviations. If you only want to permit exact matching, you can disable fuzzy matching. This means you will never get false matches, but you will also fail to match misspelled state names or malformed abbreviations; you’ll get an NA if no match can be found.
strayr(x, fuzzy_match = FALSE)
#> [1] NA NA NA "SA" NA "Vic" NA
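To give a feel for what approximate matching involves, here is a minimal base R sketch built on utils::adist() (edit distance). It is not how strayr() is implemented; the helper nearest_state(), the states lookup vector and the max_dist cut-off are all hypothetical names chosen for illustration.

```r
# Illustrative only: a hypothetical nearest-match helper, NOT strayr's own code.
# It picks the state name with the smallest edit distance to each input.
states <- c("New South Wales", "Victoria", "Queensland", "South Australia",
            "Western Australia", "Tasmania", "Northern Territory",
            "Australian Capital Territory")

nearest_state <- function(x, max_dist = 5) {
  # Normalise case and surrounding whitespace before comparing
  cleaned <- trimws(tolower(x))
  # Rows correspond to inputs, columns to candidate state names
  d <- utils::adist(cleaned, tolower(states))
  vapply(seq_along(x), function(i) {
    row <- d[i, ]
    # Return NA when the input is missing or nothing is close enough
    if (all(is.na(row)) || min(row, na.rm = TRUE) > max_dist) {
      return(NA_character_)
    }
    states[which.min(row)]
  }, character(1))
}

nearest_state(c("new soth wailes", "Victoria", "xyz"))
```

A tighter max_dist behaves more like exact matching (fewer false matches, more NA values), which is the same trade-off that fuzzy_match = FALSE makes in strayr().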
If your data is in a data frame, strayr() works well within a dplyr::mutate() call:
x_df <- data.frame(state = x, stringsAsFactors = FALSE)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
x_df %>%
  mutate(state_abbr = strayr(state))
#> state state_abbr
#> 1 western Straya WA
#> 2 w. A WA
#> 3 new soth wailes NSW
#> 4 SA SA
#> 5 tazz Tas
#> 6 Victoria Vic
#> 7 northn territy NT
This package includes the auholidays dataset from the Australian Public Holidays Dates Machine Readable Dataset, as well as a helper function, is_holiday:
str(auholidays)
#> tibble [779 × 3] (S3: tbl_df/tbl/data.frame)
#> $ Date : Date[1:779], format: "2021-01-01" "2021-01-26" ...
#> $ Name : chr [1:779] "New Year's Day" "Australia Day" "Canberra Day" "Good Friday" ...
#> $ Jurisdiction: chr [1:779] "ACT" "ACT" "ACT" "ACT" ...
is_holiday('2020-01-01')
#> [1] TRUE
is_holiday('2019-05-27', jurisdictions=c('ACT', 'TAS'))
#> [1] TRUE
h_df <- data.frame(dates = c('2020-01-01', '2020-01-10'))
h_df %>%
  mutate(IsHoliday = is_holiday(dates))
#> dates IsHoliday
#> 1 2020-01-01 TRUE
#> 2 2020-01-10 FALSE
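The idea behind a lookup like this can be sketched in base R without the package. The example below uses a two-row toy table with the same three columns as auholidays and a hypothetical function, toy_is_holiday(); both are assumptions for illustration and do not reflect how is_holiday is actually written.

```r
# Toy table mirroring the columns of auholidays (only two illustrative rows)
toy_holidays <- data.frame(
  Date = as.Date(c("2021-01-01", "2021-01-26")),
  Name = c("New Year's Day", "Australia Day"),
  Jurisdiction = c("ACT", "ACT"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for is_holiday(): TRUE where a date appears in the
# table, optionally restricted to the given jurisdictions
toy_is_holiday <- function(dates, jurisdictions = NULL, table = toy_holidays) {
  if (!is.null(jurisdictions)) {
    table <- table[table$Jurisdiction %in% jurisdictions, ]
  }
  as.Date(dates) %in% table$Date
}

toy_is_holiday(c("2021-01-01", "2021-01-10"))
toy_is_holiday("2021-01-26", jurisdictions = "TAS")
```

Because the toy table only contains ACT rows, the second call returns FALSE; with the full auholidays data the answer depends on what is listed for each jurisdiction.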