
R package with historical Australian data and useful wrangling functions

strayr

Lifecycle: superseded

This package has merged with the abscorr package by the R Users’ Network for Australian Public Policy. The merged package is also called strayr; you can find it at runapp’s GitHub.

Overview

strayr is a simple tool to wrangle messy Australian state names and/or abbreviations into a consistent format.

Installation

Install from GitHub with:

# if you don't have devtools installed, first run:
# install.packages("devtools")
devtools::install_github("mattcowgill/strayr")

Examples

Let’s start with a character vector that includes some misspelled state names, some correctly spelled state names, and some abbreviations, both malformed and correctly formed.

x <- c("western Straya", "w. A ", "new soth wailes", "SA", "tazz", "Victoria",
       "northn territy")

To convert this character vector to a vector of state abbreviations, use the strayr() function:

library(strayr)
strayr(x)
#> [1] "WA"  "WA"  "NSW" "SA"  "Tas" "Vic" "NT"

If you want full names for the states rather than abbreviations:

strayr(x, to = "state_name")
#> [1] "Western Australia"  "Western Australia"  "New South Wales"   
#> [4] "South Australia"    "Tasmania"           "Victoria"          
#> [7] "Northern Territory"

By default, strayr() uses fuzzy (approximate) string matching to match the elements of your character vector to state names and abbreviations. If you only want to permit exact matches, you can disable fuzzy matching with fuzzy_match = FALSE. Exact matching rules out false matches, but it also fails to match misspelled state names and malformed abbreviations; any element with no match is returned as NA.

strayr(x, fuzzy_match = FALSE)
#> [1] NA    NA    NA    "SA"  NA    "Vic" NA
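To give a feel for what approximate matching involves, here is a minimal base R sketch built on edit distance. It is only an illustration and is not taken from strayr's source: the closest_state() function, the states vector and the max_rel_dist threshold are all hypothetical names and choices made for this example.

```r
# Illustrative only: a hypothetical helper, NOT strayr's internal code.
# It picks the state name with the smallest relative edit distance.
states <- c("New South Wales", "Victoria", "Queensland", "South Australia",
            "Western Australia", "Tasmania", "Northern Territory",
            "Australian Capital Territory")

closest_state <- function(x, max_rel_dist = 0.4) {
  # Normalise case and surrounding whitespace before comparing
  cleaned <- tolower(trimws(x))
  targets <- tolower(states)

  # Levenshtein distance between every input and every state name
  d <- adist(cleaned, targets)

  # Scale by the longer of the two strings so long names are not penalised
  longest <- outer(nchar(cleaned), nchar(targets), pmax)
  rel <- d / longest

  # Best candidate per input (assumes no NA inputs)
  best <- apply(rel, 1, which.min)
  best_dist <- rel[cbind(seq_along(best), best)]

  # Reject candidates that are too far away (threshold is an assumption)
  ifelse(best_dist <= max_rel_dist, states[best], NA_character_)
}

closest_state(c("new soth wailes", "Victoria", "northn territy", "zzz"))
```

Raising max_rel_dist makes the sketch more forgiving of typos at the cost of more false matches, which is the same trade-off that the fuzzy_match argument lets you opt out of.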

If your data is in a data frame, strayr() works well within a dplyr::mutate() call:

x_df <- data.frame(state = x, stringsAsFactors = FALSE)

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
x_df %>%
  mutate(state_abbr = strayr(state))
#>             state state_abbr
#> 1  western Straya         WA
#> 2           w. A          WA
#> 3 new soth wailes        NSW
#> 4              SA         SA
#> 5            tazz        Tas
#> 6        Victoria        Vic
#> 7  northn territy         NT

Australian Public Holidays

This package includes the auholidays dataset, taken from the Australian Public Holidays Dates Machine Readable Dataset, as well as a helper function, is_holiday():

str(auholidays)
#> tibble [779 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ Date        : Date[1:779], format: "2021-01-01" "2021-01-26" ...
#>  $ Name        : chr [1:779] "New Year's Day" "Australia Day" "Canberra Day" "Good Friday" ...
#>  $ Jurisdiction: chr [1:779] "ACT" "ACT" "ACT" "ACT" ...


is_holiday("2020-01-01")
#> [1] TRUE
is_holiday("2019-05-27", jurisdictions = c("ACT", "TAS"))
#> [1] TRUE

h_df <- data.frame(dates = c("2020-01-01", "2020-01-10"))

h_df %>%
  mutate(IsHoliday = is_holiday(dates))
#>        dates IsHoliday
#> 1 2020-01-01      TRUE
#> 2 2020-01-10     FALSE
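
For readers curious how a lookup like this might work, the following base R sketch checks dates against a small table with the same three columns as auholidays. It is an assumption about the general approach, not the package's implementation: toy_holidays is a tiny hand-built table and is_holiday_sketch() is a hypothetical name used only here.

```r
# Illustrative only: a hand-built table and a hypothetical function,
# NOT the auholidays dataset or strayr's is_holiday().
toy_holidays <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-01-01", "2019-05-27")),
  Name = c("New Year's Day", "New Year's Day", "Reconciliation Day"),
  Jurisdiction = c("ACT", "NSW", "ACT"),
  stringsAsFactors = FALSE
)

is_holiday_sketch <- function(dates, jurisdictions = NULL,
                              holidays = toy_holidays) {
  # Optionally restrict the table to the requested jurisdictions
  if (!is.null(jurisdictions)) {
    holidays <- holidays[holidays$Jurisdiction %in% jurisdictions, ]
  }
  # A date counts as a holiday if it appears in the remaining rows
  as.Date(dates) %in% holidays$Date
}

is_holiday_sketch(c("2020-01-01", "2020-01-10"))
is_holiday_sketch("2019-05-27", jurisdictions = c("ACT", "TAS"))
is_holiday_sketch("2019-05-27", jurisdictions = "NSW")
```

Because the function is vectorised over dates, it can be used inside dplyr::mutate() in the same way as the is_holiday() call above.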