/tidyusoc

Tools for compiling multiple waves of Understanding Society into one tidy data frame.

Primary LanguageROtherNOASSERTION

tidyusoc

R-CMD-check

Installation

You can install the development version of tidyusoc from GitHub with:

# install.packages("devtools")
devtools::install_github("wklimowicz/tidyusoc")

Setup

After downloading the Understanding Society Data from the UK Data Service website, run:

library(tidyusoc)

# Convert to Rds to save space and time
usoc_convert(usoc_directory = "spss/spss25",
             new_directory = "rds",
             filter_files = "indresp")

# Compile the INDRESP files in the rds folder
usoc <- usoc_compile(directory = "rds")

A small set of core variables is included by default:

Variable Description
pidp Personal ID
age Age
race Race/Ethnicity
sex Sex
country Country
gor_dv Government Office Region
qfhigh_dv Highest Qualification (Detailed, UKHLS Only)
hiqual_dv Highest Qualification
jbstat Job Status
jbsoc00_cc Occupation
fimnlabgrs_dv Labour Income

To add more variables, use the extra_mappings function. For variables that change over time use pick_var. For example, life satisfaction is:

  • lfsato in waves 6-10 and 12-18 of the BHPS.
  • sclfsato in waves 1-11 of the UKHLS.

The variable BMI (bmi_dv) doesn’t change over time, so it’s passed as a string.

custom_mappings <- function(cols) {

    life_sat <- pick_var(c("sclfsato", "lfsato"), cols)

    custom_variables <- tibble::tribble(
        ~usoc_name, ~new_name, ~type,
        life_sat, "life_satisfaction", "factor",
        "bmi_dv", "bmi_dv", "numeric"
    )

    return(custom_variables)

}

usoc <- usoc_compile("rds", extra_mappings = custom_mappings)

A good place to start looking for variables is the Understanding Society Variable Search, and the Documentation.

Compiling different surveys.

By default, these scripts compile the indresp file - you can change this with the file argument. For example, to compile the youth response files:

usoc_convert(usoc_directory = "spss/spss25",
             new_directory = "rds",
             filter_files = "youth")

usoc <- usoc_compile(directory = "rds", file = "youth")

Using the data across multiple projects.

Adding an environment variable called DATA_DIRECTORY pointing at a folder, and using the save_to_folder argument saves an fst of the final dataset, so it can be loaded from anywhere.

library(tidyusoc)

usoc_compile(directory = "rds", save_to_folder = TRUE)

usoc <- usoc_load()

To read any of the other compiled files change the file argument:

usoc <- usoc_load(file = "youth")