vanderbilt-data-science/lapop-dashboard

Method to assess read-result data-frame and decide which files to include


Determine which data files are needed to get the data we need,
in order to filter the data files and retain only the important ones.

Create a function to read and triage data files and return the df only if the conditions are met.
This function is intended to be mapped over all the files in the directory.

  1. Input: file path
  2. Open file into df
  3. Check if it's one country--one round
    • length(table(pais))==1 [not more than one country]
    • max(year)-min(year)<=1 [not spanning more than one year]
  4. If test passes, return:
    • dataframe
    • filename
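
The triage steps above could be sketched roughly as follows. This is only an illustration in Python using the standard library: the names `is_one_country_one_round` and `triage` are hypothetical, and a list of row dicts stands in for the dataframe (in practice the rows would come from reading the .dta file, for example with `pandas.read_stata`). The column names `pais` and `year` are taken from the checks listed above; missing values are not handled here.

```python
from typing import Iterable, List, Optional, Tuple


def is_one_country_one_round(pais: Iterable, years: Iterable) -> bool:
    """Return True if the data covers exactly one country and spans at most one year."""
    countries = set(pais)          # same idea as length(table(pais)) == 1
    year_values = list(years)
    if len(countries) != 1 or not year_values:
        return False
    # same idea as max(year) - min(year) <= 1
    return max(year_values) - min(year_values) <= 1


def triage(filename: str, rows: List[dict]) -> Optional[Tuple[List[dict], str]]:
    """Return (rows, filename) if the file passes the test, otherwise None.

    `rows` is a stand-in for the dataframe: one dict per record,
    each with a 'pais' and a 'year' key (assumed column names).
    """
    pais = [row["pais"] for row in rows]
    years = [row["year"] for row in rows]
    if is_one_country_one_round(pais, years):
        return rows, filename
    return None
```

Mapping `triage` over every file in the directory and dropping the `None` results would then leave only the one country, one round files.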

Use files with updated filenames
Read in only "cy" type .dta files
Get country from 3-letter country code
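
One possible way to pick out the "cy" .dta files and read the country code from the filename is sketched below. It assumes the naming convention described in this issue (country code, year, type indicator, each followed by an underscore); the function name `select_cy_files` and the example filenames are made up for illustration.

```python
import re
from typing import Dict, Iterable

# country code, underscore, year or year range, underscore, "cy", underscore, anything, ".dta"
CY_DTA = re.compile(r"^([a-z]{3})_\d{4}(?:-\d{4})?_cy_.*\.dta$")


def select_cy_files(filenames: Iterable[str]) -> Dict[str, str]:
    """Map each 'cy' .dta filename to its 3-letter country code."""
    selected = {}
    for name in filenames:
        match = CY_DTA.match(name)
        if match:
            selected[name] = match.group(1)
    return selected


# Hypothetical filenames, only to show the filtering
files = [
    "mex_2018_cy_data.dta",
    "all_2006-2018_gm_data.dta",
    "mex_2018_cb_codebook.pdf",
]
cy_files = select_cy_files(files)
```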

NAMING CONVENTION:

  1. All lowercase 3-letter country code, followed by underscore “_”
    • Use https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3
    • For regional merges (more than one country), use “all” instead
  2. 4-digit year(s), separated by dash "-", followed by underscore “_”
    • Use year in filename now
    • For multi-year merges, use initial and final year separated with a dash, e.g. “2006-2018”
  3. Type-indicator (2-letter, all lowercase), followed by underscore "_"
    • “ts” for time series
    • “rm” for regional merge
    • “gm” for grand merge (multiple countries, multiple years)
    • “cy” for one country, one year
    • “ti” for technical information
    • “qd” for questionnaire document
    • “cb” for code book
    • “cl” for change log
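
As a rough check of the convention above, a filename could be split into its parts like this. The sketch is an assumption about how the rules might be enforced, not the project's actual code: `parse_filename`, `ParsedName`, and the example filenames are hypothetical, and whatever follows the type indicator is kept as-is because the convention above does not specify it.

```python
import re
from typing import NamedTuple, Optional

# Type indicators listed in the naming convention
TYPE_CODES = {
    "ts": "time series",
    "rm": "regional merge",
    "gm": "grand merge",
    "cy": "one country, one year",
    "ti": "technical information",
    "qd": "questionnaire document",
    "cb": "code book",
    "cl": "change log",
}

NAME_RE = re.compile(
    r"^(?P<country>[a-z]{3})_"            # 3-letter lowercase code, or "all"
    r"(?P<start>\d{4})(?:-(?P<end>\d{4}))?_"  # year, or start-end for multi-year merges
    r"(?P<type>[a-z]{2})_"                # 2-letter type indicator
    r"(?P<rest>.*)$"                      # remainder, not covered by the convention above
)


class ParsedName(NamedTuple):
    country: str
    start_year: int
    end_year: int
    type_code: str
    rest: str


def parse_filename(name: str) -> Optional[ParsedName]:
    """Return the parts of a filename, or None if it does not follow the convention."""
    match = NAME_RE.match(name)
    if match is None or match.group("type") not in TYPE_CODES:
        return None
    start = int(match.group("start"))
    end = int(match.group("end") or start)  # single-year files: end == start
    return ParsedName(match.group("country"), start, end,
                      match.group("type"), match.group("rest"))
```

For example, `parse_filename("all_2006-2018_gm_merge.dta")` would give country `"all"`, years 2006 to 2018, and type `"gm"`, while a name with uppercase letters or an unknown type indicator would return `None`.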