Method to assess read-result data-frame and decide which files to include

Question

Opened this issue 5 years ago · 2 comments

Determine which data files are needed to get the data we need,
in order to filter the data files and retain only the important ones.

Answer 1 · 2020-04-14T17:58:29.000Z

Create a function that reads and triages data files and returns the df only if the conditions are met.
It is intended to be mapped over all the files in the directory.

Input: file path
Open file into df
Check if it's one country--one round
- length(table(pais))==1 [not more than one country]
- max(year)-min(year)<=1 [not spanning more than one year]
If test passes, return:
- dataframe
- filename
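
The steps above could be sketched roughly as follows. The two checks in the issue are written as R expressions; this is a Python equivalent using only the standard library, with the "df" represented as a list of row dictionaries. The column names `pais` and `year` come from the checks above; `read_rows` is a hypothetical loader supplied by the caller (for example, a thin wrapper around a Stata reader), and the function names are illustrative only.

```python
from typing import Callable, List, Mapping, Optional, Sequence, Tuple

Row = Mapping[str, object]


def is_one_country_one_round(rows: Sequence[Row]) -> bool:
    """Return True if the rows cover exactly one country and at most a 1-year span."""
    if not rows:
        return False  # assumption: an empty file never passes triage
    countries = {row["pais"] for row in rows}   # mirrors length(table(pais)) == 1
    years = [row["year"] for row in rows]       # mirrors max(year) - min(year) <= 1
    return len(countries) == 1 and max(years) - min(years) <= 1


def triage_file(
    path: str,
    read_rows: Callable[[str], Sequence[Row]],  # hypothetical loader, not from the issue
) -> Optional[Tuple[List[Row], str]]:
    """Read one file and return (rows, filename) only if the test passes."""
    rows = list(read_rows(path))
    if is_one_country_one_round(rows):
        return rows, path
    return None
```

Mapping `triage_file` over every path in the directory and dropping the `None` results would leave only the one-country, one-round files.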

Answer 2 · 2020-04-29T14:59:47.000Z

Use files with updated filenames
Read in only "cy" type .dta files
Get country from 3-letter country code

NAMING CONVENTION:

All lowercase 3-letter country code, followed by underscore “_”
• Use https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3
• For regional merges (more than one country), use “all” instead
4-digit year(s), separated by dash “-”, followed by underscore “_”
• Use year in filename now
• For multi-year merges, use initial and final year separated with a dash, e.g. “2006-2018”
Type-indicator (2-letter, all lowercase), followed by underscore "_"
• “ts” for time series
• “rm” for regional merge
• “gm” for grand merge (multiple countries, multiple years)
• “cy” for one country, one year
• “ti” for technical information
• “qd” for questionnaire document
• “cb” for code book
• “cl” for change log
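
A minimal sketch of how the naming convention above might be checked in code, assuming the filename starts with the three fields in the order listed (country code, year or year range, type indicator, each followed by an underscore). Whatever follows the type indicator is not specified here, so the pattern only matches the prefix. The function names and the example filename `mex_2018_cy_example.dta` are hypothetical.

```python
import re
from pathlib import PurePath
from typing import Dict, Optional

# Type indicators listed in the naming convention.
TYPE_CODES = {"ts", "rm", "gm", "cy", "ti", "qd", "cb", "cl"}

# Prefix only: country (or "all"), year or year range, type indicator.
PREFIX = re.compile(
    r"^(?P<country>[a-z]{3})_"
    r"(?P<start>\d{4})(?:-(?P<end>\d{4}))?_"
    r"(?P<type>[a-z]{2})_"
)


def parse_filename(path: str) -> Optional[Dict[str, object]]:
    """Split a filename into country, years and type, or return None if it does not fit."""
    match = PREFIX.match(PurePath(path).name)
    if match is None or match.group("type") not in TYPE_CODES:
        return None
    start = int(match.group("start"))
    end = int(match.group("end") or start)  # single-year files have no end year
    return {
        "country": match.group("country"),  # "all" marks a regional merge
        "start_year": start,
        "end_year": end,
        "type": match.group("type"),
    }


def is_cy_dta(path: str) -> bool:
    """True for one-country, one-year ("cy") files with a .dta extension."""
    info = parse_filename(path)
    return (
        info is not None
        and info["type"] == "cy"
        and PurePath(path).suffix.lower() == ".dta"
    )
```

With this, `is_cy_dta("mex_2018_cy_example.dta")` would select the file, and `parse_filename(...)["country"]` gives the 3-letter code to look up the country from.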