Issues with country data files
Closed this issue · 4 comments
aliforgetti commented
- Columns in the data files vary widely between files, and some columns listed in the `questions_and_categories` lookup table are missing from them.
aliforgetti commented
`atg_2016_cy_lan_p` has a country that is not present in the country lookup table. This country is coded as 33, and we don't seem to have 33 anywhere.
aliforgetti commented
- The `idnum` column is inconsistent across data files: it either has different types or is named differently.
maitagorri commented
List of problematic files:
Incorrect Year In File:
- bra_2007_cy_lan_p.dta
- guy_2008_cy_lan_p.dta
- ury_2007_cy_lan_p.dta
- ven_2007_cy_lan_p.dta
- atg_2016_cy_lan_p.dta ----> not sure what atg is, and the country does not exist in the lookups.
- gtm_2006_cy_lan_p.dta ----> throws a weird error when trying to read the file
Multi-year files, unsure how to label year and "Wave" in these:
- ven_2016-2017_cy_lan_p
- ecu_2016-2017_cy_lan_p
maitagorri commented
Suggested solutions:
- Use the new data files with updated names, stored in Box as `data_v6`
- Where filename and contents disagree on year, use the filename year as `wave`, and the content year as `year`
- ATG is Antigua and Barbuda; I shared an updated country/ISO/LAPOP code table in Slack (not sure what you are using, but it would be great if you can update that resource with what I shared https://datasciencetip.slack.com/files/U010GMT8J2X/F013M8ZDGSH/iso_3166-1-alpha3_country-codes.csv)
- The GTM file seems okay; I'm having trouble reading it with `haven`, but `read.dta13` from the package `readstata13` does the job
- In multi-year files, use the content year as `year`, and for now the first filename year as `wave`
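
For reference, the year/wave rules suggested above could be sketched roughly like this in Python. This is only an illustration, not the project's actual code: the function name `label_file`, the `Labels` tuple, and the filename pattern are assumptions based on the filenames listed in this thread, and the content year is assumed to be read from the data separately and passed in.

```python
import re
from typing import NamedTuple

# Assumed filename pattern, inferred from names in this thread:
#   <3-letter country code>_<year>[-<year>]_cy_lan_p[.dta]
FILENAME_RE = re.compile(
    r"^(?P<country>[a-z]{3})_(?P<first>\d{4})(?:-(?P<last>\d{4}))?_"
)


class Labels(NamedTuple):
    country: str  # upper-cased country code taken from the filename
    wave: int     # first year in the filename
    year: int     # year found inside the file contents


def label_file(filename: str, content_year: int) -> Labels:
    """Apply the suggested rule: wave from the filename, year from the contents."""
    match = FILENAME_RE.match(filename.lower())
    if match is None:
        raise ValueError(f"unexpected filename format: {filename!r}")
    # For multi-year files such as ven_2016-2017_cy_lan_p, the first
    # filename year is used as the wave (the "for now" rule above).
    wave = int(match.group("first"))
    return Labels(match.group("country").upper(), wave, content_year)


# The content_year values below are hypothetical, for illustration only.
print(label_file("ven_2016-2017_cy_lan_p", content_year=2017))
print(label_file("bra_2007_cy_lan_p.dta", content_year=2006))
```

Keeping `wave` and `year` as separate fields means a file whose name and contents disagree can still be labeled without picking one year over the other.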