Data Lab (CSYS 395, Spring 2020) Challenge #2
- Assigned 2020/02/18
- Due 2020/03/10
- Team 3:
  - Anoob Prakash
  - Jessica Cole
  - Elizabeth Espinosa
  - Erik Brown
  - Samuel Rosenblatt
  - Colin Van Oort
Problems
- Parse the SMAC data files.
- Reproduce the analyses from the SMAC paper.
  - Number of Community Visits time series
  - Burial by Chiefdom and burial type time series
  - Community By-laws time series
- Identify surprising chiefdoms: regions that had many more or many fewer cases than their neighbors. Do they stand out in the social mobilization data?
- Continue the exploration in interesting directions:
  - Compare the attack rate in people to the attack rate over chiefdoms.
  - Is there any evidence that misinformation or distrust in the intervention led to lower rates of reporting, safe burials, and referrals?
  - Are social mobilizers from different organizations getting similar ratings across the different evaluation metrics?
Solutions
Resources
Notes on Sierra Leone:
- Divided into 5 administrative regions
- Further divided into 16 districts
- Further divided into 186 chiefdoms
- Chiefdoms may be subdivided into "Sections" (ref 1, ref 2)
- District maps can be found here
Repo Notes:
- When looking at the column discrepancies output by etl.py, it can be useful to combine multiple related discrepancy files with the following command:
  cat *.json | sed 's/,//g' | grep -v -E '\[|\]' | sort | uniq
  In particular, you can narrow the glob to a category of interest, such as cat *Chiefdoms.json ...
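For anyone without a Unix shell, the same merge can be approximated in Python. This is a minimal sketch, not part of the repo: `combine_discrepancies` is a hypothetical helper, and it assumes each discrepancy file holds a flat JSON array of strings, which is what the shell pipeline implies but which has not been checked against the output of etl.py.

```python
import glob
import json


def combine_discrepancies(pattern="*.json"):
    """Return the sorted, de-duplicated entries from all files matching pattern.

    Hypothetical helper. Assumes every matching file is a flat JSON array
    of strings (an assumption about the etl.py output format).
    """
    entries = set()
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8") as handle:
            # json.load parses the whole array, so no comma or bracket
            # stripping is needed as in the sed/grep version.
            entries.update(json.load(handle))
    return sorted(entries)


if __name__ == "__main__":
    # Narrow the glob to one category, mirroring `cat *Chiefdoms.json ...`.
    for entry in combine_discrepancies("*Chiefdoms.json"):
        print(entry)
```

Unlike the shell version, this returns the parsed strings without surrounding quotes or indentation, so entries that differ only in whitespace formatting between files are merged.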