/datalab_challenge_2

Data Lab (CSYS 395, Spring 2020) Challenge #2

  • Assigned 2020/02/18
  • Due 2020/03/10
  • Team 3:
    • Anoob Prakash
    • Jessica Cole
    • Elizabeth Espinosa
    • Erik Brown
    • Samuel Rosenblatt
    • Colin Van Oort

Problems

  1. Parse the SMAC data files.
  2. Reproduce the analyses from the SMAC paper.
    • Number of Community Visits time series
    • Burial by Chiefdom and burial type time series
    • Community By-laws time series
  3. Identify surprising chiefdoms: regions that had many more or many fewer cases than their neighbors. Do they stand out in the social mobilization data?
  4. Continue the exploration in interesting directions:
    • Compare the attack rate in people to the attack rate over chiefdoms.
    • Is there any evidence that misinformation or distrust in the intervention led to lower rates of reporting, safe burials, and referrals?
    • Are social mobilizers from different organizations getting similar ratings in the different evaluation metrics?

Solutions

Resources

Notes on Sierra Leone:

Repo Notes:

  • When looking at the column discrepancies that are output by etl.py, it can be useful to combine multiple related discrepancy files with the following command:
    cat *.json | sed 's/,//g' | grep -v -E '\[|\]' | sort | uniq 
    To focus on a single category of interest, narrow the glob, for example cat *Chiefdoms.json ....
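
For anyone who prefers to stay in Python, the pipeline above can be sketched roughly as follows. This is a minimal, line-based sketch and not part of etl.py: the function name `combine_discrepancies` and its parameters are hypothetical, and it assumes (as the shell command implies) that each discrepancy file is pretty-printed with one entry per line.

```python
from pathlib import Path


def combine_discrepancies(pattern="*.json", directory="."):
    # Hypothetical helper (not part of etl.py). Mirrors the shell pipeline:
    #   cat <pattern> | sed 's/,//g' | grep -v -E '\[|\]' | sort | uniq
    # It works on raw lines and never parses the JSON.
    unique_lines = set()
    for path in sorted(Path(directory).glob(pattern)):
        for line in path.read_text(encoding="utf-8").splitlines():
            # sed 's/,//g': remove every comma, including any inside an entry
            line = line.replace(",", "")
            # grep -v -E '\[|\]': drop lines that contain a square bracket
            if "[" in line or "]" in line:
                continue
            unique_lines.add(line)
    # sort | uniq: Python orders by code point, which can differ from a
    # locale-aware shell sort
    return sorted(unique_lines)


if __name__ == "__main__":
    # Restrict the glob to one category, as with cat *Chiefdoms.json
    for entry in combine_discrepancies("*Chiefdoms.json"):
        print(entry)
```

Like the shell version, this keeps the leading whitespace and quotes of each entry. If the files are not laid out one entry per line, loading them with `json.load` and merging the parsed values would be the more robust route.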