To access course notebooks in Binder (temporary coding environment):
Note: These notebooks are available only during class time.
To access and download course materials, including notebooks, at any time:
- Introduction to course outline and set-up (GitHub, Jupyter notebooks)
- Overview of useful Python libraries and capabilities for journalists
- Automation of data cleaning, wrangling and analysis
- Statistical analysis
- Visualization
- ML/AI
- APIs
- Other uses
- Data analysis with the Pandas library
- Introduction of course group projects
- Advanced analysis (grouping, functions, and more)
- Merging and joining datasets
- How to diagnose dirty data
- Reformatting and cleaning dirty data
- Automating data pipelines
- Documenting your steps for replication
- Basic statistical concepts for journalists - looking at the relationship between variables
- Correlation
- Regression
- Scatterplots
- Examples of statistical analysis in journalism
- Statistical analysis and plots using Python
- Introducing the Matplotlib library
- Making different types of charts such as bar charts, line charts and maps
- Formatting charts with color and text
- Adding interactivity using Plotly
- Examples of visualization in journalism
- Exporting charts for publication
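As an illustration of the visualization topics that close the outline above (a Matplotlib bar chart, formatting with color and text, and exporting for publication), here is a minimal sketch. The data values, chart labels, and output file name are invented for this example and are not taken from the course notebooks.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Invented values, for illustration only.
years = ["2017", "2018", "2019"]
counts = [12, 18, 9]

fig, ax = plt.subplots(figsize=(6, 4))

# Draw a bar chart and set a single color for all bars.
bars = ax.bar(years, counts, color="steelblue")

# Add a title and axis labels.
ax.set_title("Hypothetical incidents per year")
ax.set_xlabel("Year")
ax.set_ylabel("Incidents")

# Write each value just above its bar.
for bar, value in zip(bars, counts):
    ax.text(
        bar.get_x() + bar.get_width() / 2,  # horizontal center of the bar
        value,                              # top of the bar
        str(value),
        ha="center",
        va="bottom",
    )

# Export the chart as a PNG (hypothetical file name, saved to the temp directory).
out_path = os.path.join(tempfile.gettempdir(), "incidents_by_year.png")
fig.savefig(out_path, dpi=200, bbox_inches="tight")
plt.close(fig)
```

In a Jupyter notebook the `matplotlib.use("Agg")` line can be dropped so the chart renders inline; it is included here only so the sketch also runs as a plain script.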
Assignment for the course (2020 Class students): You will work in 9 groups, the same groups you were in for the Data Journalism and Visualization course with Prof. Herzog.
You will use the same data you used in the previous course, but this time you will clean, prepare, analyze and visualize the data in Python using Jupyter notebooks. You may also bring additional datasets into your project, such as population or income data, that can help you do some deeper analysis.
The Python notebook will be graded on:
- Reproducibility: Make sure you note your steps and what each one does, and that the steps can be reproduced
- Deeper analysis: Join/merge additional data, create an automated pipeline, reformat/clean the data, do a statistical analysis, or anything that takes your analysis further than the last time
- Conclusions/reporting questions: What story could you create from this data? What questions would you try to answer?
- Challenges: List any challenges you overcame with the data
At the end of the class, each student team will submit its work before 5:00 pm (Beijing Time) on Friday, April 14 to the study principal, who will upload all submissions to Prof. Carol Zhang’s Baidu drive. The submission must include your Python notebook with the above components. Group members must send their Qualtrics peer evaluation results to Prof. Carol Zhang by 12 am, midnight, on April 14. Anyone who submits the Qualtrics evaluation late will lose 1 point; anyone who submits it after 10 am on April 15, or does not submit it at all, will lose 5 points. Ms. Malan, Ms. Yanchen Liu and Dr. Ernest Zhang will grade each group’s work.
More details:
- Use the same data from your project in data journalism class
- Bring in additional data for context, or perform statistical analysis or create visualizations in Python, to take your analysis deeper than before
- Do all analysis in a Python notebook and turn in the notebook for grading
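A minimal sketch of what such a notebook workflow might look like with pandas follows: clean the project data, merge in a context dataset, compute a per-capita rate, and check a correlation. The city names, column names, all values, and the `prepare` helper are hypothetical and stand in for your own project data.

```python
import pandas as pd

# Hypothetical project data: one row per city (all values invented).
cases = pd.DataFrame({
    "city": ["Avon", "Brook", "Cedar", "Dale"],
    "incidents": ["120", "45", None, "300"],  # numbers stored as text, one missing
})

# Hypothetical additional dataset brought in for context.
population = pd.DataFrame({
    "city": ["Avon", "Brook", "Cedar", "Dale"],
    "population": [60000, 15000, 22000, 100000],
    "median_income": [41000, 52000, 47000, 38000],
})


def prepare(cases, population):
    """Clean the project data, merge in context data, and add a per-capita rate."""
    cleaned = cases.copy()
    # Step 1: convert text to numbers; values that cannot be parsed become NaN.
    cleaned["incidents"] = pd.to_numeric(cleaned["incidents"], errors="coerce")
    # Step 2: drop rows that have no incident count.
    cleaned = cleaned.dropna(subset=["incidents"])
    # Step 3: join on city name; validate raises an error if a city is duplicated.
    merged = cleaned.merge(population, on="city", how="left", validate="one_to_one")
    # Step 4: incidents per 1,000 residents, so cities of different sizes compare fairly.
    merged["rate_per_1000"] = merged["incidents"] * 1000 / merged["population"]
    return merged


result = prepare(cases, population)

# Pearson correlation between the incident rate and median income.
correlation = result["rate_per_1000"].corr(result["median_income"])

print(result)
print(correlation)
```

Keeping the cleaning steps in one commented function is one way to address the reproducibility criterion: rerunning the notebook from the raw files repeats every step in the same order.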
- Think Python, Second Edition, by Allen B. Downey (pdf version)
- Think Stats, Second Edition, by Allen B. Downey (pdf version)
- Coding for Journalists, an IRE course teaching Python developed by former training director Alex Richards