/201819A_cityu_com5507

This repository hosts the materials for my course COM5507 @ CityU, Fall 2018.

MIT License

201819A COM5507 Social Media Data Acquisition and Processing

  • #test v0 @ 20181009

  • This repository was created in Fall 2018. It stores the documents for a postgraduate-level course, COM5507 Social Media Data Acquisition and Processing, in the Master of Arts in Communication and New Media program (MACNM) @ City University of Hong Kong (CityU).

  • #Data_science_101 | #Python | #automated | #web_data_collection | #opendata | #web_scraping | #API | #pandas | #numpy | #tm | #sna | #dataviz | #macnm | #cityucom_10thanniversary

Course Instructor

Objectives

This course introduces the fundamental knowledge and hands-on skills of big data analytics in the field of media and communication, with a special focus on techniques for searching, collecting, analyzing, interpreting, and visualizing data. Technical topics include, but are not limited to, web crawling, data storage, data analysis, text mining, social network analysis, and data visualization, all based on open source software packages. Teaching and learning activities include class demonstrations, individual exercises, quizzes, collaborative projects, and guest lectures. By the end of the semester, students are expected to be able to collect big data from different data sources, namely social media harvesting, web scraping, and retrieval from online archives or indexes, using open source software packages. Students are also expected to produce socially, culturally, or commercially meaningful data-driven narrative outputs, such as data-driven journalistic reports, data visualizations, data-driven business analyses, and computational social science research reports. Critical reflection on the overuse and abuse of big data, and on the related ethical and legal controversies, runs throughout the semester.

Course Structure

This course contains a total of 13 classes (weeks). Each class lasts for 3 hours. There are 11 lectures (including in-class assignments and tutorials), 1 project consultation week, and 1 presentation week.

The lectures are organized into four units, supplemented by several workshops and a presentation week:

  • Unit 1: Data science fundamentals and basic Python programming (weeks 1–4)
  • Unit 2: Automated web data collection (weeks 5–8)
  • Unit 3: Data processing and data management (weeks 9–11)
  • Unit 4: Data exploration (weeks 11–12)
  • Project implementation & presentation

Course Syllabus (weekly teaching plan)

| Week | Content | Tools, packages, & tech details | Documents |
|---|---|---|---|
| Week 1 | Introduction: Media and communication in the digital age | Tools installation (Python; Anaconda, Jupyter Notebook; Git and GitHub; Markdown language) | slides |
| Week 2 | Python in action: A command-liner's perspective | Python (program execution, variables, expressions, data structure); command line interface | slides, code examples |
| Week 3 | Python in action: in an interactive notebook | Python (functions, control flow statements, errors and debugging); Jupyter Notebook, Numpy, Pandas | slides, code examples |
| Week 4 | Data science pipeline & project implementation (1) | Data scientists' workflow, data-driven investigation | slides |
| Week 5 | Web scraping ep. 1 | Web technologies (HTTP, HTML, CSS), Requests, BeautifulSoup | slides |
| Week 6 | Web scraping ep. 2 | Data sources, web scraping pipelines, Requests, BeautifulSoup, Pandas | slides |
| Workshop | Developing web crawlers | Scrapy, Selenium | demonstration |
| Week 8 | Mining the social web | Web data formats (JSON, XML), Regex, API | slides |
| Workshop | Case study: An integrated workflow | API, Cloud (Amazon), Pandas | demonstration |
| Week 9 | Data processing ep. 1: numeric data | Pandas | slides |
| Week 10 | Data processing ep. 2: text analysis fundamental | Pandas, Matplotlib | slides |
| Week 11 | Data exploration | Matplotlib | slides |
| Week 12 | Project implementation (2) | Integrated data-driven storytelling | slides |
| Week 13 | Group project presentation | /// | Representative student works |
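
To give a flavor of the Week 8 topics (JSON, Regex, API), here is a minimal sketch of the kind of exercise the syllabus points to. It is not taken from the course materials: the sample response, the `count_hashtags` helper, and the field name `text` are all hypothetical, and only the Python standard library is used so that it runs without installing Requests or Pandas.

```python
import json
import re
from collections import Counter

# Hypothetical API response: a JSON array of posts (invented sample data,
# not from any real social media API).
RAW_RESPONSE = """
[
  {"id": 1, "text": "Learning #Python and #pandas this week"},
  {"id": 2, "text": "Scraping with #Python is fun"},
  {"id": 3, "text": "No tags here"}
]
"""

# A hashtag is assumed to be '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def count_hashtags(raw_json):
    """Parse a JSON array of posts and count hashtags, ignoring case."""
    posts = json.loads(raw_json)
    counts = Counter()
    for post in posts:
        # Posts without a "text" field contribute no hashtags.
        for tag in HASHTAG_PATTERN.findall(post.get("text", "")):
            counts[tag.lower()] += 1
    return counts


hashtag_counts = count_hashtags(RAW_RESPONSE)
for tag, n in hashtag_counts.most_common():
    print(f"#{tag}: {n}")
```

In a real workflow the JSON string would come from an HTTP request to an API, and the counts would typically be loaded into a Pandas DataFrame for the processing and exploration steps in Weeks 9 to 11.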

About the Instructor

  • Xinzhi Zhang (M.A., Ph.D., City University of Hong Kong) is a Research Assistant Professor at the Department of Journalism of Hong Kong Baptist University (HKBU). His research interests include comparative political communication, new media and social change, and emerging media and the sociology of news. He is also an observer of computational social science and digital humanities. His research has appeared in peer-reviewed journals such as International Political Science Review, Computers in Human Behavior, International Journal of Communication, and Digital Journalism. He currently serves as the Programme Director of the Data and Media Communication concentration, an interdisciplinary undergraduate concentration at HKBU on data science and data-driven investigation and storytelling.