- Date: September 12-16, 2022
- Time: 09:00-12:00 and 13:00-16:00 (including one 15-minute break per session)
Milena Tsvetkova is Assistant Professor of Computational Social Science at the Department of Methodology at the London School of Economics and Political Science. She completed her PhD in Sociology at Cornell University and postdoctoral training at the Oxford Internet Institute. In her research, she uses large-scale web-based experiments, network analysis of online data, and agent-based modeling to investigate fundamental social phenomena such as cooperation, social contagion, segregation, and inequality.
Patrick Gildersleve is an LSE Fellow in Computational Social Science in the Department of Methodology at the London School of Economics and Political Science. Patrick graduated with a Master's in Physics from the University of Oxford before completing his PhD at the Oxford Internet Institute in 2021. In his PhD research, he studied the intersection of news media and Wikipedia. Patrick analysed how current events are recorded and accessed on the online collaborative encyclopaedia, as well as the implications for theories of news values, newsworthiness, and collective attention dynamics. He has continued this work with an expanded research agenda around popularity and collective memory across online platforms.
The course provides an introduction to the basic computational tools, skills, and methods used in Computational Social Science using Python. Python is one of the most popular programming languages for data science, used widely in both academia and industry. Students will learn to use common workflow and collaboration tools; design, write, and debug simple computer programs; and manage, summarize, and visualize data with common Python libraries. The course will employ interactive tutorials and hands-on exercises using real social data. Participants will work independently and in groups with guidance and support from the lecturers. The practical exercises are designed to demand more autonomy and initiative as the course progresses over the five days, culminating in an open-ended group project in the final afternoon session.
This is an introductory course and no prior experience with programming is required. A basic understanding of statistics and some scripting experience (e.g., from building web pages or using statistical analysis programs such as Stata) will be helpful but is not required.
Participants will find the course useful if they:
- Have little or no technical and computational background
- Have a background in one of the social sciences (sociology, political science, psychology, etc.)
- Would like to pursue a research or professional career in computational social science or social data science (e.g., in academia, think tanks, government, NGOs, social media companies, tech startups)
By the end of the course, participants will:
- Possess an understanding of the tools, methods, tasks, and goals of Computational Social Science
- Design procedures and algorithms to solve data analysis tasks
- Write simple programs in Python
- Work confidently with pandas, matplotlib, seaborn, and other popular Python modules and libraries for data science
- Use bash, Jupyter Notebook, and GitHub to write, run, collaborate on, and share programming code
Each day of the course will consist of two three-hour sessions. The morning session will use interactive instruction to introduce participants to the topic, demonstrate the new methods, and facilitate discussion. The afternoon session will use guided hands-on exercises with real-world data to practice the new material. Participants will work individually, in pairs, and in groups, and the lecturers will be available throughout both sessions for consultation and support.
Participants require a laptop computer with Anaconda and Git installed.
- Lazer, D. et al. (2009). Computational social science. Science, 323(5915), 721–723.
- Salganik, M. J. (2019). Bit by Bit: Social Research in the Digital Age.
- Various authors. (2021). Special collection on Computational Social Science. Nature 595, 149–222.
- What is CSS?
- Data, methods, and questions
- Accountability, reproducibility, and ethics
- Setting up your workflow
- Installing Python with Anaconda
- Introduction to Jupyter Notebooks
- Introduction to Bash and GitHub
- Introduction to programming with Python
- Scalar data types, operators, and expressions
- Variable assignment, printing, and comments
- Non-scalar data types, indexing, and slicing
- List and string methods
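To give a flavour of the Python basics listed above, here is a minimal sketch covering scalar types, variable assignment, indexing and slicing, and a few list and string methods. The variable names and values are invented for illustration and are not taken from the course materials.

```python
# Scalar data types, operators, and expressions
n_respondents = 120                        # int
response_rate = 0.75                       # float
completed = n_respondents * response_rate  # arithmetic expression

# Non-scalar data types: a list of (made-up) ages
ages = [23, 35, 41, 29, 52]
first_two = ages[:2]   # slicing returns a new list with the first two items
last_age = ages[-1]    # negative indices count from the end

# List methods modify the list in place
ages.append(38)
ages.sort()

# String methods return new strings
raw_name = "  computational SOCIAL science "
clean_name = raw_name.strip().title()
words = clean_name.split()

print(completed, first_two, last_age)
print(ages)
print(clean_name, words)
```

Note that `first_two` is unaffected by the later `append` and `sort` calls, because slicing copies the selected items into a new list.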
Recommended Literature:
- Matthes, Eric. Python Crash Course Cheat Sheet.
- GitHub Git Cheat Sheet
- GitHub Tutorials
- Understanding control flow
- Conditionals
- Iteration
- List comprehensions
- Functions
- Modules and libraries
- Abstraction and decomposition
- Procedural programming with functions
- Object-oriented programming with classes
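The control-flow and abstraction topics above can be sketched in a few lines. The example below is illustrative only: the `Participant` class, the `classify` function, the threshold, and the data are all hypothetical and simply show a conditional, a loop, a list comprehension, a function, and a class working together.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def classify(score, threshold=5):
    """Conditional: label a participant by their average contribution."""
    if score >= threshold:
        return "cooperator"
    return "defector"


class Participant:
    """A hypothetical participant in a repeated public goods game."""

    def __init__(self, name, contributions):
        self.name = name
        self.contributions = contributions

    def average_contribution(self):
        return mean(self.contributions)


participants = [
    Participant("A", [8, 6, 10]),
    Participant("B", [2, 4, 0]),
]

# Iteration: build a dictionary of labels with a for loop
labels = {}
for p in participants:
    labels[p.name] = classify(p.average_contribution())

# List comprehension: the same kind of filtering in a single expression
high_contributors = [p.name for p in participants
                     if p.average_contribution() >= 5]

print(labels)
print(high_contributors)
```

Splitting the task into `mean`, `classify`, and `Participant` is one way to practise decomposition: each piece can be tested and reused on its own.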
Recommended Literature:
- Handling social data
- Ethics of data access
- Reading and writing common file types
- More complex data types: time and dates, Unicode, etc.
- Scraping web data
- Inspecting webpages
- Parsing static HTML with BeautifulSoup
- Scraping dynamic pages with Selenium
- JSON and working with APIs
- The Anatomy of APIs
- Authentication
- Pagination
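As a rough illustration of the JSON and pagination topics above, the sketch below loops over the pages of a simulated API until no further page is reported. Everything here is an assumption made for the example: `FAKE_PAGES`, `fetch_page`, and the `results` and `next_page` field names are hypothetical stand-ins, and a real API would be called over HTTP, usually with authentication, and may paginate differently.

```python
import json

# Hypothetical in-memory "API": each page is the JSON text a server might return.
FAKE_PAGES = {
    1: json.dumps({"results": [{"id": 1, "text": "hello"},
                               {"id": 2, "text": "world"}],
                   "next_page": 2}),
    2: json.dumps({"results": [{"id": 3, "text": "again"}],
                   "next_page": None}),
}


def fetch_page(page):
    """Stand-in for an HTTP request; returns raw JSON text for one page."""
    return FAKE_PAGES[page]


def fetch_all():
    """Follow the 'next_page' field until the API reports no more pages."""
    records = []
    page = 1
    while page is not None:
        payload = json.loads(fetch_page(page))  # JSON text -> Python dict
        records.extend(payload["results"])
        page = payload["next_page"]
    return records


records = fetch_all()
ids = [record["id"] for record in records]
print(ids)
```

Keeping the request in its own function means the pagination loop stays the same if `fetch_page` is later replaced by a real network call.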
Recommended Literature:
- BeautifulSoup Documentation
- Selenium Documentation
- Ruths, D., & Pfeffer, J. (2014). Social media for large studies of behavior. Science, 346(6213), 1063–1064.
- Introduction to pandas
- Creating DataFrames
- Accessing and filtering data
- Computing summary statistics
- Reading and writing data
- Manipulating pandas DataFrames
- Handling different data types
- Combining data from different tables
- Applying functions to DataFrames
- Creating basic plots using pandas
- Machine learning with scikit-learn
- Machine learning (a very brief intro)
- Scikit-learn
- Training data vs test data
- Random forests
- Feature importance
- Hyper-parameter tuning
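The pandas topics earlier in this list (creating DataFrames, filtering, summary statistics, combining tables, and applying functions) might look roughly like the sketch below. The two small tables and their column names are made up for illustration and assume only that pandas is installed, as it is with Anaconda.

```python
import pandas as pd

# Creating DataFrames from dictionaries (toy data)
people = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "country": ["UK", "UK", "DE", "DE"],
    "age": [25, 35, 40, 50],
})
posts = pd.DataFrame({
    "person_id": [1, 1, 2, 4],
    "likes": [10, 20, 5, 7],
})

# Accessing and filtering data with a boolean mask
over_30 = people[people["age"] > 30]

# Computing summary statistics by group
mean_age_by_country = people.groupby("country")["age"].mean()

# Combining data from different tables; a left join keeps people without posts
merged = people.merge(posts, on="person_id", how="left")
likes_per_person = merged.groupby("person_id")["likes"].sum()

# Applying a function to a column
people["age_group"] = people["age"].apply(
    lambda age: "under 40" if age < 40 else "40+"
)

print(over_30)
print(mean_age_by_country)
print(likes_per_person)
print(people)
```

The left join is a deliberate choice here: person 3 has no posts, so their `likes` value is missing in `merged`, and the grouped sum treats it as zero rather than dropping the person.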
Recommended Literature:
- Basics of visualisation
- Understanding plot elements
- Choosing the right chart
- Principles of colour
- Approaches going forward
- Plotting data with Matplotlib and Seaborn
- Basic plotting in Python
- Pyplot vs the object-oriented approach
- Customising plots and figures
- Attractive plots with Seaborn
- Analysis of non-rectangular data
- Network analysis with NetworkX
- Text analysis with NLTK
Recommended Literature:
- Guttag, John V. (2016). Introduction to Computation and Programming Using Python: With Application to Understanding Data. MIT Press.
- McLevey, John. (2021). Doing Computational Social Science: A Practical Introduction. Sage.