
Training materials associated with the SGSSS Summer School 2024 course on web scraping for social scientists

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

Web scraping for social scientists

Introduction

There is an unprecedented amount of information on the internet that could usefully be harvested in order to build social science research datasets.

This half-day course will showcase suitable techniques for web scraping.

The value, logic and process of capturing data stored on websites will be described in detail, and practical examples and exercises will be demonstrated using the Python programming language.
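As a rough illustration of that process (obtain a page's HTML, parse it, and extract structured records), here is a minimal sketch using only the Python standard library. The HTML snippet, the `TableScraper` class and the `scrape_table` function are hypothetical and written for this example; they are not taken from the course notebooks, which may use different libraries. The page content is held in a string so that the example runs without a network connection.

```python
from html.parser import HTMLParser

# Hypothetical page content. In practice the HTML would be downloaded from a
# website, e.g. with urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<html>
  <body>
    <h1>Registered charities</h1>
    <table>
      <tr><th>Name</th><th>Income</th></tr>
      <tr><td>Example Trust</td><td>12000</td></tr>
      <tr><td>Sample Foundation</td><td>56000</td></tr>
    </table>
  </body>
</html>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []            # completed rows, each a list of cell strings
        self._current_row = None  # row being built, or None outside a <tr>
        self._in_cell = False     # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag in ("td", "th") and self._current_row is not None:
            self._in_cell = True
            self._current_row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._current_row is not None:
            self.rows.append(self._current_row)
            self._current_row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._current_row:
            self._current_row[-1] += data.strip()


def scrape_table(html):
    """Parse the HTML and return one dictionary per table row after the header."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    header, *records = parser.rows
    return [dict(zip(header, record)) for record in records]


dataset = scrape_table(SAMPLE_HTML)
print(dataset)
```

The result is a list of dictionaries, one per table row, which could then be written to a CSV file or loaded into a data analysis library. All scraped values are strings at this stage, so numeric fields such as `Income` would need converting before analysis.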

It is most suited to empirical social science researchers but will be of value to researchers from a wide range of disciplines (e.g., digital humanities).

Course materials

This repository houses the materials underpinning a half-day SGSSS course on web scraping run by Dr Diarmuid McDonnell, University of the West of Scotland. The course was first run on 2024-06-05.

Programme

The course programme can be viewed here.

Materials

The training materials can be found in the following folders:

  • code - Jupyter Notebooks containing executable Python code for the web scraping lessons.
  • installation - Guidance on installing Python and Jupyter Notebooks.
  • presentations - PDF versions of the course lectures.
  • reading - Lists of interesting and relevant online articles on web scraping.

Acknowledgements

I am grateful to the Scottish Graduate School of Social Sciences (SGSSS) for funding this course and for its continued commitment to high-quality methods training for social scientists.

Further information

Please do not hesitate to get in contact if you have queries, criticisms or ideas regarding these materials: Dr Diarmuid McDonnell