/edgar_shortcourse

Scraping EDGAR Short Python Course

Primary language: Jupyter Notebook. License: MIT.

This course was designed off the top of my head for a class of three students who met once a week for three weeks, and all design decisions were made accordingly. Each weekly meeting lasted 45 minutes and was spent mostly explaining the intent of the next week's lesson. Students were expected to work through the notebook on their own time, get familiar with the concepts, and complete the 'homework' on their own.

Lessons

1: Installation

Walk through installing Python on your computer, with optional software/packages to install.

Outline:

  1. Installing Python
  2. Installing git
  3. Installing VSCode
  4. Installing pyEDGAR
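
Once the four installs are done, a quick sanity check can confirm what is actually available. The snippet below is only a sketch and is not part of the course notebooks: the function name `check_environment` is made up for illustration, it assumes VSCode exposes its usual `code` launcher on the PATH, and it assumes pyEDGAR is importable under the module name `pyedgar`.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Report which of the lesson's tools can be found on this machine."""
    return {
        # Version of the interpreter running this script, e.g. "3.11.4".
        "python": sys.version.split()[0],
        # True if a `git` executable is on the PATH.
        "git": shutil.which("git") is not None,
        # Assumption: VSCode installed its `code` command-line launcher.
        "vscode": shutil.which("code") is not None,
        # Assumption: pyEDGAR installs a module named `pyedgar`.
        "pyedgar": importlib.util.find_spec("pyedgar") is not None,
    }


if __name__ == "__main__":
    for tool, status in check_environment().items():
        print(f"{tool}: {status}")
```

A `False` for git or VSCode often just means the tool is installed but not on the PATH, so treat the output as a hint rather than a verdict.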

2: Python

Walk through the basics of Python, and end with extracting simple count data from text with regular expressions.

Outline:

  1. Syntax basics (strings, variables, lists, dicts, etc.)
  2. Program control logic (if statements, for loops, etc.)
  3. Functions
  4. Reading files
  5. Regular expressions
  6. Homework on analysing text data (answers)
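
To give a flavour of where this lesson ends up, here is a minimal sketch of extracting count data from text with regular expressions. It is not taken from the notebook: the helper `count_terms`, the sample sentence, and the chosen terms are all illustrative assumptions.

```python
import re
from collections import Counter


def count_terms(text, terms):
    """Count whole-word, case-insensitive occurrences of each term in text."""
    counts = Counter()
    for term in terms:
        # \b anchors the match to word boundaries, so "risk" does not match "risks".
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts


# Illustrative text; in practice this would come from a file,
# e.g. pathlib.Path("filing.txt").read_text().
sample = (
    "The Company faces risk from competition. Risk factors include "
    "currency risk and litigation. Risks are described in Item 1A."
)

counts = count_terms(sample, ["risk", "litigation", "item 1a"])
print(counts["risk"], counts["litigation"], counts["item 1a"])  # prints: 3 1 1
```

Note that the plural "Risks" is deliberately not counted. Whether it should be is a research design decision, and changing the pattern to `r"\brisks?\b"` would include it.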

3: Scraping

Introduce EDGAR and pyEDGAR, the library used to download and analyze EDGAR filings. View the data, then introduce the basics of HTML (with BeautifulSoup), the typical filing format, and extracting data from the DOM.

Outline:

  1. EDGAR (and pyEDGAR to interact with it)
  2. Filing formats
    1. Plaintext
    2. HTML
  3. Homework on analysing HTML documents (answers)
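
The idea of extracting data from the DOM can be sketched as follows. The course itself uses BeautifulSoup; to stay dependency-free, this sketch swaps in the standard library's `html.parser` instead. The class name `TableExtractor` and the HTML fragment are invented for illustration and do not come from a real filing.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Made-up fragment in the style of a small financial table.
html = """
<html><body>
<table>
  <tr><th>Item</th><th>2019</th></tr>
  <tr><td>Revenue</td><td><b>1,234</b></td></tr>
  <tr><td>Net income</td><td>56</td></tr>
</table>
</body></html>
"""

parser = TableExtractor()
parser.feed(html)
print(parser.rows)
```

With BeautifulSoup the same walk is usually shorter, along the lines of looping over `soup.find_all("tr")` and reading each cell's text, but the underlying structure (rows containing cells containing text) is the same.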

4: Bulk Scraping

Introduce DataFrames and looping over them. Provide a simple scraping loop structure for convenience. Close with an example of parallelization using ipyparallel.

Outline:

  1. DataFrames
  2. Looping over them
  3. Scraping loop framework
  4. Result aggregation and saving to disk
  5. Parallelization example
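
The overall shape of such a scraping loop might look like the sketch below. It is not the framework from the notebook. To stay runnable without extra packages, it uses a list of dicts in place of a DataFrame, the `csv` module in place of DataFrame saving, and a `ThreadPoolExecutor` in place of ipyparallel. The filings, accession numbers, and function names are all hypothetical.

```python
import csv
import io
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical filing index; in the course this would be a DataFrame.
FILINGS = [
    {"cik": 1, "accession": "0000000001-20-000001", "text": "Risk and more risk."},
    {"cik": 2, "accession": "0000000002-20-000001", "text": "No relevant words."},
    {"cik": 3, "accession": "0000000003-20-000001", "text": None},  # simulates a failed download
]

FIELDS = ["cik", "accession", "n_risk", "error"]


def scrape_one(row):
    """Extract one result record from one filing, recording errors instead of raising."""
    result = {"cik": row["cik"], "accession": row["accession"], "n_risk": None, "error": ""}
    try:
        result["n_risk"] = len(re.findall(r"\brisk\b", row["text"], flags=re.IGNORECASE))
    except Exception as exc:
        # One bad filing should not abort a long bulk run.
        result["error"] = type(exc).__name__
    return result


def scrape_all(rows, workers=1):
    """Run scrape_one over every row, optionally across several threads."""
    if workers > 1:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(scrape_one, rows))  # map preserves input order
    return [scrape_one(row) for row in rows]


def to_csv(results):
    """Aggregate result records into CSV text (write it to a file to save to disk)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


results = scrape_all(FILINGS, workers=2)
print(to_csv(results))
```

With pandas, the rows could come from `df.to_dict("records")` and the results could be saved with `pd.DataFrame(results).to_csv(path, index=False)`. Keeping the per-filing work in a single function that never raises is what makes it easy to hand the same function to a parallel backend later.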