SCAS Database Scraper User Manual

Author: Gabriel Zhang

Email: gzhang@compasslexecon.com

Ext.: 20639

I. About this repository

This repository contains the user manual for the SCAS Database Scraper, written by Gabriel Zhang for Compass Lexecon. The scraper is written entirely in Python 3.6.

II. Disclaimer

For confidentiality reasons, the code will not be published or released from this repository. This manual is written solely for Compass Lexecon staff who use this scraper to obtain Securities Class Action Settlements data provided by ISS Link.

III. Preparation

Step 1: Install Python for Windows

  • Download the Python 3.x.x for Windows installation file from here. The downloaded file is typically named something like python-3.x.x.exe.

    • As of July 25, 2017, the newest version of Python for Windows is 3.6.2.
  • Double-click the installation file to install Python.

    • Check the box "Add Python 3.6 to PATH".
    • Left-click "Install Now". Python will be installed in the default location, such as C:\Users\gzhang\AppData\Local\Programs\Python\Python36-32 on my machine.

Step 2: Install Required Packages

  • After installing Python 3, open a Windows Command Prompt. Copy and paste the command below and press Enter. This upgrades pip and setuptools, the packaging tools bundled with Python, to their latest versions.

    python -m pip install -U pip setuptools
  • To install the required packages, enter the following command in the Windows Command Prompt. It installs the latest versions of selenium and openpyxl, together with their dependencies, from the Python Package Index.

    python -m pip install selenium openpyxl
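
To confirm that Step 2 succeeded, a short check along the following lines may help. It is only a sketch and is not part of the scraper; the function name find_missing_packages is made up for illustration. It asks Python whether each package can be located, without importing it.

```python
import importlib.util

# Packages installed in Step 2 of this manual.
REQUIRED_PACKAGES = ("selenium", "openpyxl")


def find_missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level package names in `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If any package is reported missing, rerunning the pip command above in the same Command Prompt is the first thing to try.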

Step 3: Download Chrome Browser and Chrome Driver

  • Download the Chrome browser here.
  • Download ChromeDriver here.
    • Click on Latest Release: ChromeDriver X.XX
    • Download chromedriver_win32.zip
    • Unzip the file to get chromedriver.exe
    • Drag and drop chromedriver.exe into the Scripts subdirectory of your Python 3 installation directory.
      • For example, mine is:
      C:\Users\gzhang\AppData\Local\Programs\Python\Python36-32\Scripts
      • Reference here
      • Friendly reminder: You only need to do this once per machine.
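
Placing chromedriver.exe in the Scripts directory works because that directory is normally added to PATH when "Add Python 3.6 to PATH" is checked during installation. The sketch below, which is not part of the scraper, prints the Scripts directory of the running Python and checks whether chromedriver can be found on PATH. The function name locate_chromedriver is hypothetical.

```python
import shutil
import sysconfig


def locate_chromedriver(executable="chromedriver"):
    """Return the full path of the executable if it is on PATH, otherwise None."""
    return shutil.which(executable)


if __name__ == "__main__":
    # Scripts directory of the Python installation running this check.
    print("Python Scripts directory:", sysconfig.get_path("scripts"))

    driver_path = locate_chromedriver()
    if driver_path is None:
        print("chromedriver was not found on PATH.")
    else:
        print("chromedriver found at:", driver_path)
```

If chromedriver is not found, compare the printed Scripts directory with the folder into which chromedriver.exe was copied, and open a new Command Prompt so that any PATH change takes effect.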

IV. SCAS DB Scraper

NOTICE: in this manual, I assume the scraper file is located at C:\Workdata\scas_scraper.py

  • Determine the Start Filing Date and End Filing Date, then enter the command below in the Windows Command Prompt to run the scraper.

    • For example, to scrape all case profiles filed between 01/01/2016 and 07/19/2017, use the following command.
      python C:\Workdata\scas_scraper.py 01/01/2016 07/19/2017
    • Friendly reminder: although the scraper can scrape data for any time interval, from 1 day to 10 years or more, shorter intervals are processed faster. My recommended interval is 1 year.
  • While scraping, the Windows Command Prompt instance reports the scraping progress.
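
For a long date range, one way to follow the 1-year recommendation is to split the range into shorter intervals and run the scraper once per interval. The sketch below is not part of the scraper. The helper split_into_intervals is hypothetical, a "year" is approximated as 365 days, and both dates are assumed to be inclusive, which this manual does not state. It only prints the commands and does not run anything.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%m/%d/%Y"  # MM/DD/YYYY, as used on the command line in this manual


def split_into_intervals(start_text, end_text, max_days=365):
    """Split [start, end] into consecutive, non-overlapping ranges of at most max_days days."""
    start = datetime.strptime(start_text, DATE_FORMAT).date()
    end = datetime.strptime(end_text, DATE_FORMAT).date()
    if start > end:
        raise ValueError("Start Filing Date must not be after End Filing Date")

    intervals = []
    while start <= end:
        # Each chunk covers max_days calendar days, counting both endpoints.
        chunk_end = min(start + timedelta(days=max_days - 1), end)
        intervals.append((start.strftime(DATE_FORMAT), chunk_end.strftime(DATE_FORMAT)))
        start = chunk_end + timedelta(days=1)
    return intervals


if __name__ == "__main__":
    for first, last in split_into_intervals("01/01/2016", "07/19/2017"):
        print(f"python C:\\Workdata\\scas_scraper.py {first} {last}")
```

For the example range, this prints two commands: one for 01/01/2016 to 12/30/2016 and one for 12/31/2016 to 07/19/2017. Each printed command can then be pasted into the Windows Command Prompt in turn.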

V. Check Scraped Files

  • After scraping completes successfully, go to C:\Workdata, where you will see a new directory called case_profiles. Double-click to enter it. It contains subdirectories named in the format MMDDYYYY-MMDDYYYY (for example, 01012016-07192017), with each scraped case profile stored as an individual Excel file.
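
The output location described above can also be checked from Python. The sketch below is not part of the scraper. It builds the expected subdirectory name from the two filing dates and counts the Excel files inside it. The function name output_directory is hypothetical, and the .xlsx extension is an assumption; the manual only says the files are in Excel format.

```python
from datetime import datetime
from pathlib import Path


def output_directory(start_text, end_text, base=r"C:\Workdata"):
    """Return the expected case_profiles subdirectory for a MM/DD/YYYY date pair."""
    start = datetime.strptime(start_text, "%m/%d/%Y")
    end = datetime.strptime(end_text, "%m/%d/%Y")
    # Subdirectories are named MMDDYYYY-MMDDYYYY, e.g. 01012016-07192017.
    name = f"{start:%m%d%Y}-{end:%m%d%Y}"
    return Path(base) / "case_profiles" / name


if __name__ == "__main__":
    folder = output_directory("01/01/2016", "07/19/2017")
    if folder.is_dir():
        # Assumption: case profiles are saved with the .xlsx extension.
        files = sorted(folder.glob("*.xlsx"))
        print(f"{len(files)} case profile(s) found in {folder}")
    else:
        print(f"{folder} does not exist; check that the scraper finished.")
```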