# Code runs on python 3 or up
# Install required packages
pip install -r requirements.txt
#run the following file to create sqlite3 database and populate it with required data
#database file already added in project, ignore following file to run app directly
python3 processdata.py
# Visualize date
python3 app.py
-
Section 1: Read data from yfinance api: refer to get_historical_info_single in dataprep.py
-
Section 2: Extract, Load and Transform Data
- processdata.py handels all ETL process needed to load, transform and extract data to database.db (SQLite3 Database we will be using for the rest of the process)
-
Section 3:
- Calculate daily value for index, refer to create_daily_index_open_close function in dataprep.py
- Weighted Sector view for given date, refer to function return_sector_open_close_volume in apphelper.py
-
Section 4 Visualize: Run app.py to run Dash Application
- App supports following visulization: Total index view from 2018 - August 2021, Given input date, sector distribution on relative weight basis
-
dbutils.py: Database utility function to query and write against database. Current code hardcodes database location in class, should be modified to pass another location
-
dataprep.py: Utility functions used to clean up data. Methods included:
- getting historical pricing data from yfinance api [Extract]
- processing historical pricing information for given ticker to preserver required information [Transform]
- Function to create required tables in database with appropriate schema [Load]
- Wrapper class to process and write data to database [ETL]
- Function to populate database with index value calculation [Load]
-
apphelper.py: Utility functions used to query data from database and pass information to app server for display. Contains the following queries:
- get_index_open_close: return the price weighted value of index open and close for input date
- return_sector_open_close_volume: returns the percent weight of a sector contributing to the total index value for open and close on the given date, along with sector wise average volume
- get_total_index: returns close value of index across all the dates present in the database
- ticker_info: Table to store attributes relating to a given ticker
Column | Datatype | Constraint |
---|---|---|
ticker | text | Primary Key |
isin | text | Primary Key |
sector | text | |
company_name | text |
- historical_price: Table to store historical price of tickers starting 2018-present
Column | Datatype | Constraint |
---|---|---|
ticker | text | Primary Key |
date | text | Primary Key |
open | real | |
close | real | |
volume | real |
- index_value: Table to store calculated historical index open and close
Column | Datatype | Constraint |
---|---|---|
date | text | Primary Key |
index_open | real | |
index_close | real |
In order to check if constituents_history.pkl contains stock tickers, we use company_ticker.json file available at https://www.sec.gov/files/company_tickers.json. This file contains all companies traded on the US stock exchange along with their company names and tickers.
Delisted Tickers: RHT, TIF, CXO, GWR, PE, ZAYO, LOXO, ELLI, USG, TCF, CLGX, CMD, CTB
- Since we do not have visiblity into index constituents for a given day, all valid tickers present in original file are assumed to make up the index (Price weighted Total stock index from the universe of all valid stocks)