The project can be found at the following URL: vector-project.herokuapp.com
The project was written in Python 3.7 and requires a handful of Python packages. Install them using:
pip install -r requirements.txt
The project also requires an internet connection to pull data from online sources. A Mapbox token is required to run the application locally.
Once the required packages are installed, setup is minimal. Clone this repository to get all the required files, paste the Mapbox token you acquired into the mapbox_key.txt file, and run app.py to start the local server.
The local server should start, and you should then be able to open the application in your browser.
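How app.py actually reads mapbox_key.txt isn't shown here. As a sketch only, a loader like the hypothetical `load_mapbox_token` below reads the file, strips the trailing newline many editors add, and fails loudly if the token was never pasted in:

```python
from pathlib import Path


def load_mapbox_token(path: str = "mapbox_key.txt") -> str:
    """Hypothetical helper: return the Mapbox token stored in `path`."""
    # strip() removes stray spaces/newlines left over from copy-pasting the token
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"{path} is empty; paste your Mapbox token into it")
    return token
```

The returned string could then be handed to Plotly, for example with `plotly.express.set_mapbox_access_token(token)`, before any map figures are built.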
-
Data is gathered from the following sources:
- Case Data: https://storage.googleapis.com/ve-public/covid_case_counts2.csv
- Sequence Data: https://storage.googleapis.com/ve-public/covid_new_sequences.csv
- Country Data: https://storage.googleapis.com/ve-public/country_iso.csv
- Continent Data: https://pkgstore.datahub.io/JohnSnowLabs/country-and-continent-codes-list/country-and-continent-codes-list-csv_csv/data/b7876b7f496677669644f3d1069d3121/country-and-continent-codes-list-csv_csv.csv
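Because every source is a plain CSV URL, they can all be fetched in one place. The sketch below is an illustration, not the project's loader: the URLs are copied from the list above, while the short dictionary keys and the `load_sources` name are hypothetical. The reader is injectable so the function can be exercised offline.

```python
from typing import Callable, Dict, Optional

# URLs copied from the source list; the short keys are hypothetical labels.
DATA_SOURCES: Dict[str, str] = {
    "cases": "https://storage.googleapis.com/ve-public/covid_case_counts2.csv",
    "sequences": "https://storage.googleapis.com/ve-public/covid_new_sequences.csv",
    "countries": "https://storage.googleapis.com/ve-public/country_iso.csv",
    "continents": (
        "https://pkgstore.datahub.io/JohnSnowLabs/country-and-continent-codes-list/"
        "country-and-continent-codes-list-csv_csv/data/"
        "b7876b7f496677669644f3d1069d3121/country-and-continent-codes-list-csv_csv.csv"
    ),
}


def load_sources(read_csv: Optional[Callable[[str], object]] = None) -> Dict[str, object]:
    """Fetch every source once; pass a custom reader to test without a network."""
    if read_csv is None:
        import pandas as pd  # pandas.read_csv accepts URLs directly

        read_csv = pd.read_csv
    return {name: read_csv(url) for name, url in DATA_SOURCES.items()}
```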
Data is transformed and combined with pandas in three steps:
- Case data is loaded and merged with the country and continent data, then melted so that the number of cases becomes a single value column.
- Sequence data is loaded, its dates are converted from strings to pandas datetime values, and its columns are renamed for easier merging.
- Case and sequence data are merged with a left join on the Country and Date columns. Null values in the new_sequences and Cases columns are filled with zeros, and new_sequences is also grouped by country to calculate a cumulative sum for each country on each date. The resulting dataset is the basis for all further tests and calculations.
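The three steps above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the project's code: the case CSV is assumed to be wide (a `Country` column plus one column per date), the country and continent lookups are assumed to be pre-joined into one `countries` table keyed on `Country`, and the raw sequence column names (`country`, `date`), the output column `cum_sequences`, and all function names are hypothetical.

```python
import pandas as pd


def tidy_cases(cases_wide: pd.DataFrame, countries: pd.DataFrame) -> pd.DataFrame:
    """Step 1: attach country/continent info, then melt date columns into rows."""
    merged = cases_wide.merge(countries, on="Country", how="left")
    id_cols = list(countries.columns)  # "Country" plus the lookup columns
    long = merged.melt(id_vars=id_cols, var_name="Date", value_name="Cases")
    long["Date"] = pd.to_datetime(long["Date"])
    return long


def tidy_sequences(seq_raw: pd.DataFrame) -> pd.DataFrame:
    """Step 2: rename columns to match the case data and parse dates."""
    # "country" and "date" are hypothetical raw column names
    seq = seq_raw.rename(columns={"country": "Country", "date": "Date"})
    seq["Date"] = pd.to_datetime(seq["Date"])
    return seq


def combine(cases: pd.DataFrame, seqs: pd.DataFrame) -> pd.DataFrame:
    """Step 3: left-join on Country/Date, zero-fill gaps, add cumulative sequences."""
    df = cases.merge(seqs, on=["Country", "Date"], how="left")
    df[["new_sequences", "Cases"]] = df[["new_sequences", "Cases"]].fillna(0)
    # A running total only makes sense in chronological order within each country
    df = df.sort_values(["Country", "Date"]).reset_index(drop=True)
    df["cum_sequences"] = df.groupby("Country")["new_sequences"].cumsum()
    return df
```

The left join keeps every country/date that has case data even when no sequences were reported, which is why the zero-fill has to happen before the cumulative sum.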
-
Data Dictionary can be found here.
-
The dashboard was written with Plotly Dash, a Python framework for building web apps. Dash was chosen for its ease of use and its ability to quickly prototype usable, interactive dashboards.
-
Visualizations were built with a combination of Plotly Express and Plotly's lower-level Graph Objects.
Refreshing the page should reset the visualizations and revert all settings to their defaults.
The project was inspired by the COVID CG global sequencing coverage map. Special thanks go out to their team.
- Add Feature: Clicking the selected country again (deselecting it) should restore the original graph
- Improvement: Improve overall documentation
- Improvement: Further clean and refine the data
  - Remove countries with 0 sequences (as of the latest data) from the Case Sequence Line data
- Improvement (MAJOR): Chart updates are slow because expensive data cleaning/transformation runs on every callback
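For the deselect feature, the decision logic is small enough to isolate from the Dash callback. The hypothetical `next_selection` helper below returns `None` when the already-selected country is clicked again, which a callback could treat as "show the original graph". Keeping track of the previous selection (for example in a `dcc.Store`) is assumed and not shown.

```python
from typing import Optional


def next_selection(current: Optional[str], clicked: Optional[str]) -> Optional[str]:
    """Hypothetical helper: decide which country is selected after a map click."""
    if clicked is None:
        # No click data (e.g. initial page load): keep whatever is selected
        return current
    if clicked == current:
        # Clicking the selected country again deselects it
        return None
    return clicked
```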