This is a data analytics project for a 401k portfolio.
The data in example.csv has been sanitized.
Tech stack: Apache Spark via PySpark, pandas, Python, Jupyter Notebook, matplotlib, and Delta tables.
- The Jupyter notebook uses PySpark to read the example dataset.
- PySpark's SQL capability is used to perform some data cleaning.
- Some columns contained $ signs that could not be processed as numbers, so the $ signs were removed.
- Some columns used - or parentheses ( ) to indicate a negative amount or quantity.
- Date columns were parsed as dates on the initial import of the CSV.
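The cleaning rules above (strip $, treat `-` or parentheses as negative) can be sketched as a plain Python helper; in the notebook this is done with PySpark SQL, and the function name here is illustrative only:

```python
def clean_amount(raw: str) -> float:
    """Normalize a currency string like '$1,234.56' or '($50.00)' to a float."""
    s = raw.strip().replace("$", "").replace(",", "")
    # Parentheses denote a negative amount or quantity
    if s.startswith("(") and s.endswith(")"):
        s = "-" + s[1:-1]
    return float(s)
```

The same logic could be registered as a UDF or expressed with Spark SQL string functions when cleaning the full dataset.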
- Use SQL to:
- identify the securities invested in
- identify the total amount allocated
- identify personal contributions (category: Employee pre-tax contributions)
- identify employer contributions (category: Employer matching 401k contributions (fully vested))
- identify portfolio fees
- identify allocated contributions per category
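The SQL aggregations above can be mirrored in pandas; the column names and sample values below are assumptions for illustration, not the real dataset:

```python
import pandas as pd

# Hypothetical transaction table with a category and signed amount per row
txns = pd.DataFrame({
    "category": [
        "Employee pre-tax contributions",
        "Employer matching 401k contributions (fully vested)",
        "Employee pre-tax contributions",
        "Fees",
    ],
    "amount": [500.0, 250.0, 500.0, -3.5],
})

# Allocated contributions per category (GROUP BY category)
per_category = txns.groupby("category")["amount"].sum()

# Total amount allocated: sum of positive inflows only
total_allocated = txns.loc[txns["amount"] > 0, "amount"].sum()
```

In the notebook the equivalent queries run through `spark.sql` against the cleaned table.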
- use pandas to generate visualizations of contribution percentages
- calculate the total contribution per category
- calculate the total quantity (shares) and total amount per security
- visualize each security's progress
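A minimal sketch of the contribution-percentage visualization with pandas and matplotlib; the categories and amounts are invented for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-category contribution totals
contrib = pd.Series({"Employee pre-tax": 1000.0, "Employer match": 500.0})

# Convert totals to percentages of the whole
pct = contrib / contrib.sum() * 100

fig, ax = plt.subplots()
ax.pie(pct, labels=pct.index, autopct="%1.1f%%")
ax.set_title("Contribution percentages by category")
fig.savefig("contributions.png")
```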
- Use the Twelve Data API to gather current market prices.
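A sketch of fetching a current price from Twelve Data's `/price` endpoint using only the standard library. The response shape (a JSON object with a string `"price"` field) and the `fetch_price` helper are assumptions based on the public API docs, not code from this repo:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

TWELVE_DATA_URL = "https://api.twelvedata.com/price"

def parse_price(payload: dict) -> float:
    # Twelve Data returns the price as a string, e.g. {"price": "189.95"}
    return float(payload["price"])

def fetch_price(symbol: str, api_key: str) -> float:
    """Fetch the latest quoted price for a symbol (requires an API key)."""
    url = f"{TWELVE_DATA_URL}?{urlencode({'symbol': symbol, 'apikey': api_key})}"
    with urlopen(url, timeout=10) as resp:
        return parse_price(json.load(resp))
```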
- As a public GitHub data analytics project, some analytics cannot be shared due to the personal nature of the data.
- Stuff to share
- Tracking personal portfolio strategy changes
- Tracking dividend income and fees
- percentage of dividends relative to total security shares held
- yearly fees can be calculated
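The dividend and fee metrics above could be computed like this in pandas; the event table, column names, and share count are invented for illustration:

```python
import pandas as pd

# Hypothetical dividend/fee events for one security
events = pd.DataFrame({
    "date": pd.to_datetime(["2023-03-31", "2023-06-30", "2024-03-31"]),
    "type": ["dividend", "fee", "dividend"],
    "amount": [12.50, -3.25, 14.00],
})
shares_held = 100.0

# Dividend income relative to shares held
dividends = events.loc[events["type"] == "dividend", "amount"]
dividend_per_share = dividends.sum() / shares_held

# Yearly fees: group fee rows by calendar year
fees = events.loc[events["type"] == "fee"].copy()
yearly_fees = fees.groupby(fees["date"].dt.year)["amount"].sum()
```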