Exploratory-Data-Analysis-for-Financial-Data

Description

In this Jupyter notebook, I have performed exploratory data analysis (EDA) on financial data.

I have used the stock data of a company (referred to here as XYZ), which contains the daily opening, closing, high and low stock prices.

I have also performed data filtering and built some visualizations from the dataset.

Importing the required libraries
- pandas : pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
- numpy : NumPy is a Python library that provides a simple yet powerful data structure: the n-dimensional array.
- matplotlib : Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
- seaborn : Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
Knowing the dataset

Analyzing historical price data is a common starting point for attempting to predict future prices. We will be using a dataset of stock data for the company XYZ, which includes the opening, closing, high and low prices and the number of shares traded (volume) per day.

Importing and Loading the dataset
- The dataset is available in the repository.
- Dataset: HistoricalQuotes.csv
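The loading call itself is not listed above. A minimal sketch, assuming the CSV has lowercase columns named date, close, volume, open, high and low (the column names the later snippets use) and that the dates should become a DatetimeIndex, which the date filtering and df.loc['2016'] steps rely on. The in-memory CSV is hypothetical placeholder data, not the contents of HistoricalQuotes.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for HistoricalQuotes.csv (assumed column layout).
# With the real file you would pass its path instead, e.g.
# pd.read_csv("HistoricalQuotes.csv", index_col="date", parse_dates=True)
sample_csv = io.StringIO(
    "date,close,volume,open,high,low\n"
    "2019-08-02,101.0,1000000,100.0,102.0,99.0\n"
    "2019-08-01,100.5,1200000,100.2,101.5,99.5\n"
    "2019-07-31,100.0,900000,99.8,100.8,99.1\n"
)

# Parse the date column and make it the index so rows can later be
# selected by date (masks, label slicing, partial strings like '2016').
df = pd.read_csv(sample_csv, index_col="date", parse_dates=True)

print(df.shape)        # (number of rows, number of columns)
print(type(df.index))  # DatetimeIndex
```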
Exploratory Data Analysis
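- df.shape : Number of rows and columns
- df.info() : Column names, dtypes and non-null counts
- df.head() : First five rows of the dataset
- df.isna().sum() : Number of missing values in each column
- df.drop() : Dropping rows or columns
  - df.drop(df.index[0], inplace=True) : Dropping the first row of the dataset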
- Changing the dtype of the open, close, high and low columns to float
  - df.open = df.open.astype(float) : likewise for the close, high and low columns
- Changing the dtype of the volume column to int
  - df.volume = pd.to_numeric(df['volume']).astype(int)
- df.describe() : Summary statistics for all numeric columns (default)
  - df.describe(include='int') : Summary statistics only for columns whose type is "int"
- Percentiles
  - df.describe(percentiles=[.3, .5, .9]) : Summary statistics for all numeric columns with percentiles .3, .5 and .9
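The cleaning steps above can be sketched end to end as follows. This assumes, as the drop step suggests, that the first data row is malformed and forces pandas to read every column as text; the "--" placeholders and all values are hypothetical sample data, not the real file.

```python
import io

import pandas as pd

# Hypothetical sample: the malformed first row ("--") makes every
# column load as text (object dtype) instead of numbers.
sample_csv = io.StringIO(
    "date,close,volume,open,high,low\n"
    "2016-01-06,--,--,--,--,--\n"
    "2016-01-05,598.0,1200000,596.0,605.0,594.0\n"
    "2016-01-04,596.5,1500000,601.0,603.0,592.5\n"
    "2015-12-31,600.0,900000,598.5,612.0,597.0\n"
)
df = pd.read_csv(sample_csv, index_col="date", parse_dates=True)

# Drop the malformed first row by its index label.
df.drop(df.index[0], inplace=True)

# Convert the price columns from text to float.
for col in ["open", "close", "high", "low"]:
    df[col] = df[col].astype(float)

# Convert volume to integers via pd.to_numeric.
df["volume"] = pd.to_numeric(df["volume"]).astype(int)

print(df.isna().sum())                           # missing values per column
print(df.describe())                             # count, mean, std, min, quartiles, max
print(df.describe(percentiles=[0.3, 0.5, 0.9]))  # custom percentiles
```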
Filtering Data
- Creating different dataframes using masks.
  - For example: Create a mask for all of the rows whose daily high is greater than $600.
  - high_mask = df.high > 600; df.loc[high_mask]
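A small sketch of mask-based filtering on hypothetical prices (not the XYZ data). It also shows how two conditions can be combined with `&`; the second condition is only an illustration. Each condition must be wrapped in parentheses because `&` binds more tightly than `>`.

```python
import pandas as pd

# Hypothetical daily highs and lows indexed by date.
df = pd.DataFrame(
    {"high": [598.0, 605.0, 612.0, 590.5], "low": [590.0, 594.0, 597.0, 585.0]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06", "2016-01-07"]),
)

# Boolean Series: True on days whose high exceeds $600.
high_mask = df.high > 600
high_days = df.loc[high_mask]

# Combining masks: high above $600 AND low above $595.
both = df.loc[(df.high > 600) & (df.low > 595)]

print(high_days)
print(both)
```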
Filtering Data with dates
- Importing the datetime class from the datetime module.
  - from datetime import datetime
- Selecting rows from a date range (both ends inclusive)
  - start_date = datetime(2019, 7, 29); end_date = datetime(2019, 8, 2)
- Creating a mask of the historical dates in that range.
  - mask = (start_date <= df.index) & (df.index <= end_date); df2 = df[mask]; df2
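A runnable sketch of the date-range filter on hypothetical closing prices listed newest first. It assumes the index is a DatetimeIndex. It also shows label slicing with `.loc`, an alternative that returns the same rows but needs the index sorted in ascending order first.

```python
from datetime import datetime

import pandas as pd

# Hypothetical daily closes, newest first (not the real dataset).
dates = pd.to_datetime(
    ["2019-08-05", "2019-08-02", "2019-08-01", "2019-07-31",
     "2019-07-30", "2019-07-29", "2019-07-26"]
)
df = pd.DataFrame({"close": [106.0, 105.0, 104.0, 103.0, 102.0, 101.0, 100.0]}, index=dates)

start_date = datetime(2019, 7, 29)
end_date = datetime(2019, 8, 2)

# Boolean mask over the index; both bounds are inclusive.
mask = (start_date <= df.index) & (df.index <= end_date)
df2 = df[mask]

# Alternative: label slicing, which requires an ascending (sorted) index.
df_sorted = df.sort_index()
df3 = df_sorted.loc["2019-07-29":"2019-08-02"]

print(df2)
print(df3)
```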
Data Visualizations
- Line Plot showing Daily High Prices
- Histogram of High Prices
- Line plot of Volume over time
- Daily Volume Frequency distribution
- Filtering & Plotting data
  - Filtering the 2016 data from the dataset and plotting the daily low price on a line plot over time
  - df.loc['2016'].plot(y='low'); plt.show()
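A sketch of the plots listed above. It assumes lowercase high, low and volume columns and a DatetimeIndex. The data is a hypothetical NumPy random walk, not the XYZ prices. This sketch uses only pandas' matplotlib-based `.plot()`. seaborn, which the notebook imports, could draw the histograms instead, but whether the notebook does so is not stated here.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical business-day data spanning 2016 (random walk, not real prices).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2015-07-01", "2017-06-30")
high = 600 + rng.normal(0, 5, len(dates)).cumsum()
df = pd.DataFrame(
    {
        "high": high,
        "low": high - rng.uniform(2, 10, len(dates)),
        "volume": rng.integers(500_000, 2_000_000, len(dates)),
    },
    index=dates,
)

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Line plot of daily high prices.
df.plot(y="high", ax=axes[0, 0], title="Daily High Prices")

# Histogram of high prices.
df["high"].plot(kind="hist", bins=30, ax=axes[0, 1], title="Histogram of High Prices")

# Line plot of volume over time.
df.plot(y="volume", ax=axes[1, 0], title="Volume over Time")

# Frequency distribution of daily volume.
df["volume"].plot(kind="hist", bins=30, ax=axes[1, 1], title="Daily Volume Distribution")

fig.tight_layout()

# Partial-string indexing: every row whose date falls in 2016.
df_2016 = df.loc["2016"]
df_2016.plot(y="low", title="Daily Low Prices, 2016")
plt.show()
```

`df.loc['2016']` works only because the index is a DatetimeIndex. With a plain string or integer index, the year would have to be filtered with an explicit mask.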

Ravjot03/Exploratory-Data-Analysis-for-Financial-Data