In this jupyter notebook, I have performed Exploratory Data Analysis on Financial data.
I have used stocks data of a company (named any XYZ) which contains the opening, closing, high and low stock prices per day.
Also performed data filtering and built some visualizations from the dataset.
-
Importing the required libraries
pandas
: pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.numpy
: NumPy is a Python library that provides a simple yet powerful data structure: the n-dimensional array.matplotlib
: Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.seaborn
: Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
-
Knowing the dataset
Analyzing historical price data is an important way to try to make future predictions. We will be using a dataset of stock data for the company XYZ. This data includes opening, closing, high, low prices and stocks traded per day.
-
Importing and Loading the dataset
- The dataset is available in the repository.(Here)
- Dataset:
HistoricalQuotes.csv
-
Exploratory Data Analysis
df.shape
df.info()
df.head()
df.isna().sum()
df.drop()
df.drop(df.index[0], inplace=True)
: Dropping 1st row of the dataset
- Changing the dtype of the
open
,close
,high
andlow
columns tofloat
df.open = df.open.astype(float)
- Changing the dtype of the
volume
column toint
df.volume = pd.to_numeric(df['volume']).astype(int)
- df.describe() - Summary statistics for all numeric columns (default)
df.describe(include='int')
: Summary statistics only for columns whose type is "int"
- Percentiles
df.describe(percentiles=[.3, .5, .9])
: Summary statistics for all numeric columns with percentiles .3, .5 and .9
-
Filtering Data
- Creating different dataframes using masks.
- For example: Create a mask for all of the rows whose daily high is greater than $600.
high_mask = df.high > 600; df.loc[high_mask]
- Creating different dataframes using masks.
-
Filtering Data with dates
- Selecting date from date ranges
start_date = datetime(2019,8,2); end_date = datetime(2019,7,29)
- Importing
datetime
package.from datetime import datetime
- Creating mask of historical dates.
mask = (end_date <= df.index) & (df.index <= start_date); df2 = df[mask]; df2
- Selecting date from date ranges
-
Data Visualizations
- Line Plot showing Daily High Prices
- Histogram of High Prices
- Line plot of Volume over time
- Daily Volume Frequency distribution
- Filtering & Plotting data
- Filtering data of year 2016 from the dataset and then plotting it on a line plot over time
df.loc['2016'].plot(y='low'); plt.show()